1 Introduction

AAdisdtirsibtruitbedutveedrsivoenrosfiothneoGf atnhteer Galagnortiethrmalfgoorrgietnhemral for genGeraal olisGLaalottiisceLsattices

Ger´ardGLe.v´Lyéavnyd

FFa. tBmaakBloauktliouti

0 0 CECREIRAIALaLbaboorarattoorryy,, PPaarriissDDaauupphihnienUenUivneivrseirtysity PlacPeladcue MduaMerc ́ahraélchdael LdeatLtartetrdeedeTTaasssisgignnyy , 7755777755 PPaarrisiscecdeedxex161,6F,raFnrcaence

207 221

Standard Galois Lattices are effective tools for data analysis and knowledge discovery. Several algorithms were proposed to generate concepts of lattices, among which the ScalingNextClosure algorithm. In order to share the production workload between several processors when the number of closed itemsets to determine is very large, this algorithm leans on the sequential character of the closed itemsets determination of a Galois Lattice by the Ganter algorithm. In this paper, we prove that the parallelised version of the ScalingNextClosure can be extended to more general contexts (even some complex data) than usual binary contexts and that the partition of the workload between processors can be made with all the wished precision.

1 Introduction

Standard Galois Lattices are very useful tools in data mining. They allow us to structure data sets, by extracting concepts and rules to deduce concepts from other concepts. Concept lattice studies how objects can be hierarchically grouped together according to their common attributes. Since each set of objects possesses some attributes, we can classify objects and attributes according to the relation between objects set and attributes set. Formal concept or closed itemset represents stronger association between itemsets and the set of their common objects.

Several algorithms are well known and used to determine the Galois lattice GL(C) associated to a given context C, when the size of C is not too large. Nevertheless, we need nowadays to treat contexts which are large and not necessarily binary. Several algorithms were proposed to generate concepts of lattices, among which the ScalingNextClosure algorithm proposed in [ 11 ]. In order to share the workload between several processors when the number of closed itemsets to determine is too large, this algorithm leans on the sequential character of the closed itemsets determination of a Galois Lattice by the Ganter algorithm. One of the essential points of this distributed version is the work distribution between various processors. (The qualifier "large concerns here not only the size of the context but also the number of closed itemsets of the lattice since a small table can generate a big lattice). In this paper, we show that the parallelized version of the ScalingNextClosure can be extended to more general contexts, than usual binary contexts and that the partition of the work between processors can be made with all the required precision. The rest of the paper is organized as follows. In section 2, we define all the concepts necessary to present clearly the general contexts to which we shall apply our algorithms, and to develop the tools of partition. In section 3, we define the lexicographical order. We introduce the first version of the Ganter algorithm in section 4 and its segmented version in section 5. Section 6 is devoted to the choice of partitions. In section 7, we define some mathematical tools in order to help the construction of good partitions proposed in section 8. Section 9 concludes this paper by listing future research directions.

2 Notations, Definitions

Let m and n be two finite positive integers. The set I = {1 m} = [1 m] designates a set of m individuals or objects i, and the set J = {1 n} = [1 n] designates a set of n properties or variables j.

Every variable j takes its values in a set Bj = {0 bj - 1} = [0 bj - 1], where bj is a finite integer > 1.

In that case, the sequence (b1 bn) will be called a multibased or a generalized base, or a heterogeneous base.

We suppose that every Bj is provided with the total order of the natural integers. We note B[n] the Cartesian product B1 x B2 x x Bn which is the set of the elements Xn = (x1 xn) such as xj j, for every j of J.

We provide B[n] with the relation of order of the Cartesian product, namely: Xn Yn iff xj yj for every j of J. Provided with this relation < B [n], > is a lattice for the operations sup (supremum) noted and inf (infimum) noted These operations are defined by: Zn = Xn Yn iff for every j of J zj = xj yj = sup (xj, yj); Zn = Xn Yn iff for every j of J zj = xj yj = inf (xj, yj).

Furthermore, the extreme elements of this lattice are OF and 1F. OF and 1F are defined by OF = (0 0 0), and 1F = (b1- 1 bj - 1 bn- 1).

Every Xn of B [n] verifies the constraint 0F Xn 1F. (In the rest of this paper, in order to avoid using bj - 1, we shall put 1F = (c1 cj cn), with cj = bj-1, for every j of J) Let E = 2I = P(I) be the set of subsets of I.

Let d: I B [n] be a mapping which associates to every individual i of I its description d (i) = (d1 (i) dj (i) dn (i)), where dj (i) is the value of the variable j for the individual i.

Let f: E B[n] be the mapping which associates to every subset X of I the element f (X) of B [n] defined by f (X) = 1F; X = , and f (X) = i X d i = d (i1) d (i2) . d (ik) if X = {i1, i2 ik} I. (f (X) is called the intent of X).

We define the mapping g: B [n] by g (Zn) = {i I: Zn d (i)}, for every Zn de B[n]. (g (Zn) is called the extent of Zn). We note that f and g are decreasing mappings and their composites h = g ° f and k = f ° g are closure operators.

In the literature of « data mining », the triplet C = < I, F, d > is called a context. It is often materialized by a table, which has m lines and n columns, every element in position (i, j) being equal to dj (i).

The pair (f, g) is called the Galois connection associated to the context C. The elements X of E such as h (X) = X are called the closed elements of E, and Zn of B[n] such as k (Zn) = Zn are called the closed elements of B[n].

Let us note Inv (E) = {X E: h (X) = X} and Inv (F) = {Zn F: k (Zn) = Zn}. We can prove that these sets are in bijection [ 4 ].

The set of pairs (X, Zn) of E x B [n] such as (X Inv (E) and Zn = f (X)) is equal to the set of all the pairs (X, Zn) such as (Zn Inv (F) and X = g (Zn)). (Each of these pairs is called a concept). This set is called the Galois Lattice associated to the context C, and noted TG (C).

TG (C) = {(X, Zn): X Inv (E) and Zn = f (X)} = {(X, Zn): Zn Inv (F) and X = g (Zn)}.

We can prove that this set is a lattice for the relation of order defined by (X, Zn) (X , Zn ) iff X X and Zn Zn.

Many problems of data mining are related to the determination of the Galois Lattice associated to a context C. Depending on the nature of the data mining problem, this determination may consist in determining simply all the concepts (X, Zn), or to determine this set and its order relation.

3 Lexicographical Order on B[n]

Up to here, B[n] is provided with the order associated with B1 x B2 x x Bn. If all the bj s are equal to 2 (binary case), the Ganter algorithm allows to build all the closed itemsets of the Galois lattice following the lexicographical order of its elements.

In order to generalize the Ganter algorithm, we define, in this section, this order on B[n] when the bj s are greater than 1.

3.1 The lexicographical order definition

Being given two elements Xn and Yn of B[n], our objective is to determine their greatest common prefix. Two cases are possible: either Xn = Yn, or they are different. If j = inf {k/xl yl } of and if xj < yj then Xn precedes Yn in lexicographical order (L.O), and otherwise Yn precedes Xn. We note Xn Yn the fact that Xn precedes Yn in L.O. In pseudo-Pascal, the following function takes the value True iff Xn Yn.

Function inflex (Xn, Yn: elements of B[n]): boolean;

Var j: integer;

Begin End; j = 1; while ((j < = n) and (x [j] = y [j])) do j = j+1;

Inflex= (j > n) or (x [j] < y[j]);

3.2 The next in lexicographical order of Xn

While 1F = (c1 cj cn), we define the transition subscript of every Xn of B[n] as follows: j = n; While ( (j > 0) and (x [j] = c [j]) ) do j = j - 1; Subscript of transition= j; In other words, if Xn = (x1, x2 xi-1, xi, ci+1, ci+2 cn), and xi < ci, then its transition subscript is i.

This subscript is designated by i+(Xn). Therefore, i+ (Xn) = 0, iff Xn = 1F. Moreover, if Xn 1F, and if its transition subscript is i, we define its following Yn in lexicographical order by Yn=(x1,x2 xi-1, 1 + xi, 0, 0 0)., and we note it Xn+. (It is easy to verify that Yn follows of Xn in L.O).On the other hand, if Xn = 1F, the following of Xn in L.O is not defined.

We can also define, the next of Xn in L.O, from the subscript i (of J), as follows: Procedure next (X: element of B [n]; i: element of J; the Var Y: element of B [n]; Var i+: integer)

Var j, k: integer;

begin j = i; end; end; While ((j > 0) and (x [j] = c [j])) do j = j-1; i+ = j; If (i+ > 0) then begin

For j = 1 to i+ - 1 do y [j] = x [j]; y [i+] = 1 + x [i+];

For j = 1 + i+ to n do y [j] = 0; Remark: when i is the transition subscript of X, this procedure output is Y = X+, the next of X in L.O, which may be generally provided by starting with i = n. For every X = Xn = (x1, x2 xi-1, xi, ci+1, ci+2 cn), other than 1F, we define the element X* of B [n] by: X* = (x1, x2 xi-1, ci, ci+1, ci+2, cn), if i+(X) = i. Thus, we remark that X X+ X*. for every X of B[n], other than 1F, In the next sub-section, we establish several relations between product-order and lexicographical order.

3.3 Product-order and lexicographical order on B [n]

Proposition 1: X and Y of B [n], if X Y then X Y Proof: For every j of J xj yj. If Y X, j of J such as yj < xj. This is in contradiction with the fact that X Y.

Proposition 2: whatever are X and Y of B [n], if X is different of 1F, then: X+ Y X* iff X+ Y X*.

Proof: It is obvious that X, X+ X* and X+ X*. The implication from left to right results from 2.3.1.

Let us prove the second implication. We note i = i+(X) the transition subscript of X. Since X+ Y, - Either X+ [j] = Y [j] for every j of J .i.e. X+ = Y. Therefore, we conclude in this case that X+ Y and Y X*; - Or there is a subscript k of J such as X+[k] < Y [k]. If k < i = i+[X], we have every j < k, X+[j] = Y[j], and X+[k] < Y[k]. But for every j < i, we have X+ [j] = X* [j] = X[j]. It follows that X* Y, in contradiction with the hypotheses. In that way, i = i+ [X] k. Consequently, for every j < i: X+[j] = X*[j] = X [j] = Y [j]. Furthermore, for every j > i, we have X+[j] = 0 Y [j] cj = X*[j].

If X+ Y X*, we have thus for j = i, X+[i] = 1 + X[i] Y[i] ci = X*[j]. This proves that for every j of J, X+[j] Y [j] X*[j], and thus that X+ Y X*.

3.4 Next closed itemset according to lexicographical order

For a given context C = < I, F, d >, with F = B [n], and an element a of F, we search the first element y of F which is a closed itemset of the lattice TG (C) and such as a y.

The following result generalizes the Ganter s one.

Proposition 1: Let a+ be the following of a in L.O. We pose X = g (a+) I, and y = k (a+) = f (X) F.

If a+ y a*, then y is the first closed itemset, of TG (C), following a in L.O; Otherwise, there is no closed item z of TG (C) between a+ and a*, and the following closed itemset of a in L.O is to be searched from the next of a*.

Proof: Since k is extensive, a+ k (a+) = y. - We suppose that we have furthermore y a*. Then a+ y a*.

According to 2.3.2, a+ y a*. y is a closed item which follows a in L.O. Let us prove that it is the first one. If there was one closed item z of TG (C), such as a+ z a*, then a+ z a*. Since k is increasing, we have y = k(a+) k(z) = z. Therefore y z, and according to 2.3.1, y z.

Consequently, y is the first closed item of TG (C) following a in L.O. - On the other hand, if a* < y, let us prove that there is no closed item z of TG (C) such as a+ z a*. If a such z existed, then a+ z a*. Since k is increasing, we have y = k (a+) k (z) = z. Finally, since z a*, we have y a*. This result is in contradiction with the hypothesis.

4 Ganter algorithm

The properties established previously lead to the first version of the Ganter algorithm for contexts C = <I, F, d > such as F = B [n] provided with the product-order.

Procedure GANTER1 (C: context)

Var a, a+, a*, y: elements of F; X element of E; nf: integer; {nf = number of closed itemsets of the lattice} Begin nf = 0; a=OF; X= g (a); y= f (X); If (y = a) then begin nf=1+nf; show (X, y); end; While (a < 1F) do Begin a = a+; X = g (a) = f (X);

If (y a*) then

begin

nf = 1 + nf; show (X, y); a=y; end; else a = a*; End;

End; We can reduce the number of operations of this algorithm by taking into account the properties established above: If i denotes the transition subscript of a, it obvious that that

By assigning a to OF, we have i = n; The condition (a < 1F) is equivalent to (i > 0); The expression a = a+ may be replaced by the expression next (a, i, a+, i+); i = i+ , which provides a+ as well as its transition subscript;

The expression If (y a*) may be replaced by the expression (If yj = aj for j =1 i-1) .

The expression a = a* may be replaced by the expression i = i-1 . That leads to a faster version of the Ganter (?) algorithm.

Procedure GANTER2 (C: context)

Var a, a+, a*: elements of F; X element of E; i, i+, nf: integer; {nf = number of closed itemsets of the lattice} Begin nf = 0; a=OF; i=n; X= g (a);y = f (X); If (y = a) then

begin nf=1+nf; show (X, y); end; While (i > 0) then begin next (a, i, a+, i+); a = a+; i = i+; X = g (a); y = f (X); If (for every j < i we have yj = aj) then

begin nf =1+nf; show (X, y); a = y; i = n; end; else i = i-1; End;

Example:

F = F1 x F2 x F3

F1 : Size: short (1), medium (2), high (3) F1 will be presented by 1 < 2 < 3 (ordered set by the relationship ) F2 : Weight: thin (0), fat (1) F2 will be presented by 0 < 1 (ordered set by the relationship ) F3: Age: child (1), adolescent (2), adult (3) F3will be presented by 1 < 2 < 3 (ordered set by the relationship )

Marc Cédric Céline Carine

In this section, we use the sequential character of the construction of TG (C) by the Ganter algorithm to split this construction process into many parts.

5.1 Construction of an interval of TG(C)

Let u and v be two elements of B [n] such as 0F u v 1F. We recall that the Ganter algorithm produces the closed itemset (X, y) of TG (C) in the lexicographical order of the elements y of B [n]. Therefore, if we note TG (C, u, v) the set of closed itemset (X, y) of TG (C) such as u y v, and if we call it the interval [u, v[of TG(C), then it is possible to obtain this interval by applying the following procedure: Procedure GANTER_TRUNCATED (C: context; u, v: elements of B [n]; Var TG: set of pairs of (X, y) of E x F) Var nf: integer; a, a+, a*, y: elements of F; X: element of E; Begin

TG= ; nf: 0; a=u;

X=g (a); y = f (X); If (y = a) then begin nf= 1 + nf; show (X, y); TG= TG

While (a v) do begin {(X, y)}; end; a = a+; X = g (a); y = f (X); If (y a* and y v) then begin nf =1+nf; show (X, y); a=y; TG = TG {(X, y)}; end End; We think that it is possible to provide a more efficient version of this procedure by taking into account the same remarks as for Ganter2.

5.2 Distributed construction of TG(C)

Let (u[0], u[ 1 ] u [p-1], u[p]) be a sequence of elements of B [n], strictly increasing according to the lexicographical order, and such as OF = u[0] u[ 1 ] u[p-1] u[p] = 1F.

In the rest of the paper, such a sequence is called a partition of B [n].

TG (C) may be built as the union of the TG (C, u [k-1], u [k]), for k =1, 2 p. In that way, we obtain all the closed itemset (X, y) of TG (C) such as Y [0F, u[ 1 ][ [u[ 1 ], u[ 2 ][ [u[p-1],1F [.

Let us remark that this construction excludes the pair (X, y) such as y = 1F. 1F is a closed item, since for every y of F we have y k (y). Thus,1F k (1F). Nevertheless k (1F) is an element of F and thus k (1F) 1F. That confirms that y = 1F is a closed item. Furthermore g (1F) = {i I: 1F =d (i)} = {i I: 1F = d (i)}. Consequently TG (C) = TG (C, u[0], u[ 1 ]) TG(C, u[p-1], u[p]) {(g (1F), 1F)}.

Besides, we obtain a procedure to build TG (C), all the pairs of closed itemsets of the Galois lattice associated to the context C.

Procedure GANTER_PARTITION (C: context; u: partition of B [n]; Var TG: ensemble of pairs (X, z) of E x F) Var k: integer; Begin End;

TG= ; For k = 1 to p do TG=TG TG= TG {(g (1F)), 1F)};

TG(C, u [k-1], u [k]); The value of TG provided by this algorithm is TG (C).

6 The choice of a partition

We often use the segmentation of B [n] because this set is with strong cardinality. Moreover, since the algorithm of Ganter consists essentially in making a path in L.O of this set, we can intend to split this path in many separate intervals to be computed by different processors. Therefore the problem to be solved is equivalent to use partitions (u[0] u[p]) which are as adequate as possible. A partition is called adequate if it allows sharing the workload by taking into account the capacity of the various available processors.

Let us note b[n] = | B [n] | the cardinality of B [n] = b1. b2 bn. If there are p processors of the same capacity, we can, for example, intend to allocate to each processor, the exploration of an interval [u[k-1], u[k][of B [n] (with length = b[n] / p). Consequently, the total workload between processors will be equally distributed.

It is also possible to design partitions which attribute to the processors the exploration of intervals with amplitudes proportional to their respective powers. From this point of view, the choice of partition, proposed in [ 11 ], is unbalanced.

Let us recall this choice proposed in the binary case:

We have bi = 2, for every j of J = {1 n};

For every k of J, n,k the vector of B [n] = B1 x B2 x x Bn = {0, 1}n, where all the constituents are null, except the k-th which is equal to 1.

Let us denote u [k] = n,n-k+1, for k = 1, , n, u [0] = OF = (0, , 0), u[n+1] = 1F = (1, , 1), and p = n+1.

Then, the sequence (u[0], u[ 1 ], u[ 2 ] u[p-1], u[p]) is lexicographically increasing, and it can be used to partition B[n]. It is easy to note that: - The number of elements of B [n] following n,k in L.O , is 2n - 2n-k, for k = 1, , n; - The number of elements of B [n] following u[k] = n,n-k+1 in L.O, is thus 2n - 2k-1; So that the number of elements of B [n] contained in L.O in the interval [u[k-1], u[k][ is: n, k = (2n - 2k-2) - (2n - 2k-1) =2 k-2, for k = 2, , n+1.

Then n, k = 2. n, k-1. That indicates that the amplitudes of the intervals of the partition are in a geometrical progress of reason 2. Every interval being twice as large as the precedent, the distribution of the tasks is extremely unbalanced. If the first processor has to do a certain work, the second will have to do twice as much, the third four times more, etc. We also note that the last processor has to do the half of the total work, the last but one processor the quarter of the work.

In that way, any other partition which uses only a part of the n,k would be also unbalanced. In the next section, we present some mathematical tools that will help us to build better-balanced partitions.

7 Mathematical tools 7.1 Rank of an element of B [n] in lexicographical order

In this section, we assume that all the bj can be different. This corresponds to the general case. Our goal is to define a ranking function which permits associating to each element Xn of B[n] its rank in B[n] in lexicographical order. Then we shall prove that this function defines an isomorphism of ordered sets.

B [n] and Xn may be defined recursively as following:

B [n] = B1, if n = 1, and B [n] = B [n-1] x Bn, if n > 1.

Any element Xn of B [n] by Xn = (x1), if n =1, and Xn = (Xn-1, xn)

n > 1.

In the same way, we define recursively the mapping n: B [n] [n]:

B [n-1] x Bn, if

N: for every Xn of B

If n = 1, then n (Xn) = xn, and if n > 1, then n (Xn) = xn + bn . n-1 (Xn - 1). Let us prove the following property: Property 7.1: n is a bijection between B [n] and the set [0, b [n] - 1].

Proof: we proceed by induction on n > 0.

The property is true for n = 1. Let us suppose that we have established it until n-1. Let us show that it is true for n. - Let us begin by showing that for every Xn of B [n], we have 0 n (Xn) b [n] - 1. Indeed, by hypothesis of induction (I.H, in summary), we have 0 n-1 (Xn-1) b [n-1] 1. On the other hand since 0 xn bn - 1, we have 0 + bn . 0 = 0 n (Xn) bn -1 + bn. (b [n-1] - 1) = bn . b [n-1] 1 = b [n] - 1. - Let us prove that n is injective. If two elements of B [n] are Xn and Yn such as n (Xn) = n (Yn) We thus have n (Xn) = xn + bn. n-1 (Xn-1) = n (Yn) = yn + bn. n-1 (Yn-1). If n-1 (Xn-1) < n-1 (Yn-1), we have:

xn - yn = bn. ( n-1 (Yn-1)) - n-1 (Xn-1)) bn - 1 = bn.

This is impossible, because xn - yn xn bn -1.

In the same way, we cannot have n-1 (Xn-1) > n-1 (Yn-1). It follows that n-1 (Xn-1)= n-1 (Yn-1),. Thus xn = yn. Now, by I.H, n-1 is injective. Consequently, Xn-1 = Yn-1, and Xn= (Xn-1, xn) = (Yn-1, yn) = Yn. - Let us prove that n is surjective. It is necessary to show that for every integer a, such as 0 a b[n] - 1, there is an element Xn of B [n] such as n (Xn) = a. According to the Euclidian division of r by bn, there is a unique pair of naturel integers (q, r) such as a = bn . q + r, with r < bn. Furthermore 0 q (b [n] -1) / bn, 0 q b [n-1] -1. Let us define xn = r. By I.H, n-1 is surjective. Thus, there is an element Xn-1 of B [n-1] such as n-1 (Xn-1) = q,. Furthermore, we have 0 xn bn - 1. So Xn = (Xn-1, xn) is an element of B [n], and we have n (Xn) = xn + bn . n-1 (Xn-1) = r + bn. q = a . Q.E.D. The following procedures written in pseudo Pascal allow computing the function rank: B [n] N as well as its inverse Expansion: N B [n]. Knowing n, b1 bn, and Xn de B [n], the first one computes n (Xn). For every integer a such as 0 a b [n] 1, the second procedure, determines the element Xn of B [n] such as n (Xn) = a.

Function RANK (n, b1 bn: integer; Xn: element of B [n]): long integer; Var i: integer; a: long integer; Begin a= 0; For i= 1 to n, do a= x [i] + a* b [i];

Rank=a; Remark: The inverse function of rank is called expansion, because it provides the expression of the integer a in the multiple or heterogeneous base (b1, b2 bn). This writing is Xn = (x1, x2 xn).

Reciprocally, every Xn of B[n] can be viewed as the writing in base (b1 bn) of the integer a which is equal to n (Xn).

Remark 2: the function rank was defined in a recursive way. We can need an explicit formulation which can be obtained by using the following formula:

End;

Procedure EXPANSION (n, b1

B[n]); Var s: long integer; i: integer; Begin s =a; For i= n down to 1 do begin End; end;

X [i]= s mod b [i]; s=s div b [i]; bn: integer; a: long integer; var X: element of n (Xn) = xn + bn . xn-1 + bn. bn-1. xn-2 + + bn. bn-1 b2. x1.

7.2 Isomorphism of ordered sets

Proposition : n is an isomorphism between B [n] provided with lexicographical order , and [0, b [n] - 1] provided with the order of the naturel integers.

Proof: By induction on n. The property is obvious for n = 1. Let us suppose we have proved it until n-1. Let us prove that it is true for n. We have Xn = (Xn-1, xn), and Yn = (Yn-1, yn). If we suppose that Xn Yn. then we have n(Xn) n(Yn). Indeed we have Xn Yn if (Xn-1 Yn-1), or (Xn-1 = Yn-1 and xn < yn). - in the second case, we have n (Xn) = xn + bn. n-1 (Xn-1) < yn + n-1 (Xn-1) = n (Yn). - In the first case, by I.H, we have n-1 (Xn-1) < n-1 (Yn-1) , and therefore n (Yn) - n (Xn) = bn. ( n-1 (Yn-1) - n-1 (Xn-1)) + yn - xn bn . 1 + yn - xn. Since 0 xn, yn bn - 1, we can conclude that n (Yn) - n (Xn) > 0. So Xn Yn n (Xn) n(Yn). Reciprocally, let Xn and Yn of B [n] such as n (Xn) n (Yn). Let us show that Xn Yn.

Xn-1, then, by I.H, n-1 (Yn) < n-1 (Xn-1). In that way, we have n (Xn) - n bn-1 + x n - yn > 0. This is in contradiction - If Yn-1 (Yn) = bn. ( n-1 (Xn-1) - n-1(Yn-1)) + xn - yn with the hypothesis. - If Xn-1 = Yn-1 and yn < xn, we still have then for every integer a such as 0 a b[n] 1 for every integer a such as 0 a b[n] - 1 n (Xn) - n (Yn) > 0, in contradiction with the hypothesis CQFD.

7.3 Addition of two numbers written in a multiple base

Let (b1, b2 bn) be the multiple base and x, y two natural integers such as 0 x, y b [n] - 1. The respective expressions of these two numbers in base (b1, b2 bn) are Xn and Yn. What is the expression of z = x + y in this base? In a different way: if we know the writings Xn and Yn of two numbers x, y in base (b1...bn), without knowing x and y, is-it possible de compute the expression Zn of z, without having to compute z=x+y ? The solution consists in making the addition of Xn and Yn in the base. The following procedure implements this operation. It uses an auxiliary sequence of nature integers r0, r1 rn which play the role of the "carries".

Procedure ADDITION (n, b1, , bn: integer; X, Y: elements of B[n]; var Z: element of B[n]; var r: integer); Var r[0..n]: integer; i: integer; u:entier; Begin r[n]= 0; For i = n down to 1 do begin

u = x[i] + y i] + r [i]; If (u < b [i]) Then begin z [i]= u; r [i-1]= 0; end

else begin z [i]= u - b [i]; r [i-1]= 1; end; End; In this procedure, r plays the role of the final carry. The addition of Xn and Yn in base (b1 bn) will be noted Xn Yn.

Let us show that the procedure ADDITION provides the expected result.

For i = n, n-1, 1, we add xi and yi and the carry ri. Since for every i we have 0 xi, yi bi - 1, and ri < 2, we obtain that 0 u = xi + yi + ri 2bi - 1. If 0 u bi - 1, we put zi = u, and ri-1 = 0; and else zi = u - bi, and ri-1 = 1. Therefore, for every i of 1 n we have 0 zi bi - 1, and 0 ri 1. The final carry r = r0, is also equal to 0 or 1. Since for every i of {1 n} 0 zi bi - 1, Z = (z1 zn) belongs to B[n].

Xn Yn. is equal to the output Z, of the procedure of addition

Now we establish the following proposition:

Proposition 7.3: Whatever are the elements Xn and Yn of B [n], we have n (Xn n (Xn) + n (Yn) - r0 . b[n], with r0 = 0 or 1.

Proof: At every step i of the procedure ADDITION, we have noted that: If u < bi, then zi = u and ri-1 = 0;

else zi = u-bi, and ri-1 = 1; end; Therefore for every i, we can write that zi = u - bi . ri-1, zi = xi + yi + ri - bi. ri-1. Yn.) =

8 Construction of good partitions

The tools, which we introduced above, will help us to build any partition of B[n]. Let us suppose that we have p processors to determine TG (C), knowing that F = B[n]. Suppose that these processors are of respective capacities c1, c2 cp, (ck = number of closed that the processor k can build in a given lapse of time), and that c1 + c2 + cp b [n] = b1.b2 bn. We can build a sequence of integers a0, a1 al, in the following way: l: 0; al = 0; while (al < b[n] - 1) do begin l = l +1; al = al-1 + cl; end; al = b [n] - 1.

In that way, we obtain a strictly increasing sequence (a0, a1 al) such as a0 = 0, and al = b [n] - 1. Furthermore for k = 1 l, we have ck = ak - ak-1.

Then, for k = 0, 1 l, we build the elements uk of B[n], by computing EXPANSION (n, b1 bn, ak, uk), which provides the writing uk of ak in base (b1 bn). This sequence (u0, u1 ul) constitutes a good partition of B [n], since it is lexicographically strictly increasing one, with u0 = OF, ul = 1F, and since it distributes the workload by taking into account the capacity of all the processors. However, although mathematically correct, this method is difficult to implement. Indeed, when n is large, (even for n = 30), the number b [n] = b1.b2 bn is equal at least to 2n, and it can reach very large values, which are hard to compute with the current processors. Therefore, the procedure EXPANSION (n, b ak, uk) is difficult to execute with big values of the parameter ak.

To overcome this difficulty we can use additions in base (b1 bn) as in the following case of equi partition: Let us suppose that all the processors have the same capacity c. We determine the smallest integer p equal or bigger than b[n] / c.

Then we determine the expression uc of c in base (b1 bn), by executing EXPANSION (n, b1 bn, c, uc). Finally, we build the sequence of elements uk of B [n] by the mean of the following instructions: k = 0; u [k]= 0F; r = 0; while (r = 0) do begin k=k+1;

ADDITION (n, b1, end; p = k; u [p]= 1F;

, bn, u [k-1], uc, u [k], r); This procedure returns u0 = 0F; u1 = u0 uc = uc, u2 = u1 uc until the carry r becomes equal to 1.

While r = 0, by the property 5.3, we have n (Xn Yn) = n (Xn) + n (Yn) . Thus the addition of writings in base (b1 bn) is equivalent to the addition of the corresponding numbers.

By this way we built the sequence (u0, u1 up) of the partition, without computing (a0, a1 ap) ranks ak = n (uk). We stop when r takes the value 1. One then take p = k, and we set up = 1F.

9 Conclusion

We have proposed a generalization of the Ganter algorithm, as well as a distributed version of this algorithm.

On the one hand, this generalization allows us to determine the Galois lattices associated to rather general contexts, without needing to re-code the data into binary values. On the other hand, when one analyze the relationship between product-order and lexicographical order on the Cartesian product B[n] = B1 x B2 x x Bn, with Bj={0, bj -1}, for j=1 n, this generalization seems to be natural. Moreover, viewing the elements of B[n] as expressions of integers in base (b1 bn) allows us to obtain good partitions of B[n] and to simplify the calculations. From a practical point of view, we can intend to apply the procedure of segmentation of B[n] to very large contexts. This approach seems more efficient than the context-based approaches proposed in [ 1 ] and [ 14 ] which need to compute the Cartesian product of two Galois lattices. Finally, let us note that the algorithm proposed in this paper is based on a lexical search through the space B[n] of the attributes. While being more general than [ 11 ], it cannot determine the most general Galois lattice. To reach this target, we need algorithms which search throw the set P(I) of all subsets of individuals. This could be obtained either by searching this space in lexicographical order, or by obtaining a partition of this space and to share the workload of solving the corresponding sublattices between several processors. This is a possible future research direction.

1. Baklouti

, Lévy

. Parallel algorithms for general Galois lattices building . Proc. WDAS 2003 .

2. Birkhoff . G: Lattice Theory. American Mathematical Society , Providence, RI, 3rd edition . ( 1967 )

Emilion. R.

Lambert . G, Lévy. G, Algorithmes pour les treillis de Galois, IndoFrench Workshop., University Paris IX-Dauphine. 1997 .

4. Diday . E, Emilion. R : Treillis de Galois maximaux et capacités de Choquet. C.R. Acad .Sci. Paris, t. 325 , ( 1997 ) Série I . p. 261 - 266 .

5. Ganter , B: Two basic algorithms in conceptual analysis . Technical report , Darmstadt University ( 1984 ).

Lambert ,

Emilion , G. Lévy : Algorithmes pour construire les treillis de Galois généraux . To appear.

7. Ganter . B: Two basic algorithms in concept analysis . Preprint 831 , Technische

Hochschule Darmstadt

( 1984 ).

8. Ganter . B, Wille . R: Formal Concept Analysis . Mathematical Foundations , Springer. ( 1999 ).

9. Ganter . B : Formal Concept Analysis: algorithmic aspects . TU Dersden . ( 2002 ).

10. Godin . R : Complexité de structures de treillis. Ann. SC. Math., ( 1989 ) Quebec, 13 ( 1 ): 19 - 38 .

11. Huaiguo

, Engelbert Mephu Nguifo: A fast scalable algorithm to build closed item sets for large data . Third IASTED International Conference on Artificial Intelligence and Applications ( 2003 ).

12. Nourine . L., Raynaud . O: A fast algorithm for building Lattices . Information Processing Letters 71 , ( 1999 ) 199 .

13. Njiwoua . P, Mephu Nguifo . E: A parallel algorithm to build concept lattice . In Proceedings of 4th Groningen Intl. Information Tech. Conf. For Students , ( 1997 ) pp. 103 - 107 .

14. Valtchev

, Missaoui

, Lebrun

, A partition based approach toward construction Galois (concept) lattices . Discrete Mathematics 256 ( 2002 ) 801 - 829 .

15. Wille . R: Concept Lattices and Knowledge Systems . Computer Mathematic Applied, 23 ( 6-9 ), ( 1992 ) 493 - 515 .