=Paper=
{{Paper
|id=Vol-547/paper-17
|storemode=property
|title=Appling A Discrete Particle Swarm Optimization Algorithm to Database Vertical Partition
|pdfUrl=https://ceur-ws.org/Vol-547/91.pdf
|volume=Vol-547
|dblpUrl=https://dblp.org/rec/conf/ciia/BenmessahelT09
}}
==Appling A Discrete Particle Swarm Optimization Algorithm to Database Vertical Partition==
Appling A Discrete Particle Swarm Optimization
Algorithm to Database Vertical Partition
Bilal Benmessahel1, Mohamed Touahria1
(1) département d'informatique, Université Ferhat ABBAS- SETIF
bilal.benmessahel@gmail.com
Abstract. Vertical partition is an important technique in database design used
to enhance performance in database systems. Vertical fragmentation is a
combinatorial optimization problem that is NP-hard in most cases. We propose
an application and an adaptation of an improved combinatorial particle swarm
optimization (ICPSO) algorithm for the vertical fragmentation problem. The
original CPSO algorithm [3] suffers from major drawback—redundant
encoding. This paper applies an improved version of CPSO that using the
restricted growth (RG) string [5] constraint to manipulate the particles so that
redundant particles are excluded during the PSO process. The effectiveness and
efficiency of the improved CPSO algorithm are illustrated through several
database design problems, ranging from 10 attributes/8 transactions to 50
attributes/50 transactions. In all cases, our design solutions match the global
optimum solutions.
Keywords: Database vertical partition, Particle swarm optimization, RG String,
Genetic algorithms, Optimization.
1 Introduction
Vertical partition (also called vertical fragmentation) is the problem of clustering
attributes of a relation into fragments for subsequent allocation. The technique is used
to minimize the execution time of user applications that run on these fragments.
Vertical partition provides an important technique for designing distributed database
systems. Compared to other types of data fragmentation, vertical partition is more
complicated than horizontal partition because of the increased number of possible
alternatives [1].
Vertical partition algorithms contain two essential parts: the optimization method
and the objective function. Ozsu and Valduriez [1] argue that finding the best
partition scheme for a relation with m attributes by exhaustive search must compare at
least the mth Bell number of possible fragments, which means that such an algorithm
has a complexity of O(mm). Thus, it is more feasible to look for heuristic methods to
seek optimal solutions. On the other hand, database partition aims at enhancing the
transactional processing in database. The objective function evaluates whether such a
goal is achieved.
2 Bilal Benmessahel1, Mohamed Touahria1
Most previous algorithms employ multiple iterations of binary partition to
approximate m-way partition. Navathe, Ceri, Wiederhold, and Dou (1984) propose
the Recursive Binary Partition Algorithm (RBPA), which extends Hoffer and
Severance’s work by automating the selection process of vertical fragments; they
propose some empirical objective functions. Cornell and Yu (1990) adopt the same
approach but replace the empirical objective functions with one constructed on a
model database. Chu and Ieong (1992) adopt transactions as units in their algorithm;
however, it is still a binary partition approach.
Efforts have also been made to use other optimization techniques to benefit vertical
partition. Hammer and Niamir (1979) propose a hill-climbing method that
alternatively groups and regroups attributes and fragments to reach a suboptimal
solution. Song and Gorla (2000) solve the problem with GA. However, each run of
their GA only gets a binary partition. Therefore, the GA only provides the
intermediate results in a recursive process. Our aim in this paper is to propose a pure
PSO solution to vertical partition. By pure we mean the direct result from the PSO
execution is already an m-way partition.
The work described in this paper considers the vertical partition problem and reports a
Combinatorial PSO application that can eliminate the encoding redundancy by using a
restricted growth (RG) string [3] constraint in constructing particles. To evaluate its
effectiveness and demonstrate its superiority, we compare the result of using the
improved CPSO with that of using traditional GA as well as RG string encoding with
traditional object based GA operators called SGA and another GA based algorithm
called Group oriented Restricted Growth String GA GRGS-GA developed to solve
the vertical partition problem by Jun Du and Al (2006) [2].
The balance of the paper is structured as follows. Section 2 introduce the partition
evaluator (PE) developed by Chakravarthy and al [4]; this evaluator will be used as
the fitness function for the proposed approach and the two other approaches used in
experiments. Section 3 presents the particle swarm optimization with a general brief
overview RG strings. Section 4 we develop an improved particle swarm for vertical
partition problem. Section 5 presents the application of the improved CPSO algorithm
to the vertical partition problem and compares the proposed approach with the other
two approaches SGA and GRGS-GA; and it is demonstrates that the improved CPSO
can effectively find optimal solutions even for large vertical partition problems.
Section 6 is summary and conclusions.
2 Objective functions for vertical partition
The two kinds of objective functions used for partition algorithms are 1) model cost
functions based on the transaction access analysis on a model DBMS and 2) those
based on an empirical assumption. The former form of objective function is specific
to the underlying DBMS while the latter is more general and intuitive [2].
In addition to the AUM used as input for both types of objective functions, the
model cost function takes into account the specific access plan chosen by the query
optimizer, e.g., the join method and the type of scan on the relation by each
Appling A Discrete Particle Swarm Optimization Algorithm to Database Vertical
Partition 3
transaction type. Without this additional information, the empirical cost objective
function only shows the trends in the cost that are affected by the partition process.
However, it is useful for the logical design of a database when information about
physical parameters may not be available. Although less precise than the model cost
functions, they can be very effective in comparing different optimization techniques
used by algorithms.
In this paper, we use an empirical objective function, a modified version from the
partition evaluator proposed by Chakravarthy et al. (1992). This partition evaluator
uses the square-error criterion commonly applied to clustering strategies. We thus
name it the Square-Error Partition Evaluator (SEPE). The SEPE consists of two major
cost factors: the irrelevant local attribute access cost and the relevant remote attribute
access cost. They represent the additional cost required, other than the ideal minimum
cost. Further, the ideal cost is the cost when transactions only access the attributes in a
costs are calculated using the square-error result; they are denoted
and ,
single fragment and have no instances of irrelevant attributes in that fragment. Both
respectively. More details about SEPE, including the formula, can be found in
Chakravarthy et al. (1992).
3 Particle swarm optimization
PSO introduced by Kennedy and Eberhart [8] is one of the most recent
metaheuristics, which is inspired by the swarming behavior of animals and human
social behavior. Scientists found that the synchrony of animal’s behavior was shown
through maintaining optimal distances between individual members and their
neighbors. Thus, velocity plays the important role of adjusting each member for the
optimal distance. Furthermore, scientists simulated the scenario in which birds search
for food and observed their social behavior.
They perceived that in order to find food the individual members determined their
velocities according to two factors, their own best previous experience and the best
experience of all other members. This is similar to the human behavior in making
decision, where people consider their own best past experience and the best
experience of the other people around them.
PSO algorithm
The general principles of the PSO algorithm are stated as follows. Similarly to an
evolutionary computation technique, PSO maintains a population of particles, where
each particle represents a potential solution to an optimization problem.
Let m be the size of the swarm. Each particle i can be represented as an object with
several characteristics.
represented by a n-dimensional vector, , and velocity
Suppose that the search space is a n-dimensional space, then the ith particle can be
, where i = 1, 2, ..., m.
4 Bilal Benmessahel1, Mohamed Touahria1
In PSO, particle i remembers the best position it visited so far, referred as
, and the best position of the best particle in the swarm, referred as
.
particle i adjusts its velocity
and position for each dimension j by referring
PSO is similar to an evolutionary computation algorithm and, in each generation t,
to, with random multipliers, the personal best position
and the swarm’s best
position , using Eqs. (1) and (2), as follows:
!
And
!
Where c1 and c2 are the acceleration constants and r1 and r2 are random real
and while still exploring new areas. Such stochastic mechanism may allow
numbers drawn from [0, 1]. Thus the particle flies through potential solutions toward
velocity of a particle, it was necessary to impose a maximum value "#$ on it. If the
escaping from local optima. Since there was no actual mechanism for controlling the
velocity exceeded this threshold, it was set equal to"#$ , which controls the
maximum travel distance at each iteration, to avoid a particle flying past good
solutions. The PSO algorithm is terminated with a maximal number of generations or
the best particle position of the entire swarm cannot be improved further after a
sufficiently large number of generations.
The aforementioned problem was addressed by incorporating a weight parameter
in the previous velocity of the particle. Thus, in the latest versions of the PSO, Eqs.
(2) and (3) are changed into the following ones:
' (
! )!
*!
( is called inertia weight and is employed to control the impact of the previous
history of velocities on the current one. Accordingly, the parameter ( regulates the
trade-off between the global and local exploration abilities of the swarm. A large
exploration. A suitable value for the inertia weight ( usually provides balance
inertia weight facilitates global exploration, while a small one tends to facilitate local
of the number of iterations required to locate the optimum solution. ' is a constriction
between global and local exploration abilities and consequently results in a reduction
factor, which is used to limit the velocity.
The PSO algorithm has shown its robustness and efficacy for solving function
value optimization problems in real number spaces. Only a few researches have been
conducted for extending PSO to combinatorial optimization problems.
Appling A Discrete Particle Swarm Optimization Algorithm to Database Vertical
Partition 5
4 An improved CPSO for Database Vertical Partition
In this paper, we propose an improved version of the combinatorial CPSO algorithm
[3] aimed at solving the Database Vertical Partition problem. The CPSO algorithm
suffers from major drawback—redundant encoding. This paper applies the restricted
growth (RG) string [5] constraint to manipulate the particles so that redundant
particles are excluded during the PSO process. The following section presents the
basics of the RG strings.
4.1 Basics of RG strings
The Restricted Growth (RG) string encoding represents a grouping solution as an
array of integers, denoted a[n], where n is the number of attributes in the relation. The
elements in the array may be integer values ranging from 1 to n. Meanwhile, as
constituents if RG string, they must satisfy Definition 1, given next. In addition to the
formal definition of RG string, other supporting extended definitions are presented
next:
Definition 1. A RG string r is a sequence of integers represented as an array, which
satisfies the following inequality:
+,-. / 012 +,3. +,4. +,- 4.! 4! 3 5 6 5 7 ,3. 4
For example, {1 1 2 3 1 1 2 4} is RG string, but {4 4 2 3 4 4 2 1} is not, although they
map to the same solution in random string encoding scheme.
Definition 2. The degree of RG string r is the largest value in r, denoted d(r).
For example, consider r = {11123221}, then d(r)=3.
Definition 3. The ith prefix of RG string r, denoted8 , is the substring that includes
the first i values of r.
For example, consider r = {11123221}, 89 = {1112}.
4.2 Definition of a particle
Denote by : ; ;
; the n-dimensional vector associated to the solution
taking a value in {-1, 0, 1} according to the state of solution
: is a dummy variable used to permit the transition from the combinatorial state
of the ith particle at iteration t.
to the continuous state and vice versa.
6 Bilal Benmessahel1, Mohamed Touahria1
? 6@
= 6@
: H I!
>A6@
!
=
< BACDEF6GE
4.3 Velocity
Let J ; be the distance between and the best solution obtained
Let J ; the distance between the current solution and the best
by the ith particle.
solution obtained in the swarm.
FK K K J K K J L!
The updated equation for the velocity term used in CPSO is then:
FK K K ; K K ; ! M!
With this function, the change of the velocity
depends on the result of; .
If , then; . Thereafter d2 turns to ‘‘0’’, and d1 takes “-2’’,
If , then; . Thereafter d2 turns to ‘‘2’’, and d1 takes ‘‘0’’,
thus imposing to the velocity to change in the negative sense.
The case where N and N , ; turns to ‘‘o’’, d2 is
thus imposing to the velocity to change in the positive sense.
equal to ‘‘1’’ and d1 is equal to “-1’’, thereafter the parameters r1; r2; c1 and c2 will
The case where and , ; takes a value in
determine the sense of the change of the velocity.
{-1,1}, thus imposing to the velocity to change in the inverse sense of the sign of ; .
4.4 Construction of a particle solution
The update of the solution is computed within; :
O ; P!
The value of ; is adjusted according to the following function:
6@O R S
; Q6@O 5 SH T!
BACDEF6GE
6@;
The new solution is:
H
Q 6@; B!
UU7JAV7WVXEACDEF6GE
Appling A Discrete Particle Swarm Optimization Algorithm to Database Vertical
Partition 7
; in the case of equality between
YZ[ allows to insure that the
The choice previously achieved for the affectation of a random value in {1, -1} for
variable ; takes a value 0, and to permit a change in the value of variable . We
define a parameter S for fitting intensification and diversification. For a small value
ofS, takes one of the two values
or (intensification). In the opposite
case, we impose to the algorithm to assign a null value to the; , thus inducing for
a value different from U7J (diversification). The parameters c1 and c2
are two parameters related to the importance of the solutions
U7J for the
generation of the new solution . They also have a role in the intensification of the
search.
4.5 Solution representation
In this subsection, we describe the formulation of the improved CPSO algorithm for
the database vertical partition problem.
One of the most important issues when designing the PSO algorithm lies on its
attributes. Each dimension represents an attribute and particle
solution representation. We setup search space of n-dimension for a relation of n-
corresponds to the affectation of n attributes, such that \ {1, 2, 3,…, k}, where k
is the number of fragments.
of the improved CPSO algorithm.
The scheme of Fig. 1 illustrates an example of the solution representation of particle
Let n = 7 attributes
]
9
^
_
`
k = 4 fragments.
1 2 1 3 4 2 2
Fig. 1. An example of solution representation with RG constraint.
4.6 Initial population
A swarm of particles is constructed based on RG string. In the particles generated, the
RG constraint is enforced from the beginning. No rectification process is needed after
all position in a particle are randomly created because each position is created as a
random integer between 1 and the high potential degree for its position, complying
with the RG string constraint. In the initial population, the first element is set to 1 and
the upper bound for each element increases gradually and this rarely reaches k-1,
where k is the number of fragments anticipated by the user
4.7 Creating a new solution
For creating new solution we use Eq(5). We obtain the vector value ; , The vector
of velocity computed with Eq(6). The new value of O is calculated using O
a
. a is determined using Eq(9) and using Eq (10) the new solution
8 Bilal Benmessahel1, Mohamed Touahria1
vector is determined. But the new solution can breaking the RG constraint for this
purpose we have designing the rectifier. The rectifier is a key issue in the proposed
approach that each particle is RG string.
However, the initialization of particles does not guarantee each particle to be RG
string. Also the operation of creating new solutions may change the constitution of a
particle the population in a way that violates the RG string constraint. To handle such
cases, we introduce a rectifying function that guarantees each particle to be RG string.
For particles that violate the RG string constraint, the rectifier simply scans through a
particle and converts it into RG string by adjusting the locations of its positions.
5 Implementation and experimental results
The algorithms have been implemented in java. All experiments with improved CPSO
and GRGS-GA [2] and SGA were run in Windows XP on desktop PC with Intel
Pentium4, 3.6 GHz processors. The GRGS-GA (Group oriented Restricted Growth
String) is GA based approach proposed by Jun Du and al [2] for database vertical
partition. The SGA is a Simple GA that uses random encoding schemes and classical
genetics operators.
We have dividing the experimental section into two phases. The test phase and the
comparison phase. In the test phase we have trying the improved CPSO on two cases,
the first case is an attribute usage matrix (AUM) of 10 attributes and 8 transactions
and the second case is an attribute use matrix of 20 attributes and 15 transactions.
In the comparison phase we have trying the improved CPSO with two others GA
based algorithm GRGS-GA and SGA on two larges cases generated pseudo randomly
with a pseudo random generator of AUM designed to generate a large size AUM.
5.1 Case 1: 10-attribute example
In this case, we use an attribute usage matrix AUM, with 10 attributes and 8
transactions. This AUM has already been utilized by other researchers as described in
Cornell and Yu, 1990, Navathe and al 1984, Suk-kyu Song and Narasimhaiah Gorla,
2000, J. Muthuraj and al 1993. J. Muthuraj and al found that for this AUM the best PE
value is 5820, which gives a fragmentation of 3 fragments {1 5 7} {2 3 8 9} {4 6 10}.
The improved CPSO find the best fragmentation in the 4 iteration, as illustrated in
the figure 2. And the algorithm is executed 10 times. Figure 3 shows the optimal costs
found in each trial. The above mentioned 3-fragment partition is evaluated to have a
PE value of 5820. So we argue that if the final partition of each trial has a PE value
less or equal to 5820, then such trial is considered a success. The success rate of the
improved CPSO is 100%. Another interesting statistic is the average number of
iterations needed to reach the optimal solution in the improved CPSO is 6.5.
Apparently, the improved CPSO performs well in terms of fitness and convergence
speed.
Appling A Discrete Particle Swarm Optimization Algorithm to Database Vertical
Partition 9
bcd ) * I L M P T B
b I B B B I B I B B B
b B IB IB B B B B IB IB B
b) B B B I B I B B B I
b* B )I B B B B )I )I B B
bI I I I B I B I I I B
bL I B B B I B B B B B
bM B B I B B B B B I B
bP B B I I B I B B I I
Table 1. 10–attribute Matrix
Optimal Costs-iteration graph
Thousand
17.46
Improved CPSO
11.64
Costs
5.82
0
1 6 11 16 21
Iterations
Fig. 2. PE values of best particles in each iteration
Thousand
Cost-Trial graph Improved CPSO
10
Costs
5
0
1 2 3 4 5 6 7 8 9 10 11
Trials
Fig. 3. Optimal solutions found in 10 trials case of 10 attributes
5.2 Case 2: 20-attribute example
The 20-attribute example has already been utilized by other researchers as described
in Chakravarthy et al. (1992), Navathe et al. (1984), and Navathe and Ra (1989). The
usage matrix used in this example describes the reference pattern of 15 transactions
accessing 20 attributes. The methods proposed in Navathe et al. (1984) and
Chakravarthy et al. (1992) both find the same optimal partition that groups 20
attributes into four fragments. In particular, Chakravarthy et al. (1992) decide based
on the PE value that this four-fragment partition is better than the five-fragment
partition found in Navathe and Ra (1989).
10 Bilal Benmessahel1, Mohamed Touahria1
The improved CPSO Algorithm is executed 100 times. Figure 5 shows the optimal
costs found in each trial. The above mentioned 4-fragment partition is evaluated to
have a PE value of 4644. So we argue that if the final partition of each trial has a PE
value less or equal to 4644, then such trial is considered a success. The success rate of
the improved CPSO is 100%. Another interesting statistic is the average number of
generations needed to reach the optimal solution in the improved CPSO is 30.4.
Apparently, the improved CPSO performs well in terms of fitness and convergence
speed.
Optimal Costs-Iteration graph
37152
Improved CPSO
32508
27864
Costs
23220
18576
13932
9288
4644
0
1 26 51 Iteraions 76 101 126
Fig. 4.The trends of PE values of best particles in each iteration. Case of 20 attributes
Optimal Costs-Trial Graph
8000
Improved CPSO
4000
Costs
0
1 Trials 50 99
Fig. 5. Optimal costs-trial graph for 20-attribute example
5.3 Case 3: 20-attribute pseudo random AUM
In this case we compare the improved CPSO algorithm with two others GA based
algorithms the GRGS-GA and SGA on pseudo random attribute usage matrix of 20
attributes and 20 transactions. This AUM generated using pseudo random AUM
generator designed to generate a large size usage matrix. The number of fragments
anticipated is 20 fragments. The value of the best fitness in this case is 0.
Appling A Discrete Particle Swarm Optimization Algorithm to Database Vertical
Partition 11
The average number of generations needed to reach the optimal solution in each
algorithm. They are 30.4, 45.6, and 296.2 for improved PSO, GRGS-GA GA and SGA,
respectively. Apparently, SGA performs worst among the three in terms of fitness and
convergence speed.
Cost-iteration graph
1200
thousand
Improved CPSO
1000
GRGS-GA
800
SGA
Cost
600
400
200
0
1 41 81 121 161 201 241 281
iterations
Fig. 6. The trends of PE values of best particles for Improved CPSO, GRGS-GA
G GA and SGA on
the 20-Attribute pseudo AUM.
5.4 Case 4: 50-attribute
attribute example
In this case, we try to apply the improved CPSO described in Section 4 to a large
size usage matrix. This AUM have 50 attributes and 50 transactions, the number of
fragments anticipated is 50 fragments. The value of the best fitness is 0.
The figure 7 shows that the improved CPSO reach the best fragmentation in 193
iterations.
Fig. 7. The trends of
PE values of best
particles for
Improved CPSO on
the 50-Attribute
pseudo AUM.
12 Bilal Benmessahel1, Mohamed Touahria1
5.5 Results analysis
As the number of attributes grows, the improved CPSO becomes harder to converge
because of the increased complexity of the partition problem.
Every improved CPSO trial finds the optimal partition known when the usage
matrix is generated. The convergence speed of the improved CPSO is well over the
two others GA based algorithms. So, the advantage of using the improved CPSO is
apparent over using the other two GAs. Figure 6 shows the PE values for 20 AUM.
Again from this graph, we conclude that the improved CPSO is the best among the
three and GRGS-GA is better than SGA in terms of the convergence speed.
6 Conclusion
Vertical database partition is a significant problem for database transaction
performance. In this article, we proposed a PSO-based solution. Particularly, this
solution features two new attempts: first, a RG string constraint is applied to
overcome the redundant encoding of previous GAs for the partition problem or
similar ones; second, a comparison is used to evaluate the performance of the
improved CPSO with to others GA based algorithms the GRGS-GA [2], and a simple
GA called SGA in term of convergence speed and best fragmentation results.
The success of using the improved CPSO to solve vertical partition problem suggests
it may be used to solve other clustering or grouping problems.
7 References
[1] Ozsu, M. T., & Valduriez, P. Principles of distributed database systems. Prentice Hall.
(1999).
[2] Jun du, Reda Alhajj, Ken Barker « Genetic algorithms based approach to database vertical
partition » Journal of Intelligent Information Systems Volume 26 , Issue 2 (March 2006)
Pages: 167 - 183 Year of Publication: 2006
[3] B. Jarboui, M. Cheikh, P. Siarry, A. Rebai: Combinatorial particle swarm optimization
(CPSO) for partitional clustering problem. Applied Mathematics and Computation 192(2):
337-345 (2007)
[4] Chakravarthy, S., Muthuraj, J., Varadarjan, R., & Navathe, S. B. (1992). A formal
approach to the vertical partition problem in distributed database design. Technical Report,
CIS Department, University of Florida, Gainesville, Florida.
[5] Ruskey, F. (1993). Simple combinatorial gray codes constructed by reversing sublists.
Algorithms and Computation, Lecture Notes in Computer Science 762, pp.201–208, Berlin
Heidelberg New York: Springer.
[6] Song, S., & Gorla, N. (2000). A genetic algorithm for vertical fragmentation and access
path selection. The Computer Journal, 43(1), 81–93.
[7] Navathe, S. B., & Ra, M. (1989). Vertical partition for database design: A graphical
algorithm. ACM SIGMOD Record, 18(2), 440–450.
[8] J. Kennedy, R.C. Eberhart, Particle swarm optimization, in: Proceedings of the IEEE
International Conference on Neural Networks, vol. IV, 1995, pp. 1942–1948.