<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>QQuueerryy Ooppttiimmiizzaattiioonn bbyy GGeenneettiicc AAllggoorriitthhmmss</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Suhail S. J. Owais</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pavel Kr¨omer</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Va´clav Sn´aˇsel Suhail S. J. Owais</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pavel Kr¨omer</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Va´clav Sn´aˇsel</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          ,
          <addr-line>v7a0c8la3v3.Osnstarsaevla</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Computer Science</institution>
          ,
          <addr-line>VS</addr-line>
        </aff>
      </contrib-group>
      <fpage>125</fpage>
      <lpage>137</lpage>
      <abstract>
        <p>This study investigated the use of Genetic algorithms in Information retrieval in the area of optimizing a Boolean query. A query with Boolean logical operators was used in information retrieval. For Genetic algorithms, encoding chromosomes was done from Boolean query; where it was represented in the form of tree prefix with indexing for all terms and all Boolean logical operators. Information retrieval effectiveness measures precision and recall used as a fitness function in our work. Other Genetic algorithms operators were used as single point crossover on Boolean logical operators, and mutation operator was used to exchange one of the Boolean operators and, or, and xor with any other one. The goal is to retrieve most relevant documents with less number of nonrelevant documents with respect to user query in Information retrieval system using genetic programming.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Information retrieval system is used to retrieve documents that depend on or
relevant to the user input query. The growth in the number of documents in the
internet made it necessary to use the best knowledge or methods in retrieving
the most relevant documents to the user query. For this, Information Retrieval
has become a fact of life for most internet users.</p>
      <p>
        Information retrieval systems deal with data bases which is composed of
information items documents that may consist of textual, pictorial or vocal
information. Such systems process user queries trying to allow the user to access the
relevant information in an appropriate time interval. Where the art of searching
will be in the databases or hypertext networked databases such as internet or
intranet for text, sound, images or data, see [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Thus an information system has
at its heart a collection of data about reality [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>Most of the information retrieval systems are based on the Boolean queries
where the query terms are joined by the logical operators AND and OR. The
similarity between a query and documents is measure by different retrieval strategies
that are based on the more frequent terms found in both the document and the
query. The more relevant document is deemed to be the query request. The most
frequently used measures of retrieval effectiveness are precision the percentage
of the retrieved documents that are relevant and recall the percentage of the
relevant documents that are retrieved.</p>
      <p>Information retrieval is concerned with collection and organization of texts,
responding to the requests of internet users for the information seeking text,
retrieving the most relevant documents from a collection of documents; and with
retrieving some of non-relevant as possible. Information retrieval is involved in:
– Representation,
– Storage,
– Searching,
– Finding documents or texts or images those are relevant to some
requirements for information desired by a user.</p>
      <p>Genetic Algorithms (GA) represents one of the artificial intelligence
algorithms that are attractive paradigm to improve performance in information
retrieval systems. Retrieving necessary documents from a collection of such
documents will be achieved using a query to select the most relevant documents. For
implementation of genetic algorithms will be on queries. A form of genetic
algorithm started to be applied in information retrieval systems in order to optimize
the query by genetic algorithms
2</p>
    </sec>
    <sec id="sec-2">
      <title>Genetic Algorithms (GA)</title>
      <p>
        Genetic algorithms (GA) first described by John Holland in 1960s and further
developed by Holland and his students and colleagues at the University of
Michigan in the 1960s and 1970s [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. GA used Darwinian Evolution to extract nature
optimization strategies that uses them successfully and transform them for
application in mathematical optimization theory to find the global optimum in
defined phase space [
        <xref ref-type="bibr" rid="ref3 ref5">5, 3</xref>
        ].
      </p>
      <p>GA is used to find approximate solutions to difficult problems through a set
of methods or techniques inheritance or crossover, mutation, natural selection,
and fitness function. Such methods are principles of evolutionary biology applied
to computer science. GAs are useful for:
– Solving difficult problems.</p>
      <p>– Modelling the natural system that inspired design.</p>
      <p>Applying genetic algorithms over a population of individuals or chromosomes
shows that several operators are utilized. They are as follows (see Figure 1):
– Fitness operator, metrics to measure scheduler performance for each
chromosome in the problem, and calculate the values for each chromosome.
– Selection operator, select two chromosomes with the highest quality values
from the population, that couple to produce two offspring.
– Crossover operator, exchanges two subparts of the selected chromosomes,
the position of the subparts selected randomly.</p>
      <p>– Mutation operator, randomly changes the allele value in some location.
• Solving difficult problems.</p>
      <p>• Modeling the natural system that inspired design.</p>
      <p>Employing of genetic algorithms over a population of individuals or chromosomes:
• Fitness Operator, metrics to measure scheduler performance for each
chromosome in the problem, and calculate the values for each chromosome.
• Selection Operator, select two chromosomes with the highest quality values
from the population, that couple to produce two offspring.
• Crossover Operator, exchanges two subparts of the selected chromosomes,
the position of the subparts selQeuceterydOrapntidmoimzaltyio.n by Genetic Algorithms 127
• Mutation Operator, randomly changes the allele value in some location.</p>
      <p>R
E
G
E
N
E
R
A
T
I
O
N</p>
      <p>Start
Generate Initial Population</p>
      <p>Encoding generated population
Evaluate Fitness Function Ft(x)</p>
      <p>Meets
Optimization</p>
      <p>Criteria?
Selection (select parents)</p>
      <p>Crossover (married parents)
Mutation (mutate individual)</p>
      <p>Yes</p>
      <p>Best
individuals</p>
      <p>
        Stop
Fig – 1: Flowchart for Genetic Algorithm
This section will present the implementation of information retrieval using
genetic algorithms (for SQL we can see [
        <xref ref-type="bibr" rid="ref11 ref2 ref6 ref8">11, 8, 6, 2</xref>
        ]). The GA is generally used to
solve optimization problems [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. GA starts on an initial population with fixed
size of chromosomes ”P-chromosomes”. Each individual are coded according to
chromosome length, where genes are allocated in each position in a chromosome
with different data types, and each gene values called allele. In information
retrieval, query for relevant documents are representing for each individual or
chromosome, and each document described by set of terms. The description di for
document Di, where i = 1 . . . l, the set of terms for Di are Tj , where j = 1 . . . n,
thus di = (w1i , w2i , . . . , wni ). The value for each term will be 1 if this term exists
in the document or 0 if not (Note: about another weights for terms was mention
in paper [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]), this indicate that the indexing function that is maps a given index
term t and a given document d is
      </p>
      <p>
        F : D × T → [
        <xref ref-type="bibr" rid="ref1">0, 1</xref>
        ].
      </p>
      <p>
        Defining a query will be combination from set of terms and set of Boolean
operators and, or, xor, not. The query set Q defined as set of queries for documents,
define the query processing mechanism by which documents can be evaluated in
terms of their relevance to a given query [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>Ih this work, we develop genetic program for implementing GA with variable
length of chromosomes and mixture symbolic of information, like real values and
Boolean queries values.
4</p>
    </sec>
    <sec id="sec-3">
      <title>Chromosome Encoding</title>
      <p>In order to really understand the principle of chromosome encoding, consider a
Boolean query with a series of terms w1, w2, . . . , wn, and set of Boolean
operators from and, or, Xor , and N ot. Examples of queries in infix form are:
and
and
(w2 or w6) and(w9 and w3)
(w3 and w4) xor((w5 and w6) or w8)
and(or w2w6)(and w9w3)
xor(and w3w4)(or(and w5w6)w8)</p>
      <p>In the previous queries ordinary Boolean operators are used, and they are
in infix form, they will be encoded to be chromosomes for genetic programming
but in prefix form such as</p>
      <p>and also they will be represented in a tree form as the chromosomes shown
in Figure 2.</p>
      <p>and also represented a queries in tree from in a chromosomes such as</p>
      <p>and</p>
      <p>AND
OR</p>
      <p>AND
W2</p>
      <p>W6</p>
      <p>W9</p>
      <p>W3</p>
      <p>AND
W3</p>
      <p>W4</p>
      <p>AND</p>
      <p>W8
XOR
W5</p>
      <p>OR
W6</p>
      <sec id="sec-3-1">
        <title>Evaluation of the information retrieval system, measured by effectiveness, two</title>
        <p>statistics are used precision and recall, maximize precision subject to a constraint on
the minimal recall accepted. Recall and precision are often perceived as being
inversely related, i.e., complementary and competitive. [D. H. KRAFT, F. E.PETRY,</p>
      </sec>
      <sec id="sec-3-2">
        <title>B. P. BUCKLES, AND T. SADASIVAN. ﺦﻳرﺎﺘﻟا GENETIC ALGORITHMS FOR</title>
      </sec>
      <sec id="sec-3-3">
        <title>QUERY OPTIMIZATION IN INFORMATION RETRIEVAL: RELEVANCE</title>
        <p>
          and also represented a queries in tree from in a chromosomes such as
and
Evaluation of the inform5a.tEivoanluarteiotnriaenvdaFlitnseyssstFeumnctiiosndone by measuring its
effectiveness. This is best measured by two statistics precision and recall, maximizing
precision is subject toEvaaluactioonnosftthreaiinnfotrmoatinon trehtrieevamlsyinsteimm, maelasrureecdabylleffaeccticveenpestse,tdwo. Recall and
precision are oftenstatpisteicrscaereiuvseeddpraecsisiboneainndgrecianllv,mearxsimeilzye prreecliasiotnesdub,jeic.tet.o,accoonmstrapinlteomnentary and
competitive [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. Ftoher mainnimyal recall acsceeptted. Rdecoalcluanmd eprnectissiocnaalrleeodfteCnopelrlceeicvetdioans boeifngDocuments,
given of
inversely related, i.e., complementary and competitive. [D. H. KRAFT, F. E.PETRY,
there is a subset Bo. fP. BUCKLES, AND T. SADASIVAN. ﺦﻳرﺎﺘﻟا GENETIC ALGORITHMScFaOlRled Relevant
documents that are relevant to such query
documents, and aQsUuERbYseOtPoTIfMdIZoAcTuIOmN eInNtsINtFhOaRtMAaTrIeONretRrEiTeRvIEeVdALc:alRlEeLdEVRAeNtCrEieved
Documents. DemonstraFEtEioDnBAfCoKr. نpﺎﻜrﻤﻟeا].ciDseimoonnstarantiodn froercpraeclilsiaonraendshrecoawllanre isnhowFniign ufigr-e2 w3ithwith respect
to all documents rcesopelcltetcotalel ddo.cuWmenhtse.rWeheprerperecciissiioonnandarencadll arreecdeafilnledaars:e defined as:
Recall = RelevantRetrie=ved
        </p>
        <p>Relevant
=</p>
        <p>P recision = RelevantRetrieved</p>
        <p>Retrieved</p>
        <p>Collection of Documents
Relevant Documents</p>
        <p>Recall</p>
        <p>Retrieved Documents</p>
        <p>Precision</p>
        <p>Relevant Retrieved Documents</p>
        <p>Fig-2: Retrieved and relevant documents</p>
        <p>Fitness function for our work will be considered as functions for precision
and recall. One function will trade for the other function because both precision
and recall are inversely related. And they will deal with the ranked documents.
The shortcoming of using recall and precision as fitness functions is that if no
relevant documents are retrieved by a chromosome, then its fitness is zero. This
will lead to loss of all genes for this chromosome.</p>
        <p>
          Computing recall and precision values for each query in our genetic
programming as illustrated in equations 1 and 2, the first fitness function RecallF itnessE1
measures recall (equ-1), and the second fitness function P recisionF itnessE2
measures Precision (equ-2). Where rd is the relevance of document d (1 for
relevant and 0 for nonrelevant), fd is the retrieved document d (1 for retrieval and 0
for nonretrieval), and α and β are arbitrary weights. Where α and β are added
specially to precision fitness function [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ].
        </p>
        <p>Where α and β are added specially to precision fitness function. [D. H. KRAFT, F.</p>
      </sec>
      <sec id="sec-3-4">
        <title>E.PETRY, B. P. BUCKLES, AND</title>
      </sec>
      <sec id="sec-3-5">
        <title>T. SADASIVAN. ﺦﻳرﺎﺘﻟا GENETIC</title>
        <p>ALGORITHMS FOR QUERY OPTIMIZATION IN INFORMATION RETRIEVAL:</p>
      </sec>
      <sec id="sec-3-6">
        <title>RELEVANCE FEEDBACK. نﺎﻜﻤﻟا].</title>
        <p>RecallFitnessE1 = ∑ [rd ∗ fd ] ∑ [rd ]</p>
        <p>d d
130</p>
        <p>Suhail S. J. Owais, Pavel Kr¨omer, and V´aclav Sn´aˇsel
PrecisionFitnessE2 = α ∑d [rd ∗ fd ] ∑d [rd ]+ β ∑d [rd ∗ fd ] ∑d [ fd ]
6. Selection operators</p>
        <p>ReallF itnessE1 =</p>
        <p>Pd[rd × fd]</p>
        <p>Pd[rd]
P recisionF itnessE2 =</p>
        <p>P P
To process genetic algorithms operadto[rrsd]will be done idn[feda]ch generation on the
α Pd[rd × fd]</p>
        <p>β Pd[rd × fd]
+
best two chromosomes.
5.1 Selection operators
chromosomes that depend on the highest fitness values for precision measure or on
Processing genetic algorithm operators will be done in each generation over the
recall measure will be selected. These two chromosomes will be called parent1 and
best two chromosomes. From the population of chromosomes, the best two
chroparent2. These two parents will be used to produce two new offsprings.
mosomes depending on the highest fitness values for precision or recall measures
will be selected. These two chromosomes will be called parent1 and parent2.
These two p7a.reCntrsoswsilol vbeeruoserdRteocpormodbuinceattiwoonnoepweorafftsoprrsings.</p>
      </sec>
      <sec id="sec-3-7">
        <title>From the population of chromosomes, the best two see fig-3.</title>
        <p>spring1 and offspring2.
5.2</p>
        <p>Crossover or Recombination operators
In generating two new offsprings from the existing population, offsprings must
have some inheritance from the best chromosomes in the population; crossover will
have some inheritance from the two parents. Crossover will do that by
exchang</p>
      </sec>
      <sec id="sec-3-8">
        <title>Generating two new offsprings form the existing population. Offsprings must</title>
        <p>idnog sthuabttrbeye cfrhoamngipnagresnutb1trweeithfrosmubtpraereenfrt1omwiptharseunbt2tr.ePeofsriotmionpsafroernts2u.btProeesi1tiaonnsd for
ssuubbttrreeee21 awnidll fbore ssuebletrcetee2dwrailnldboemselyle,catenddratnhdeopmolsyi,tiaonnd mthuesptobsietioBnomolueastnbleoBgiocoallean
olpogericaatloropneordateosr isnucthheastr“eAe;NsDu,cOhRa,sXaOndR,,oarn,dxoNrO,aTn”d. not. This will produce
offSingle point crossover was used in our genetic programming. Index identifier</p>
      </sec>
      <sec id="sec-3-9">
        <title>Single point crossover used in our genetic programming. Index identifier was</title>
        <p>was assigned for each term and each Boolean operator in a prefix form for each
assigned for each term and each Boolean operator in a postfix form for each query,
query that was encoded in a tree, see Figure 4.</p>
        <p>AND</p>
        <p>0
OR
1</p>
        <p>AND</p>
        <p>4
W2
2</p>
        <p>W6
3</p>
        <p>W9
5</p>
        <p>W3
6</p>
        <p>AND</p>
        <p>1
W3
2</p>
        <p>W4
3</p>
        <p>XOR
0
W5
6</p>
        <p>AND
5</p>
        <p>W8
8
OR
4
W6
7
number for parent2 must not exceed the number of nodes of parent2.</p>
        <sec id="sec-3-9-1">
          <title>Fig-3: Trees in Postfix form with indexing</title>
        </sec>
      </sec>
      <sec id="sec-3-10">
        <title>For parent1, the selecQteudesruybtOrepet1i mrainzdaotmionnubmybGeremneutsitcnAotlgeoxrcietehdmthse num1b3e1r</title>
        <p>of nodes for tree1. And the must be done for selection subtree2 random number for
parent2.</p>
        <p>For example, if we have two random numbers 1 and 4 for subtree1 and
subtree2 respectivFeolry,exaanmdplwe,eifimweplcehmoseenrtansdinogmlenupmobinerts c1roanssdo4vefrorpsruobctreeses1 oanndpsaurbetrnete12
and parerenstp2ecftoivreltyr,eaensdswheo wimnplienmFenigtusirnegl3e,ptohinet ncreowssogveenreprraotcedss offnsppareintg1sasnhdopwarnenitn2
see Figufroer t5r,eeasftsehrowenxcinhafingg-3e, sthuebnterweegse.nerated offsprings shown in fig-4, after exchange
sub trees.</p>
        <p>OR
1
W6
4
W5
3</p>
        <p>AND
2</p>
        <p>W8
5</p>
        <p>AND
0</p>
        <p>AND</p>
        <p>6
W9
7</p>
        <p>W3
8</p>
        <p>XOR</p>
        <p>0
AND</p>
        <p>1
W3
2</p>
        <p>W4
3</p>
        <p>OR
4
W2
5</p>
        <p>W6
6</p>
        <sec id="sec-3-10-1">
          <title>Fig-4: New offsprings after single point crossover.</title>
          <p>Fig. 5. New offsprings after single point crossover.
5.3</p>
          <p>8. Mutation operators</p>
          <p>Mutation operators
Mutation, ranMdoumtatipoenr,truanrdboamtiopnertiunrbtahtieonchinr othme ocshorommeosroempererseepnretsaetnitoanti,onis, insenceecsessasarryy
to assurteo atshsautretthheat cthuerrceunrrtengtegneenreartaitioonn iiss ccoonnnnecetcetdetdo tthoe tehnteireensetairrechssepaarcceh, asnpdaictei,s
and it isnenceecsesasrsyartoy itnotroindutrcoednuewcegneneewticgmenaetetriical minatotearipaolpiunlattoioan tphoatphualsatstiaobnilitzheadtlehvaesl.
stabilize[dD.leHve.lK[7R]A.FInT,ouF.r Ege.PnEeTtiRcYp, roBg.rPa.mBmUiCngK,LmESu,taAtNioDn oTp.eSrAatDoAr SmIVaAyNc.haﺦnﻳرgﺎﺘeﻟا
one BooGleEaNnElToIgCicaAlLoGpOerRaItToHrMbSy oFtOhRerQoUnEe.RTYhOatP TisI MfoIrZAmTuItOaNtioInN wINillFObeR MfoArTaInOdN,
or and XREorTRBIEoVolAeaLn: RopEeLrEaVtoArNs.CE FEEDBACK. نﺎﻜﻤﻟا]. In our genetic programming,</p>
        </sec>
        <sec id="sec-3-10-2">
          <title>For emaucthatinoenwopoeffrastporrimnagy, cthhaengseloenceteBdooralenadnolomgicnaulmopbeerartoisr bleysosthtehraonneo.nTeh.aItfisthfoer</title>
          <p>selectedmruatnatdioonmwinllubmebfoerrAisN Dle,sOsRt,haannd XmOuRtaBtiooonleavnaloupe,raatonros.ther selected random
number will be drown and checking if it is on Boolean operator in the offspring,
then exchangeFworilelapcrhoocfefsspsroinng,itse,l(eic.ter.aannddomwnilulmbbeerolresosrthxaonr)o.nEe,xifatmhepsleeleisctsehdorwanndoimn
Figure 5n,uwmhbeerrelesms uthtaantimonutawtiaosn nvaoltuep, raoncoethsesrosneleocfftesdprrainndgo1mbnuutmdboenrewiollnboeffdsrpowrinngan2d,
where racnhdecokminlgyiiff iwties soenleBcoto1letaondoepneoratteorfoinr tthheeosfufsbptrrinege,pthoesniteioxnchoanngoeffwspillripnrgo2ce,sasnodn
from offistp,(rii.ne.gA2NwDe wseilel btehaOtRitorisXaORB)o.oElexaanmplloegwicialll boepsehroawtnorinafnigd-,5,awndherreanmduotamtiloyn
it was chwaansgneodt ptroocoers,s tfhoer onfefwsprtinwgo1 obffutspdroinnegsfoarroeffsshproiwngn2.inRFanigduomre i5f.we select 1 to
denote for the subtree position for offspring2, and from offspring2 we get that the</p>
        </sec>
        <sec id="sec-3-10-3">
          <title>Boolean logical operator is AND, and by random it was changed to OR. The new two</title>
        </sec>
      </sec>
      <sec id="sec-3-11">
        <title>5.4 UopffdspartiinngsgarPeoshpouwlnatiniofing-5.</title>
        <p>Fitness values will be computed for the two generated new offsprings, and after
that the best chromosomes will survive for the next generation. Offsprings will be
replaced by the worst chromosomes in the population, for this the new population
size will not be changed, replacing two worst chromosomes by two offsprings if the
new offsprings fitness values exceed any of the chromosomes in the population.</p>
        <p>The new population was ready to start next generation. And this will be
repeated until number of generations or until we get the best solution for our
problem.
5.5</p>
        <p>Experiments
We developed our genetic program over a testing environment to see the
influence of genetic algorithms on the information retrieval. The environment had
a population from set of Boolean queries. For our program an input data used
Boolean model of a collection of documents and set of Boolean queries as an
initial population, and for execution the genetic program we used:
– Different Collections of documents with variant number of words and
documents.
– Two sets of queries that represent as tree prefix form used as two different
initial populations with fixed size in number of chromosomes (8 individuals).
– The results will be for a one requesting user query
– An initial values was fixed in all experiments are
• Mutation value is 0.2
• α is 0.25
• β is 1.0
– Fixed number of generations are 50 generations
5.6</p>
        <p>Limitations of current version
– At this time genetic operators are applied only for Boolean logical operators
and, or and xor. This causes following: if queries in initial population do
not include all the terms we have in the user query, they cannot come into
existence.
– In our implementation, precision fitness value can be greater than 1.0, where
the maximal value of precision is 1.25 that is coming from (α and β), thus
we cannot interpret it as probability.
5.7</p>
        <p>Experiment Tests
Our tests was done using user query</p>
        <p>w8 or w2,
and we used two different initial populations Q1 and Q2; and they are shown in
table 1, and table 2 respectively.</p>
        <p>We mention that in initial population Q1 there is in one query contain a sub
expression
w8 and w2,
w8 and w2, w8 or w2, w8 xor w2.
and in initial population Q2 there are in three different queries contains three
different sub expressions</p>
        <p>Our testing of execution for genetic programming was done independently
from other execution results. Three cases were studied. The results of our tests
with different collections (different number of documents and different number
of words/terms). The three cases descriptions are:
– Test case 1: collection of 10 documents, 30 words in collection, maximal
number of different words in a document is 10. Results are shown in table 3.
– Test case 2: collection of 200 documents, 50 words in collection, maximal
number of different words in a document is 50. Results are shown in table 4.
– Test case 3: collection of 5000 documents, 2000 words in collection, maximal
number of different words in a document is 500. Results are shown in table 5.
The results of this study suggests that the final population composed of
individuals having the same strength (quality) will have the same precision and
recall values. The best individual result was randomly chosen as best. We
conclude that the quality of initial population was important to have best result of
genetic programming process, and the less quality of initial population caused
worse results. This could be seen when we chose parents based on precision, and
therefore recall was very small for the large collection.</p>
        <p>We concluded that in order to get good results, we choose parents depending
on the recall fitness values than the precision fitness values, but that will increase
the number of Boolean logical operators for the final best results.</p>
        <p>
          For the future work we suggest to use less number of prefix tree by identifying
the Boolean logical operators only without identify index for terms. For
selecting the best individual with less number of Boolean operators and less number
of terms instead of random selection if we got the final population with same
strength. Modifying our system to work with different Boolean operators (of,
adj, . . . see in [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]), and extend this for fuzzy logic.
        </p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Baeza-Yates</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ribeiro-Neto</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Modern Information Retrieval</article-title>
          . Addison Wesley, New York,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Freytag</surname>
          </string-name>
          , Johann Christoph:
          <article-title>A Rule-Based View of Query Optimization</article-title>
          .
          <source>Proceedings of ACM-SIGMOD</source>
          ,
          <year>1987</year>
          , pp.
          <fpage>173</fpage>
          -
          <lpage>180</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Goldberg</surname>
          </string-name>
          , David E.:
          <article-title>Genetic Algorithms in Search, Optimization and Machine Learning</article-title>
          . Reading, Massachusetts: Addison-Wesley,
          <year>1989</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4. Korfhage Robert R.:
          <source>Information Storage and Retrieval</source>
          . John Wiley &amp; Sons, Inc.
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Melanie</surname>
            <given-names>M.:</given-names>
          </string-name>
          <article-title>An Introduction to Genetic Algorithms. A Bradford Book The MIT Press</article-title>
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Kim</surname>
          </string-name>
          , Won:
          <article-title>On Optimizing an SQL-like Nested Query</article-title>
          .
          <source>ACM Transactions on Database Systems</source>
          <volume>7</volume>
          ,
          <issue>3</issue>
          (
          <year>September 1982</year>
          ), pp.
          <fpage>443</fpage>
          -
          <lpage>469</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Kraft</surname>
            ,
            <given-names>D.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bordogna</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Pasi</surname>
          </string-name>
          , G.:
          <article-title>Fuzzy Set Techniques in Information Retrieval</article-title>
          , in Bezdek, J.C.,
          <string-name>
            <surname>Didier</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Prade</surname>
          </string-name>
          , H. (eds.),
          <source>Fuzzy Sets in Approximate Reasoning and Information Systems</source>
          , vol.
          <volume>3</volume>
          , The Handbook of Fuzzy Sets Series, Norwell, MA: Kluwer Academic Publishers,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>McGoveran</surname>
          </string-name>
          , David: ”Evaluating Optimizers.”
          <article-title>Database Programming and Design</article-title>
          .
          <source>January</source>
          <year>1990</year>
          , pp.
          <fpage>38</fpage>
          -
          <lpage>49</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Salton</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Buckley</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Terms-Weighting approach in automatic text retrieval</article-title>
          .
          <source>Information Processing and management, 1988</source>
          <volume>24</volume>
          (
          <issue>5</issue>
          ):
          <fpage>513</fpage>
          -
          <lpage>523</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Suhail</surname>
            <given-names>S. J.</given-names>
          </string-name>
          <string-name>
            <surname>Owais</surname>
          </string-name>
          .: Timetabling of Lectures in the Information Technology College at Al al-Bayt University Using Genetic Algorithms.
          <source>Master thesis</source>
          , Al al-Bayt
          <source>University</source>
          <year>2003</year>
          , Jordan. (in Arabic).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Yao</surname>
          </string-name>
          , S. Bing: ”
          <source>Optimization of Query Algorithms.” ACM Transactions on Database Systems</source>
          <volume>4</volume>
          ,
          <issue>2</issue>
          (
          <year>June 1979</year>
          ), pp.
          <fpage>133</fpage>
          -
          <lpage>155</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>