<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Framework for Evolutionary Modelling in Text Mining</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Michael</forename><forename type="middle">Y</forename><surname>Bogatyrev</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Tula State University Tula</orgName>
								<address>
									<country key="RU">Russia</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Alexey</forename><forename type="middle">P</forename><surname>Terekhov</surname></persName>
							<affiliation key="aff1">
								<orgName type="institution">Tula State University</orgName>
								<address>
									<settlement>Tula</settlement>
									<country key="RU">Russia</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Framework for Evolutionary Modelling in Text Mining</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">6FEB6F7E59378E4645917DE56F8389E5</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T14:59+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Special framework for Evolutionary Modelling is presented. It is oriented on experimental investigations of schemes and properties of genetic algorithms. With the help of this framework Evolutionary Approach to Conceptual Graphs Clustering is investigated. Some experimental results of clustering scientific papers abstracts are presented.</p><p>evolutionary approach is applied to conceptual graphs clustering problem. Section 4 is devoted to the Framework for Evolutionary Modelling as a part of EVOLIB digital library research project. Illustrative examples of conceptual graphs clustering are shown in this section. Finally, conclusion and further work plan constitute the section 5.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Evolutionary Computations in Text Mining</head><p>Consider the principle of Evolutionary Computation in general to outline some its features feasible to possible implementations in Text Mining.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>The problem of extracting natural language semantics is very complex and could not be solved by implementing a single approach. In the field of Text Mining solution of this problem is proposed by constructing semantic models of a text and applying these models to detect special patterns such as clusters, associations, deviations and similarities in text collections. The patterns mentioned, being abstract mathematical objects or numerical values, do not solve initial problem of extracting language semantics and often need to be semantically interpreted.</p><p>In this paper Evolutionary Modelling <ref type="bibr" target="#b1">[2]</ref> as a possible way of such semantic interpretation is presented. It is based on the paradigm of Evolutionary Computation <ref type="bibr" target="#b23">[24]</ref> which in turn engenders the area of Genetic Algorithms <ref type="bibr" target="#b13">[14]</ref>, a powerful and domain independent search and optimization technique. As domain independent tool genetic algorithms have been used to solve various Text Mining problems: information and documents retrieval <ref type="bibr" target="#b0">[1]</ref>, <ref type="bibr" target="#b11">[12]</ref>, text segmentation <ref type="bibr" target="#b14">[15]</ref>, extracting key phrases from text <ref type="bibr" target="#b22">[23]</ref>, etc. Having a semantic text model and a Text Mining problem, Evolutionary Modelling is the way to find the best solution of a problem by evolving parameters of a model. The mechanism of evolution of a model is specified by appropriate genetic algorithm. Genetic algorithm produces populations of various solutions. Analyzing the evolution of such populations one can construct possible semantic interpretations of them.</p><p>The method which is proposed here is mainly experimental and is supported by special framework for evolutionary modelling. This framework is a part of EVOLIB digital library research project <ref type="bibr" target="#b2">[3]</ref>. Among user functions of EVOLIB there is the function of fact extraction from abstracts of scientific papers <ref type="bibr" target="#b2">[3]</ref>. To realize this function we use Conceptual Graphs as semantic models of abstract sentences. Then Evolutionary Modelling is applied to conceptual graphs clustering. Various similarity measures are investigated with the help of the framework. As result abstracts of declarative and narrative types were detected in the library.</p><p>The paper is organized as follows. Section 2 contains quite general description of the principle of Evolutionary Computation to show its fundamental nature. Some peculiarities of application of Evolutionary computation in Text Mining are also presented. In section 3</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Principle of Evolutionary Computation</head><p>Let X is a set of solutions of a Text Mining problem. Every solution x ∈ X can be characterized by a quality measure named as fitness function or fitness. Actually very often this measure is not a function in mathematical sense because x may be not a numerical value, being for example configuration of clusters. So in general instead of denoting y = f (x) we have to use a mapping f : X → Y where Y⊆ ⊆ ⊆ ⊆ R + is the subset of a set of positive real numbers. Nevertheless, following the tradition we will use " f ( )" notation and call it as fitness function.</p><p>Let solutions of a Text Mining problem depend on a set of parameters P of a model which is used in a problem. Most of Text Mining problems which have been solved by using genetic algorithms can be formulated as the following optimization problem: it is required to find optimal values of parameters p * which deliver maximum fitness value y * ∈ Y, so the following is true:</p><formula xml:id="formula_0">p * = argmax f (x),</formula><p>(1) p * ∈ P Evolutionary approach to solving this problem consists in the following. 2.1. Building encoding scheme. Encoding scheme is the mapping φ: P → S where set S contains objects which encode parameters from P. Most of genetic algorithms use binary encoding and every value of p ∈ P is represented as binary string. Then these strings are randomly manipulated by genetic algorithm and all variety of existing manipulation methods (genetic operators) <ref type="bibr" target="#b10">[11]</ref> can be treated as permutations of strings. Thereupon one can conclude that genetic algorithms only constitute a sort of random search methods. Actually encoding is very important and represents the essence of evolutionary approach. There is an atomic principle of encoding which claims that encoding scheme has to be such that it generates minimal elements which influence on the values of elements of Y. As in biology, heredity theory claims that gene (strictly gene combinations) is the minimal element which really determines individual characteristics, as here, in Evolutionary computation, atomic encoding principle plays the same role. Therefore genetic algorithms were justly named as genetic. Encoding scheme is not necessarily binary (as it is not binary in Nature): every string position contains a symbol (gene) from encoding alphabet, and there are variants of alphabets applied in encoding schemata <ref type="bibr" target="#b0">[1]</ref>, <ref type="bibr" target="#b3">[4]</ref>, <ref type="bibr" target="#b6">[7]</ref>, <ref type="bibr" target="#b22">[23]</ref>. But necessarily there exists an inverse mapping φ -1 : S → P, so for every s ∈ S there exists p ∈ P.</p><p>2.2. Evolutionary algorithm. For given encoding scheme the following algorithm solves the problem <ref type="bibr" target="#b0">(1)</ref>.</p><p>A. Randomly generate an initial set (population) S 0 of objects from S. B. Start evolution of the populations by applying a set of operators A to population S 0 and further iteratively so that for every S k+1 = A (S k ) exists at least one</p><formula xml:id="formula_1">f [φ -1 (s k+1 )] ≥ f [φ -1 (s k )],<label>(2)</label></formula><p>where s k ∈ S k and s k+1 ∈ S k+1 .</p><p>C. Stop evolution of populations when p * is found in a population. If the set of operators A consists of genetic operators of selection, mutation and recombination (crossover) then evolutionary algorithm is named as genetic algorithm.</p><p>Selection works so that condition (2) is supported by the following "biological" principle: good parents produce good offspring (that is not true in Nature). So the higher fitness chromosomes have more opportunity to be selected than the lower ones and good solution is always alive in the next generation.</p><p>Crossover is the genetic operator that mixes two chromosomes together to form a new offspring. It does mixing by replacing fragments of chromosome's code divided in certain one or several randomly selected points.</p><p>Mutation involves modification of the gene values by randomly selecting new value from the alphabet at random point in the strings of genes.</p><p>Being realized, the algorithm (A. -C.) provides fast and quite exact solution of the problem <ref type="bibr" target="#b0">(1)</ref>.</p><p>Quite exact means that genetic algorithm stops in a neighbourhood of global extreme of fitness function f <ref type="bibr" target="#b10">[11]</ref>. The size of a neighbourhood around extreme depends on the fitness function and parameters of genetic operators. When genetic algorithm works too fast it may stop at local extreme. This feature is traditionally considered as the lack of the algorithm but it may be useful for Text Mining since local extreme of quality measure may be semantically "better" than global extreme. In our experiments we have observed just that situation <ref type="bibr" target="#b3">[4]</ref>.</p><p>Operating speed of genetic algorithms could not be high because they have to manage not one but a whole set of possible solutions and evaluate fitness function N times on every step of evolution, where N is the size of population. Nevertheless, they are fast as compared to other algorithms for solving the problem (1) due to the following properties of genetic algorithms:</p><p>-their feature of implicit parallelism <ref type="bibr" target="#b10">[11]</ref> provides a quasi parallel way of computations, and by using properly constructed genetic operators genetic algorithms have an ability to span simultaneously a large subsets of search space; -a population of chromosomes evolves successfully to an extreme because specific "good" gene combinations known as building blocks <ref type="bibr" target="#b10">[11]</ref> appear and breed other, "better" gene combinations which form final solution. The last property of appearing building blocks during evolution is the reason for applying strings as working objects in evolutionary algorithms because building blocks act on strings. It is also needed to control building blocks in evolutionary algorithms and this is realized in our framework.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Applications in Text Mining</head><p>Genetic algorithms have been applied to solve many problems in Information Retrieval and Text Mining. Following the review in <ref type="bibr" target="#b0">[1]</ref> we briefly outline distinctive directions of their implementations and comment them to illustrate properties of evolutionary algorithm described in previous subsection.</p><p>Documents Indexing and Retrieval <ref type="bibr" target="#b0">[1]</ref>, <ref type="bibr" target="#b7">[8]</ref>. The main problem in this area is to adapt documents descriptions in the library with the requirements of queries. That adaptation allows constructing special models (indexes) of stored documents which facilitate search for documents being relevant to the query. Encoding scheme here represents documents indexes as chromosomes with binary and non binary alphabets. Using those indexes it is easy to construct fitness function which is based on calculating the similarity between the current document and each of the queries. According to our model of evolutionary algorithm from previous subsection here X is a set of documents, encoding scheme is the mapping ψ: X → S and fitness function is calculated as f (s, q), where s is a chromosome and q is a query. For some practical reasons mutations are not allowed on chromosomes. So the evolutionary algorithms in this area are realized as not classical genetic algorithms.</p><p>Learning of Matching Functions and Queries. <ref type="bibr" target="#b8">[9]</ref>, <ref type="bibr" target="#b19">[20]</ref>. Another model of stored documents which is used in Text Mining applications is vector space model. The content of library documents is described by n terms which constitute n-dimensional vector space. Each document is a point in this space which coordinates are weights of corresponding terms occurred in the document. A query is also treated in the same way as a vector constructed from the terms and weights distinguished from user request. Document retrieval is based on the measurement of the similarity between the query and the documents. Here the learning also means adapting objects -matching functions and queries -to provide for certain fitness function to have an extreme. Evolutionary algorithms have been applied to different matching functions -from traditional form of linear combination of existing similarity functions to tree form. In the last case Genetic Programming, the branch of Evolutionary computation where evolution is performed by operations on graphs, is applied. In Genetic Programming encoding scheme is graph and application of genetic operators on that scheme has some peculiarities. Here mutations are not allowed or restricted by specificity of applied alphabet, crossover on graphs is realized as exchange between two graphs by their sub graphs and selection needs the measure of quality (semantics) of graphs.</p><p>Clustering of Documents and Terms <ref type="bibr" target="#b6">[7]</ref>, <ref type="bibr" target="#b11">[12]</ref>, <ref type="bibr" target="#b18">[19]</ref>, <ref type="bibr" target="#b20">[21]</ref>. In spite of the fact that clustering problem is well investigated and many clustering algorithms have been proposed, its new solutions are still appearing, also with using evolutionary approach. There are two crucial parameters of clustering problem: a measure of similarity of clustering objects and number of clusters -is it given or not before clustering. In Text Mining, clusters of documents have to be additionally interpreted, including semantic interpretation. Evolutionary algorithms have advantage over traditional clustering methods when:</p><p>-measure of similarity of clustering objects is not traditional (Euclidian norm), -number of clusters is not given and -number of clusters is great. All these conditions present in Text Mining problems. For example semantic measure of texts similarity is not traditional, number of clusters is not known in many Text Mining problems and number of clusters is great when they for long texts.</p><p>Irrespectively of the nature of Text Mining problem in all three marked above applications areas the mapping f : X → Y could not be represented as simple analytical function. It is often represented as multimodal function which has finite breaks <ref type="bibr" target="#b3">[4]</ref>. Evolutionary and genetic algorithms are good just for optimization problems with multimodal fitness functions which have finite breaks <ref type="bibr" target="#b10">[11]</ref>. That fact is the main motivation for application of evolutionary approach in Text Mining.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Evolutionary Approach to Conceptual Graphs Clustering</head><p>Consider Conceptual Graphs Clustering problem as an example of application Evolutionary Approach in Text Mining. The formulation of clustering problem actual for Text Mining is known for a long time <ref type="bibr" target="#b15">[16]</ref>: for a given set of objects and their associate descriptions it is needed to find clustering that groups these objects into concepts, then find an intentional definition for each concept, and a hierarchical organization for these concepts. That intentional definition may be realized as semantic interpretation of clusters. For Conceptual Graphs <ref type="bibr" target="#b21">[22]</ref>, the clustering problem urgency depends on how measure of similarity and number of clusters are specified in the problem.</p><p>In all known approaches to conceptual clustering <ref type="bibr" target="#b4">[5]</ref>, <ref type="bibr" target="#b16">[17]</ref>, <ref type="bibr" target="#b18">[19]</ref> classical clustering algorithms of k -means and hierarchical clustering have been applied.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Measures of similarity of conceptual graphs</head><p>The Dice coefficients known as standard similarity measure for text documents serve as a base for creating specific measures in concrete problems. This is done in <ref type="bibr" target="#b17">[18]</ref> for comparison of conceptual graphs and in <ref type="bibr" target="#b18">[19]</ref> for conceptual graphs clustering problem. We also used Dice coefficients and their modifications.</p><p>For the two given graphs 1 G and 2 G the similarity measure depends on two valuesconceptual similarity c s and relative similarity r s .</p><p>1 2</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">( ) ( ) ( )</head><formula xml:id="formula_2">c c n G s n G n G = +<label>(3)</label></formula><p>where</p><formula xml:id="formula_3">2 1 G G G c ∩ = , ) (G n</formula><p>is the number of concepts -conceptual nodes of graph G .</p><formula xml:id="formula_4">1 2 2 ( ) ( ) ( ) c c c r G G m G s m G m G = +<label>(4)</label></formula><p>where</p><formula xml:id="formula_5">) ( c G m is the number of relations -relative nodes of conceptual graph c G , ) (G m c G</formula><p>is the number of relations -relative nodes of conceptual graph G , at least one of which belongs to graph c G . The measures (3), (4) are imperfect. Using these similarities we can get graphs that have many common concepts and relations but different meanings. This property of standard similarity measures is well known. We get incorrect similarity value due to the presence of identical words in sentences. These words don't carry useful information but affect the calculated similarity value. These words (called clichés, stock phrases and set expressions) exist in all languages. It is necessary to use some kind of filtering to remove this "noise".</p><p>If some lexical restrictions are introduced-for example only texts of scientific articles are analyzed -it has become possible to create a set of concepts of general usage in the given research area. For example they are "article", "this", "document", "observe", "describe", etc. These concepts can be safely excluded when analyzing similarity between conceptual graphs.</p><p>Taking into account in general usage concepts the value <ref type="formula" target="#formula_2">3</ref>) is assumed to be the number of concepts that are not in general usage:</p><formula xml:id="formula_6">) (G n in (</formula><formula xml:id="formula_7">i a a a G n + + + = ... ) ( 2 1 , 1,..., i N = , } 1 | 0 { ∈ i a ,</formula><p>where N is the number of all concepts in graph G, i a (general validity factor) possesses the value 0 or 1 subject to whether i-th concept is in general usage or not.</p><p>When calculating similarity between conceptual graphs it is also necessary to take into account sizes of the graphs compared. It is also possible to modify the formula for similarity (3), ( <ref type="formula" target="#formula_4">4</ref>) in order to take into account the sizes of the graphs:</p><formula xml:id="formula_8">1 2 2 ( ) , ( ) ( ) c c n G l s n G n G = +<label>(5)</label></formula><p>where ( )</p><formula xml:id="formula_9">1 1 2 2 2 1 2 1 ( ) , ( ) ( ) ( ) , ( ) ( ) ( ) n G k if n G n G n G l n G k if n G n G n G  ≥   =   &lt;  </formula><p>, and k is a scaling factor.</p><p>In order to calculate relative similarity between conceptual graphs using ( <ref type="formula" target="#formula_4">4</ref>), one has to take into account the number of relations. Taking the relations into account, the value of <ref type="formula" target="#formula_4">4</ref>) is calculated as follows:</p><formula xml:id="formula_10">) (G m c G in formula (</formula><formula xml:id="formula_11">i both G b b b m G m c + + + + = ... ) ( 2 1 , both m m i − = ,..., 1 , } 1 , 0 { ∈ i b ,</formula><p>where m is the number of all the relations of graph G, both m is the number of relations of graph G both nodes of which belong to graph c G , i b (relevance factor) possesses values in the range of 0 and 1 subject to relation type. Both measures -conceptual and relative similarity -were applied in clustering experiments separately.</p><p>Semantic The measure proposed in <ref type="bibr" target="#b24">[25]</ref> is appropriate and well grounded model of semantic measure of similarity between conceptual graphs. The similarity between two concepts is obtained by the distance between them. The distance between two concepts is calculated by their respective positions in the concept hierarchy. This hierarchy can be obtained from WordNet system. In the framework for evolutionary modelling we use WordNet as the source of ontology segments needed to find a closest common parent for two concepts.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Encoding scheme</head><p>Every solution of clustering problem is represented as a string of length n called a chromosome. There exist several ways to represent clustering problem solution in the encoding scheme <ref type="bibr" target="#b6">[7]</ref>. Figure <ref type="figure" target="#fig_0">1</ref> illustrates variants of encoding schemes. All of them use long chromosomes which length is equal to the number of objects to be clustered.</p><p>The drawback of encodings (a) -(e) on the Figure <ref type="figure" target="#fig_0">1</ref> is that it is necessary to specify a number of clusters which is usually unknown a priori.</p><p>We propose modified encoding in which a i denotes ordinal number of an object that belongs to the same cluster as i-th object. As a result, the number of clusters depends on the distribution of such links between objects. If object points to object B and the latter points to object C, it means that they are all in the same cluster. This encoding doesn't bind the objects to some particular cluster -it just contains information about links between the objects. Due to that kind of "chain" relations between objects it is enough to add a link between any object from one set and any object from another set in order to join these sets.</p><p>This encoding is atomic for clustering problem. Another important advantage of the proposed encoding is that it provides quite fast work of the algorithm even on long chromosomes due to quasi parallelization of calculations: several genes in the chromosome may point to the same cluster simultaneously, so clusters are formed in a quasi parallel way.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">The EVOLIB Framework</head><p>The EVOLIB is a digital library research project <ref type="bibr" target="#b2">[3]</ref> that includes a subsystem of Evolutionary computation as a solver. The structure of EVOLIB system is shown in Figure <ref type="figure" target="#fig_1">2</ref>. The EVOLIB contains the library of scientific papers as a content of Data subsystem. There are over 2500 items in the library. Every paper has an abstract which is an object of processing by the system instead of paper's text. This is based on assumption that an abstract is short and clear essence of a paper. Abstracts form input data for Models subsystem where conceptual graphs have been created.</p><p>The problem of building conceptual graphs is solved by using existing approaches <ref type="bibr" target="#b5">[6]</ref>, <ref type="bibr" target="#b9">[10]</ref>, <ref type="bibr" target="#b12">[13]</ref>: we apply Semantic Roles Labelling as the main instrument for building relations and also WordNet system for English texts. Since there are many Russian papers in the library some special decisions were made concerning the Russian language. The XML database for storing conceptual graphs and corresponding DBMS are parts of the system.</p><p>The Problems subsystem realizes the software for conceptual graphs clustering. The GenAlgs subsystem is devoted to exploration of genetic algorithms by varying their schemes and parameters of genetic operators. The visualization subsystem plays an important role in the experiments. The EVOLIB system is created as an open framework. That means that new models, new problems and new genetic algorithms can be added to the system without changing its kernel architecture.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Evolutionary Modelling in Conceptual Graphs Clustering</head><p>Evolutionary modelling is appropriate instrument for learning abstracts semantics. This is general problem which may be separated on some specific tasks including conceptual graphs clustering.</p><p>As it is mentioned above Evolutionary approach is preferable tool for clustering when the number of clustering objects is great and similarity measure is not usual. The first condition leads to possibly great number of clusters which have to be interpreted for further implementations. The second condition, i.e. measures based on Dice coefficients and semantic measure cause the fitness function to be multimodal. There are several variants of clustering for multimodal fitness function <ref type="bibr" target="#b6">[7]</ref> because genetic algorithm may quickly find local extreme and then stop. Modelling framework allows adjusting parameters of genetic operatorsmutation probability, number of crossover points and type of selection -to control genetic algorithm convergence.</p><p>The next important tool of modeling in EVOLIB is visualization. It is applied to show two types of structures of clusters: clustering dendrograms and maps of clusters and their parameters. The last tool of visualization is actual when semantic measure is applied. Visualization also helps to find out what kind of extreme is achieved when genetic algorithm stops. This is achieved by finding and showing building block elements in chromosomes.</p><p>Consider some illustrative examples of applying evolutionary modelling in conceptual graphs clustering.</p><p>Clustering helps to investigate structures of papers abstracts. It is observed that there are two types of abstracts -abstracts of declarative and narrative types. As a rule, declarative abstracts are short and contain weakly connected sentences. Oppositely, narrative abstracts are quite long and contain sequences of closely connected sentences. Declarative abstracts are common and may contain new facts as new terms and definitions. Narrative abstracts are not usual and may contain new ideas narrated in the abstract. To detect narrative abstracts we tested all three measures of similarity of conceptual graphs. Relative similarity was found as the most appropriate for such detection. Figure <ref type="figure" target="#fig_2">3</ref> illustrates the most meaningful such result. This is result of clustering two abstracts. One of them is long containing 11 sentences (numbered from 0 to 10). Another abstract consists of a single sentence number 11. In Figure <ref type="figure" target="#fig_2">3</ref> one can see "narrativeness" of the first abstract that is expressed as "nesting of senses" on the clusters structure. At the same time graph number 11 constitutes its own cluster.   This is the cluster for the single sentence in second abstract numbered as 7(8-1). The sentence is about ontology so "metaphysics" has appeared as hyperonim for "ontology".</p><p>During experiments of optimization the whole evolution of chromosome populations is stored. This information is further used to reveal building blocks (see Section 2). Building elements are visualized with the help of genotype density plot which is shown in Figure <ref type="figure" target="#fig_5">5</ref>. This allows relating good solutions observed during the process of evolution to properties of the system under optimization.</p><p>Genotype density plot is a function</p><formula xml:id="formula_12">A L P → ×</formula><p>, where A is an alphabet, P is a set of population species and L is a set of positions in a chromosome. The function possesses values from a set of alphabet values which are currently shown in gray on the application plots. In Figure <ref type="figure">6</ref> the number of tints is not enough to map to all 11 alphabet values. Building block elements are shown as chamfered rectangles. These blocks appear several times in a population. They are further used to "breed" parts of final population chromosomes concurrently. Checking the genotype to be unified is carried out not only visually but also with the help of the framework software.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusion and Further Work</head><p>Summarizing the material above, we resolve that Evolutionary Modelling is a perspective experimental method for solving problems in Text Mining which can be treated as problems. It can be also applied as a tool for semantic interpretations of modeling results.</p><p>At the same time the framework for Evolutionary Modelling considered in this paper needs further development. That development is planned in following directions: -applying corpora technologies in Data subsystem; -including Genetic Programming as a tool for working with graph models; -developing online versions of EVOLIB subsystems. We hope that this Evolutionary Modelling framework will be valuable for evolution of the area of Text Mining.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Variants of encoding schemes for conceptual graphs clustering.</figDesc><graphic coords="7,99.24,50.64,441.00,241.32" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: The general structure of EVOLIB system.</figDesc><graphic coords="7,228.48,442.80,181.92,226.92" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Example of clustering with the help of the relative similarity measure.</figDesc><graphic coords="9,187.80,74.28,263.28,244.56" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 4</head><label>4</label><figDesc>Figure 4 illustrates clustering the same two abstracts when semantic measure is applied. It is shown the list of hyperonims (common parents) for concepts in the cluster.</figDesc><graphic coords="9,132.48,396.12,373.80,218.40" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Example of the map of clusters and their parameters. Notation 7(8-1) means that on the 7 th step of clustering the cluster contains first sentence from 8 th abstract.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: Genotype density plot. Chromosome elements (genes) are arranged horizontally; different chromosomes in a population are arranged vertically.</figDesc><graphic coords="10,127.32,387.36,384.12,126.96" type="bitmap" /></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Using Genetic Algorithm to Improve Information Retrieval Systems</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">A</forename><surname>Ahmed</surname></persName>
		</author>
		<author>
			<persName><surname>Radwan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bahgat</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Abdel</forename><surname>Abdel Latef</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Mgeid</surname></persName>
		</author>
		<author>
			<persName><surname>Ali</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Osman</surname></persName>
		</author>
		<author>
			<persName><surname>Sadek</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Proc. of World Academy of Science, Engineering and Technology</title>
		<imprint>
			<biblScope unit="volume">17</biblScope>
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Genetic algorithms in evolutionary modelling</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">R</forename><surname>Birchenhall</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Kastrinos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Metcalfe</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Evolutionary Economics</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="page" from="375" to="393" />
			<date type="published" when="1997">1997</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Application of Conceptual Graphs in Digital Libraries</title>
		<author>
			<persName><forename type="first">M</forename><surname>Bogatyrev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Latov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Stolbovskaya</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Ninth Russian National Research Conference -Pereslavl-Zalesskij: Pereslavl University</title>
				<meeting>the Ninth Russian National Research Conference -Pereslavl-Zalesskij: Pereslavl University</meeting>
		<imprint>
			<date type="published" when="2007">2007</date>
			<biblScope unit="page" from="109" to="115" />
		</imprint>
	</monogr>
	<note>Digital libraries: Advanced Methods and Technologies, Digital Collections. in Russian</note>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Solving Some Text Mining Problems with Conceptual Graphs</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">Y</forename><surname>Bogatyrev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">V</forename><surname>Tuhtin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Digital libraries: Advanced Methods And Technologies, Digital Collections</title>
				<imprint>
			<date type="published" when="2008">2008. 2008</date>
		</imprint>
	</monogr>
	<note>Proceedings of the Tenth Russian National Research Conference RCDL&apos;</note>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Conceptual Clustering of Complex Objects: A Generalization Space based Approach</title>
		<author>
			<persName><forename type="first">I</forename><surname>Bournaud</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J.-G</forename><surname>Ganascia</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="s">Lecture Notes in Artificial Intelligence</title>
		<imprint>
			<biblScope unit="volume">954</biblScope>
			<biblScope unit="page" from="173" to="187" />
			<date type="published" when="1995">1995</date>
			<publisher>Springer</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">CGExtract: Towards Extraction of Conceptual Graphs from Controlled English</title>
		<author>
			<persName><forename type="first">S</forename><surname>Boytcheva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Dobrev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Angelova</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="s">Lecture Notes in Computer Science</title>
		<imprint>
			<biblScope unit="volume">2120</biblScope>
			<date type="published" when="2001">2001</date>
			<publisher>Springer Verlag</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<title level="m" type="main">Clustering With Genetic Algorithms</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">M</forename><surname>Cole</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1998">1998</date>
			<biblScope unit="page">110</biblScope>
		</imprint>
		<respStmt>
			<orgName>University of Western Australia</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Personalization of search engine services for effective retrieval and knowledge management</title>
		<author>
			<persName><forename type="first">W</forename><surname>Fan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">D</forename><surname>Gordon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Pathak</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. International Conference on Information Systems (ICIS)</title>
				<meeting>International Conference on Information Systems (ICIS)<address><addrLine>Brisbane, Australia</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2000">2000. 2000</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Discovery of context-specific ranking functions for effective information retrieval using genetic programming</title>
		<author>
			<persName><forename type="first">W</forename><surname>Fan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">D</forename><surname>Gordon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Pathak</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on knowledge and Data Engineering</title>
		<imprint>
			<biblScope unit="volume">16</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="523" to="527" />
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Automatic labeling of semantic roles</title>
		<author>
			<persName><forename type="first">D</forename><surname>Gildea</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Jurafsky</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computational Linguistics</title>
		<imprint>
			<biblScope unit="volume">28</biblScope>
			<biblScope unit="page" from="245" to="288" />
			<date type="published" when="2002">2002. 2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<title level="m" type="main">Genetic Algorithms in Search Optimization and Machine Learning</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">E</forename><surname>Goldberg</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1989">1989</date>
			<publisher>Addison-Wesley</publisher>
			<pubPlace>Reading, MA, USA</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Probabilistic and genetic algorithms for document retrieval</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">D</forename><surname>Gordon</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Communications of the ACM</title>
		<imprint>
			<biblScope unit="volume">31</biblScope>
			<biblScope unit="issue">10</biblScope>
			<biblScope unit="page" from="1208" to="1218" />
			<date type="published" when="1988">1988</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Automatically building conceptual graphs using VerbNet and WordNet</title>
		<author>
			<persName><forename type="first">S</forename><surname>Hensman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dunnion</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 3rd International Symposium on Information and Communication Technologies (ISICT)</title>
				<meeting>the 3rd International Symposium on Information and Communication Technologies (ISICT)<address><addrLine>Las Vegas</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2004-06-16">2004. June 16-18, 2004</date>
			<biblScope unit="page" from="115" to="120" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<title level="m" type="main">Adaptation in Natural and Artificial Systems</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">H</forename><surname>Holland</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1975">1975</date>
			<pubPlace>, MI, USA</pubPlace>
		</imprint>
		<respStmt>
			<orgName>University of Michigan Ann Arbor</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">SegGen: a Genetic Algorithm for Linear Text Segmentation</title>
		<author>
			<persName><forename type="first">S</forename><surname>Lamprier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Amghar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Levrat</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Saubion</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Twentieth International Joint Conference on Artificial Intelligence</title>
				<meeting>the Twentieth International Joint Conference on Artificial Intelligence<address><addrLine>Menlo Park, California</addrLine></address></meeting>
		<imprint>
			<publisher>AAAI Press</publisher>
			<date type="published" when="2007">2007. 2007</date>
			<biblScope unit="page" from="1647" to="1652" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Learning from observation: Conceptual clustering</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">S</forename><surname>Michalski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">E</forename><surname>Stepp</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Machine Learning: An Artificial Intelligence Approach</title>
				<editor>
			<persName><forename type="first">R</forename><forename type="middle">S</forename><surname>Michalski</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><forename type="middle">G</forename><surname>Carbonell</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">T</forename><forename type="middle">M</forename><surname>Mitchell</surname></persName>
		</editor>
		<meeting><address><addrLine>Palo Alto, CA</addrLine></address></meeting>
		<imprint>
			<publisher>Tioga</publisher>
			<date type="published" when="1983">1983</date>
			<biblScope unit="page" from="331" to="363" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Automatic Structuring of Knowledge Bases by Conceptual Clustering</title>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">W</forename><surname>Mineau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Godin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Knowledge and Data Engineering</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="issue">5</biblScope>
			<biblScope unit="page" from="824" to="829" />
			<date type="published" when="1995">1995</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Flexible comparison of conceptual graphs</title>
		<author>
			<persName><forename type="first">M</forename><surname>Montes-Y-Gómez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gelbukh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">R</forename><surname>López-López</surname></persName>
		</author>
		<author>
			<persName><surname>Baeza-Yates</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 12th International Conference and Workshop on Database and Expert Systems Applications</title>
				<meeting>the 12th International Conference and Workshop on Database and Expert Systems Applications</meeting>
		<imprint>
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Text Mining at Detail Level Using Conceptual Graphs</title>
		<author>
			<persName><forename type="first">M</forename><surname>Montes-Y-Gomez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gelbukh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lopez-Lopez</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Lecture Notes In Computer Science</title>
		<imprint>
			<biblScope unit="volume">2393</biblScope>
			<biblScope unit="page" from="122" to="136" />
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Effective information retrieval using genetic algorithms based matching functions adaption</title>
		<author>
			<persName><forename type="first">P</forename><surname>Pathak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Gordon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Fan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. 33 rd Hawaii International Conference on Science (HICS)</title>
				<meeting>33 rd Hawaii International Conference on Science (HICS)<address><addrLine>Hawaii, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2000">2000</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Generation of equifrequent groups of words using a genetic algorithm</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">M</forename><surname>Robertson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Willet</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Documentation</title>
		<imprint>
			<biblScope unit="volume">50</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="213" to="232" />
			<date type="published" when="1994">1994</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Conceptual Graphs: Draft Proposed American National Standard</title>
		<author>
			<persName><forename type="first">J</forename><surname>Sowa</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Conceptual Structures ICCS-99</title>
		<title level="s">Lecture Notes in Artificial Intelligence</title>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="1999">1999. 1999</date>
			<biblScope unit="volume">1640</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<monogr>
		<title level="m" type="main">Learning to Extract Keyphrases from Text</title>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">D</forename><surname>Turney</surname></persName>
		</author>
		<idno>NRC #41622</idno>
		<imprint>
			<date type="published" when="1999">1999</date>
		</imprint>
		<respStmt>
			<orgName>National Research Council, Institute for Information Technology,</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Technical Report</note>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Random heuristic search</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">D</forename><surname>Vose</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Theoretical Computer Science</title>
		<imprint>
			<biblScope unit="volume">229</biblScope>
			<biblScope unit="page" from="103" to="142" />
			<date type="published" when="1999">1999</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Conceptual Graph Matching for Semantic Search</title>
		<author>
			<persName><forename type="first">J</forename><surname>Zhong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Yu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 10 th International Conference on Conceptual Structures: Integration and Interfaces</title>
				<meeting>the 10 th International Conference on Conceptual Structures: Integration and Interfaces</meeting>
		<imprint>
			<publisher>Springer -Verlag</publisher>
			<date type="published" when="2002">2002</date>
			<biblScope unit="page" from="92" to="106" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
