
1st International Workshop on Combinations of Intelligent Methods and Applications (CIMA 2008)

Tuesday July

Patras

ihatz@ceid.upatras.gr, michailo@ceid.upatras.gr

Greece

Ioannis Hatzilygeroudis, Constantinos Koutsojannis and Vasile Palade


Proceedings

Copyright © 2008 for the individual papers by the papers’ authors. Copying is permitted for private and academic purposes. Re-publication of material from this volume requires permission by the copyright owners.

Using Genetic Programming to Learn Models Containing Temporal Relations from Spatio-Temporal Data

Andrew Bennett and Derek Magee

Combining Intelligent Methods for Learner Modelling in Exploratory Learning Environments

Mihaela Cocea and George D. Magoulas

Belief Propagation in Fuzzy Bayesian Networks

Christopher Fogelberg, Vasile Palade and Phil Assheton

Combining Goal Inference and Natural-Language Dialogue for Human-Robot Joint Action

Mary Ellen Foster, Manuel Giuliani, Thomas Muller, Markus Rickert, Alois Knoll, Wolfram Erlhagen, Estela Bicho, Nzoji Hipolito and Luis Louro

A Tool for Evolving Artificial Neural Networks

Efstratios F. Georgopoulos, Adam V. Adamopoulos and Spiridon D. Likothanassis

Intelligently Raising Academic Performance Alerts

Dimitris Kalles, Christos Pierrakeas and Michalis Xenos

Recognizing predictive patterns in chaotic maps

Nicos G. Pavlidis, Adam Adamopoulos and Michael N. Vrahatis

Improving the Accuracy of Neuro-Symbolic Rules with Case-Based Reasoning

Jim Prentzas, Ioannis Hatzilygeroudis and Othon Michail

Combinations of Case-Based Reasoning with Other Intelligent Methods (short paper)

Jim Prentzas and Ioannis Hatzilygeroudis

Combining Argumentation and Hybrid Evolutionary Systems in a Portfolio Construction Application

Nikolaos Spanoudakis, Konstantina Pendaraki and Grigorios Beligiannis

An Architecture for Multiple Heterogeneous Case-Based Reasoning Employing Agent Technologies (short paper)

Elena I. Teodorescu and Miltos Petridis

Workshop Organization

Chairs-Organizers

Ioannis Hatzilygeroudis

University of Patras, Greece

Constantinos Koutsojannis

TEI of Patras, Greece

Vasile Palade

Oxford University, UK

Program Committee

Ajaith Abraham, IITA, South Korea

Ao Sio Iong, Oxford University, UK

Plamen Agelov, Lancaster University, UK

Emilio Corchado, University of Burgos, Spain

George Dounias, University of the Aegean, Greece

Artur S. d’Avila Garcez, City University, UK

Melanie Hilario, CUI - University of Geneva, Switzerland

Elpida Keravnou-Papailiou, University of Cyprus, Cyprus

Rudolf Kruse, University of Magdeburg, Germany

George Magoulas, Birkbeck College, Univ. of London, UK

Vasilis Megalooikonomou, University of Patras, Greece

Toni Moreno, University Rovira i Virgili, Spain

Amedeo Napoli, CNRS-INRIA-University of Nancy, France

Ciprian-Daniel Neagu, University of Bradford, UK

Jim Prentzas, TEI of Lamia, Greece

Han Reichgelt, Southern Polytechnic State Univ., GA, USA

David Sanchez, University Rovira i Virgili, Spain

Douglas Vieira, University of Minas Gerais, Brazil

Contact Chair

Ioannis Hatzilygeroudis

Dept. of Computer Engineering & Informatics

University of Patras, Greece

Email: ihatz@ceid.upatras.gr

Preface

The combination of different intelligent methods is a very active research area in Artificial Intelligence (AI). The aim is to create integrated or hybrid methods that benefit from each of their components. It is generally believed that complex problems can be solved more easily with such integrated or hybrid methods.

Some of the existing efforts combine what are called soft computing methods (fuzzy logic, neural networks and genetic algorithms) either among themselves or with more traditional AI methods such as logic and rules. Another stream of efforts integrates case-based reasoning or machine learning with soft-computing or traditional AI methods. Some of the combinations have been quite important and more extensively used, like neuro-symbolic methods, neuro-fuzzy methods and methods combining rule-based and case-based reasoning. However, there are other combinations that are still under investigation. In some cases, combinations are based on first principles, whereas in other cases they are created in the context of specific applications.

The Workshop is intended to become a forum for exchanging experience and ideas among researchers and practitioners who are dealing with combining intelligent methods either based on first principles or in the context of specific applications.

A total of 20 papers were submitted to the Workshop. Each paper was reviewed by at least two members of the PC. We finally accepted 12 papers (10 full and 2 short). Revised versions of the accepted papers (based on the comments of the reviewers) are included in these proceedings in alphabetical order (based on first author).

Five of the accepted papers deal with combinations of Genetic Programming or Genetic Algorithms with either non-symbolic methods, like Neural Networks (NNs) and/or Kalman Filters (Georgopoulos et al., Spanoudakis et al.), or symbolic ones, like Decision Trees (Kalles et al.) and Temporal Logic (Bennett and Magee). Another four papers deal with combinations of Case-Based Reasoning (CBR). One of them presents a short survey of CBR combinations (Prentzas and Hatzilygeroudis) and another one a combination with Agents (Teodorescu and Petridis). The remaining two present CBR combinations with a Neuro-Fuzzy (Cocea and Magoulas) and a Neuro-Symbolic (Prentzas et al.) approach respectively, leading to multi-combinations. Also, another two papers concern combinations of Fuzzy Logic with either NNs (Anastassopoulos and Iliadis) or Bayesian Nets (Fogelberg et al.). Finally, one of the papers combines an NN-based approach with a Natural Language Processing one (Foster et al.).

Four of the above papers present combinations developed in the context of an application. Applications involve Medicine (Anastassopoulos and Iliadis), Education (Cocea and Magoulas, Kalles et al.) and Economy (Spanoudakis et al.).

We hope that this collection of papers will be useful to both researchers and developers.

Given the success of this first Workshop on combinations of intelligent methods, we intend to continue our effort in the coming years.

Ioannis Hatzilygeroudis

Constantinos Koutsojannis

Vasile Palade

ANN for prognosis of abdominal pain in childhood: use of fuzzy modelling for convergence estimation

George C. Anastassopoulos, Lazaros S. Iliadis

Abstract. This paper has two parallel objectives. First, it presents a series of Artificial Neural Network (ANN) models that are capable of performing prognosis of abdominal pain in childhood. Clinical medical data records have been gathered and used for this purpose. Second, it presents and applies an innovative fuzzy algebraic model capable of evaluating the performance of Artificial Neural Networks [ 1 ]. This model offers a flexible approach that uses fuzzy numbers, fuzzy sets and various fuzzy intensification and dilution techniques to assess neural models from different perspectives. It also produces partial and overall evaluation indices. The produced ANN models performed the classification with significant success in the testing phase on previously unseen data.

1 INTRODUCTION

Artificial Neural Networks can be used with promising results on a wide range of problems, which explains their growth [ 2, 3 ]. Fields in which ANNs are used include medical systems [ 4-6 ], robotics [ 7 ], industry [ 8 – 11 ], image processing [ 12 ], applied mathematics [ 13 ], financial analysis [ 14 ], environmental risk modelling [ 15 ] and others.

Prognosis is a medical term denoting a physician's attempt to estimate accurately how a patient's disease will progress, and whether there is a chance of recovery, based on an objective set of factors that represent the situation. Inferring the prognosis of a patient from complex clinical and prognostic information is a common problem in clinical medicine. The diagnosis of a disease is the outcome of a combination of clinical and laboratory examinations carried out through medical techniques.

In this paper, various ANN architectures using different learning rules, transfer functions and optimization algorithms have been tried. This research effort was motivated by the fact that reliable and timely detection of abdominal pain contributes to effective treatment of the disease and avoidance of relapses. For this reason, an intelligent model that can collaborate with doctors would be very useful for the successful treatment of potential patients.

Democritus University of Thrace, Hellenic Open University, anasta@med.duth.gr, liliadis@fmenr.duth.gr

2 DIAGNOSTIC FACTORS OF ABDOMINAL PAIN

Several reports have described clinical scoring systems incorporating specific elements of the history, physical examination, and laboratory studies designed to improve diagnostic accuracy of abdominal pain [ 16 ]. Nothing is guaranteed, but decision rules can predict which children are at risk for appendicitis (appendicitis is the most common surgical condition of the abdomen). One such numerical system uses a 6-part score: nausea (6 points), history of local right lower quadrant (RLQ) pain (2 points), migration of pain (1 point), difficulty walking (1 point), rebound tenderness / pain with percussion (2 points), and absolute neutrophil count of >6.75 x 10^3/μL (6 points). A score <5 had a sensitivity of 96.3% with a negative predictive value of 95.6% for acute appendicitis (AA).

To date, all efforts to find clinical features or laboratory tests, either alone or in combination, that can diagnose appendicitis with 100% sensitivity or specificity have proven futile. Also, only one ANN-based research work in the literature [ 4 ] deals with abdominal pain prognosis in childhood.

The incidence of Acute Appendicitis (AA) is 4 cases per 1000 children. However, despite pediatric surgeons’ best efforts, appendicitis remains the most commonly misdiagnosed surgical condition. Although diagnosis and treatment have improved, appendicitis continues to cause significant morbidity and still remains, although rarely, a cause of death. Appendicitis has a male-to-female ratio of 3:2 with a peak incidence between ages 12 and 18 years. The mean age in the pediatric population is 6-10 years. The lifetime risk is 8.6% for boys and 6.7% for girls.

The 15 factors that are used in routine clinical practice for the assessment of AA in childhood are: Sex, Age, Religion, Demographic data, Duration of Pain, Vomitus, Diarrhea, Anorexia, Tenderness, Rebound, Leucocytosis, Neutrophilia, Urinalysis, Temperature, Constipation. The sex (males), the age (peak of appearance of AA in children aged 9 to 13 years), and the religion (hygiene condition, feeding attitudes, genetic predisposition) were associated with a higher frequency of AA. Anorexia, vomitus, diarrhea or constipation and a slight elevation of the temperature (37 °C - 38 °C) were common manifestations of AA. Additionally, abdominal tenderness, principally in the RLQ of the abdomen, and the existence of the rebound sign are strongly related to AA.

Leucocytosis (>10.800 K/μl) with neutrophilia (neutrophil count > 75%) is considered to be a significant clue for AA. Urinalysis is useful for detecting urinary tract disease, but normal findings on urinalysis are of limited diagnostic value for appendicitis.

The role of race, ethnicity, health insurance, education, access to healthcare, and economic status in the development and treatment of appendicitis is widely debated. Cogent arguments have been made both for and against the significance of each socioeconomic or racial condition. A genetic predisposition appears operative in some cases, particularly in children in whom appendicitis develops before age 6 years. Although the disorder is uncommon in infants and the elderly, these groups have a disproportionate number of complications because of delays in diagnosis and the presence of comorbid conditions.

In terms of diagnosis, there are four stages of appendicitis: acute focal appendicitis, acute suppurative appendicitis, gangrenous appendicitis and perforated appendicitis. These distinctions are vague, and only the clinically relevant distinction between perforated appendicitis (which includes gangrenous appendicitis, as dead intestine functionally acts as a perforation) and non-perforated appendicitis (acute focal and suppurative appendicitis) should be made.

The present study is based on a data set obtained from the Pediatric Surgery Clinical Information System of the University Hospital of Alexandroupolis, Greece. It consisted of 516 children’s medical records. Some of these children had different stages of appendicitis and, therefore, underwent operative treatment. This data set was divided into a set of 422 records and another set of 94 records. The former was used for training of the ANN, while the latter was used for testing. A small number of data records were used as a validation set during training to avoid overfitting. Table 1 presents the stages of appendicitis as well as the corresponding cases for each one. The 3rd column of Table 1 depicts the coding of the possible diagnoses, as used in the ANN training and testing stages.

[Table 1: stages of appendicitis, corresponding cases and diagnosis coding. Recoverable labels: Normal; Operative treatment.]

3 NEURAL NETWORK DESIGN

Data were divided into two groups, the training cases (TRAC) and the testing cases (TESC). The TRAC consisted of 417 concrete medical data records and the TESC consisted of 101. Each input record was organised in a format of fifteen fields, namely sex, age, religion, area of residence, pain time period, vomit symptoms, diarrhoea, anorexia, located sensitivity, rebound, wbc, poly, general analysis of urine, body temperature, constipation. The output record contained a single field which corresponded to the potential outcome of each case.

The TRAC and TESC data sets were determined in a largely random manner. The training and testing sample size sufficient for good generalization was determined using Widrow’s rule of thumb for the LMS algorithm, which is a distribution-free, worst-case formula [ 2 ], shown in equation 1. N is the number of training examples, W is the total number of free parameters in the network (synaptic weights and biases) and ε denotes the fraction of classification errors permitted during testing. The O notation shows the order of the quantity enclosed within [ 2 ].

$$N = O\left(\frac{W}{\varepsilon}\right) \qquad (1)$$
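As a rough illustration of equation 1, the following minimal sketch solves N ≈ W/ε in both directions. It assumes the constant hidden in the O notation is 1, and the parameter count of 17 is purely hypothetical, since the text does not report W for any of the networks. The function names are illustrative only.

```python
def tolerated_error(num_weights: int, num_examples: int) -> float:
    """Solve N ~ W / epsilon for epsilon (assumes the O(.) constant is 1)."""
    if num_weights <= 0 or num_examples <= 0:
        raise ValueError("both counts must be positive")
    return num_weights / num_examples


def required_examples(num_weights: int, error_fraction: float) -> float:
    """Return N ~ W / epsilon, the order of magnitude of examples needed."""
    if not 0.0 < error_fraction <= 1.0:
        raise ValueError("error_fraction must lie in (0, 1]")
    return num_weights / error_fraction


# Hypothetical parameter count: W is NOT stated in the paper.
HYPOTHETICAL_W = 17
# 417 is the number of training cases (TRAC) reported in the text.
epsilon = tolerated_error(HYPOTHETICAL_W, 417)
print(f"tolerated error fraction: {epsilon:.3f}")
```

Under these assumptions, a network with more free parameters would need proportionally more training records to keep the same tolerated error.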

In the case examined here, with 417 training examples used, the classification error that could be tolerated would be about 4%.

3.1 Description of the experiments performed

During experimentation, numerous ANN architectures, learning algorithms and transfer functions were combined in an effort to obtain the optimal network. For the Hyperbolic Tangent (TanH) transfer function the input data were normalized (divided properly) so that they fall in the acceptable range of [ -3, 3 ], to avoid problems such as saturation, where an element’s summation value (the sum of the inputs times the weights) exceeds the acceptable network range [ 17 ]. Standard back-propagation optimization algorithms using TanH, Sigmoid or Digital Neural Network Architecture (DNNA) transfer functions, combined with the Extended Delta Bar Delta (ExtDBD) or the Quick Prop learning rules [ 18, 19 ], were employed. The ExtDBD is a heuristic technique reinforcing good general trends and damping oscillations [ 20 ].
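The normalization step can be sketched as follows. This is a minimal illustration that assumes each input field is divided by a single constant so that its largest absolute value maps to 3; the text only says the data were "divided properly", so the exact divisor used by the authors is not known. The sample ages are made up.

```python
import math
from typing import List, Sequence


def scale_to_range(column: Sequence[float], limit: float = 3.0) -> List[float]:
    """Divide a feature column by one constant so all values lie in [-limit, limit]."""
    largest = max(abs(v) for v in column)
    if largest == 0:
        return [0.0 for _ in column]
    divisor = largest / limit  # assumed choice of divisor, not taken from the paper
    return [v / divisor for v in column]


ages = [2.0, 6.0, 9.0, 13.0]  # made-up values for one input field
scaled_ages = scale_to_range(ages)

# Without scaling, TanH is already saturated for the larger raw values.
raw_response = math.tanh(ages[-1])
scaled_response = math.tanh(scaled_ages[-1])
print(scaled_ages, raw_response, scaled_response)
```

Keeping the inputs inside [ -3, 3 ] leaves the TanH response away from its flat region, where gradients used by back-propagation become very small.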

Modular and radial basis function (RBF) ANN applying the ExtDBD learning rule and the TanH transfer function were also used in an effort to determine the optimal networks. RBFs have an internal representation of hidden neurons which are radially symmetric, and the hidden layer consists of pattern units fully connected to a linear output layer [ 21, 22 ].

3.2 ANN evaluation metrics applied

Traditional ANN evaluation measures like the Root Mean Square Error (RMS error), R2 and the confusion matrix were used to validate the ensuing neural network models. It is well known that the RMS error adds up the squares of the errors for each neuron in the output layer, divides by the number of neurons in the output layer to obtain an average, and then takes the square root of that average. The confusion matrix is a graphical way of measuring the network’s performance during the “training” and “testing” phases.

It also facilitates the correlation of the network output to the actual observed values that belong to the testing set in a visual display [ 17 ], and therefore provides a visual indication of the network’s performance. A network with the optimal configuration should have the “bins” (the cells in each matrix) on the diagonal from the lower left to the upper right of the output. An important aspect of the matrix is that the value of the vertical axis in the generated histogram is the Common Mean Correlation (CMC) coefficient of the desired (d), and the actual (predicted) output (y) across the Epoch.
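The two traditional measures described above can be sketched in a few lines. This is an illustrative implementation under simple assumptions: outputs are given as plain lists, and the confusion matrix is stored as counts keyed by (desired, predicted) class pairs rather than drawn as a histogram. The function names and sample values are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Sequence, Tuple


def rms_error(desired: Sequence[float], actual: Sequence[float]) -> float:
    """Square the per-neuron errors, average them, then take the square root."""
    if len(desired) != len(actual) or not desired:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(d - y) ** 2 for d, y in zip(desired, actual)]
    return math.sqrt(sum(squared) / len(squared))


def confusion_counts(
    desired: Sequence[Hashable], predicted: Sequence[Hashable]
) -> Dict[Tuple[Hashable, Hashable], int]:
    """Count how often each (desired, predicted) class pair occurs."""
    if len(desired) != len(predicted):
        raise ValueError("inputs must be of equal length")
    return dict(Counter(zip(desired, predicted)))


# Made-up class codes for four records.
counts = confusion_counts([0, 1, 1, 2], [0, 1, 2, 2])
on_diagonal = sum(n for (d, p), n in counts.items() if d == p)
print(rms_error([1.0, 0.0], [0.0, 0.0]), counts, on_diagonal)
```

Pairs with equal desired and predicted class correspond to the diagonal "bins" mentioned in the text.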

Finally, FUSETRESYS (Fuzzy Set Transformer Evaluation System), an innovative ANN evaluation system, has been applied, offering a more flexible approach [ 1 ].

3.3 Technical description of the FUSETRESYS ANN evaluation model

Fuzzy logic enables the performance of calculations with mathematically defined words called “Linguistics” [ 1, 23-25 ].

FUSETRESYS treats each training/testing example as a Fuzzy Set.

It applies triangular or trapezoidal membership functions in order to determine the partial degree of convergence (PADECOV) of the ANN for each training/testing example separately. The following equations 2 and 3 represent the triangular and the trapezoidal membership function, respectively [ 1 ].

$$\mu_s(x; a, b, c) = \max\left\{\min\left\{\frac{x-a}{b-a}, \frac{c-x}{c-b}\right\}, 0\right\}, \quad a < b < c \qquad (2)$$

$$\mu_s(x; a, b, c, d) = \max\left\{\min\left\{\frac{x-a}{b-a}, 1, \frac{d-x}{d-c}\right\}, 0\right\}, \quad a < b < c < d \qquad (3)$$

The model can produce various overall degrees of convergence (OVDECOV) for all of the training examples by applying either fuzzy T-Norm or fuzzy S-Norm conjunction operations, depending on the optimistic or pessimistic point of view of the developer.

[Table residue. Learning Rule/Transfer Function: Genetic Algorithm/TanH; NormCum_Delta/TanH; NormCum_Delta/TanH; ExtDBD/TanH. Numeric entries as extracted: 15 15 15 15; 7 9 9 7; 7 0 9 0.]

T-Norms tend to produce lower aggregation indices, so in the case of ANN evaluation they can be considered a pessimistic approach, whereas the opposite happens with S-Norms [ 26 ]. In fact, each distinct Norm evaluates the performance of an ANN from a different perspective. For example, the drastic product assigns the ANN a high OVDECOV only if it does not have extreme deviations between the desired and the produced classifications during the training/testing process [ 1 ], whereas the Einstein T-Norm acts in a more average mode. The following equations 4 and 5 present the drastic product and the Einstein product T-Norms.

More details on fuzzy conjunction operators can be found in [ 26-28 ].

$$\mu_{\tilde{A} \cap \tilde{B}}(X) = \begin{cases} \min\{\mu_{\tilde{A}}(X), \mu_{\tilde{B}}(X)\} & \text{if } \max\{\mu_{\tilde{A}}(X), \mu_{\tilde{B}}(X)\} = 1 \\ 0 & \text{otherwise} \end{cases} \qquad (4)$$

$$\mu_{\tilde{A} \cap \tilde{B}}(X) = \frac{\mu_{\tilde{A}}(X)\,\mu_{\tilde{B}}(X)}{2 - [\mu_{\tilde{A}}(X) + \mu_{\tilde{B}}(X) - \mu_{\tilde{A}}(X)\,\mu_{\tilde{B}}(X)]} \qquad (5)$$
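Equations 2 to 5 can be sketched directly in code. The following is a minimal illustration, not the FUSETRESYS implementation: the function names are hypothetical, and the list of partial membership degrees that is aggregated at the end is made up purely to show how a T-Norm folds per-example values into one overall index.

```python
from functools import reduce
from typing import Sequence


def triangular(x: float, a: float, b: float, c: float) -> float:
    """Equation 2: triangular membership function, requires a < b < c."""
    return max(min((x - a) / (b - a), (c - x) / (c - b)), 0.0)


def trapezoidal(x: float, a: float, b: float, c: float, d: float) -> float:
    """Equation 3: trapezoidal membership function, requires a < b < c < d."""
    return max(min((x - a) / (b - a), 1.0, (d - x) / (d - c)), 0.0)


def drastic_product(mu_a: float, mu_b: float) -> float:
    """Equation 4: the smaller degree if the larger equals 1, otherwise 0."""
    return min(mu_a, mu_b) if max(mu_a, mu_b) == 1.0 else 0.0


def einstein_product(mu_a: float, mu_b: float) -> float:
    """Equation 5: Einstein product T-Norm."""
    return (mu_a * mu_b) / (2.0 - (mu_a + mu_b - mu_a * mu_b))


def aggregate(degrees: Sequence[float], t_norm=einstein_product) -> float:
    """Fold a T-Norm over per-example degrees to get one overall degree."""
    return reduce(t_norm, degrees)


# Made-up partial degrees for four test examples (illustration only).
partial_degrees = [1.0, 1.0, 0.9, 0.8]
print(aggregate(partial_degrees), aggregate(partial_degrees, drastic_product))
```

With these made-up values the drastic product collapses to 0 as soon as two imperfect examples meet, while the Einstein product degrades gradually, which matches the contrast the text draws between the two norms.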

The fact that FUSETRESYS evaluates each training/testing example separately offers a clearer view of the ANN’s performance. In this way the developers know whether the network performs extremely badly or well in specific cases.

Also when there are several neurons in the output layer, the traditional approaches produce separate evaluation results for each one whereas the FUSETRESYS can produce an additive performance index (ADPERI) of the ANN. This could be done under different perspectives and under different degrees of optimism [ 1 ].

Finally, the application of fuzzy set hedges offers the “dilution” and “intensification” options. With dilution, the developer softens the membership function (MF) over the fuzzy set (FS) and weakens the membership constraints so that a point of the universe of discourse is “truer” than it would be before [ 1, 27 ]. Conversely, intensification hardens the MF over the FS and strengthens the membership constraints so that a point on the domain is “less true” than it used to be [ 1, 27 ]. The following equations 6 and 7 correspond to the intensification and dilution functions respectively.

$$\mu_{\mathrm{intensify}(A)}(X_i) = \mu_A^{\,n}(X_i) \qquad (6)$$

$$\mu_{\mathrm{dilute}(A)}(X_i) = \mu_A^{\,1/n}(X_i) \qquad (7)$$
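The effect of the two hedges can be seen in a short sketch. It assumes the common power form of fuzzy hedges, with intensification raising the membership degree to a power n > 1 and dilution taking the n-th root; the default n = 2 is an assumption made here for illustration, since the text leaves the degree to the developer.

```python
def _check_degree(mu: float) -> None:
    """Membership degrees must lie in the unit interval."""
    if not 0.0 <= mu <= 1.0:
        raise ValueError("membership degree must lie in [0, 1]")


def intensify(mu: float, n: float = 2.0) -> float:
    """Harden the membership: the result is never larger than mu for n > 1."""
    _check_degree(mu)
    return mu ** n


def dilute(mu: float, n: float = 2.0) -> float:
    """Soften the membership: the result is never smaller than mu for n > 1."""
    _check_degree(mu)
    return mu ** (1.0 / n)


degree = 0.81  # made-up membership degree
print(intensify(degree), degree, dilute(degree))
```

Applying the hedge before aggregation gives a stricter index under intensification and a more lenient one under dilution.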

In this way the ANN can be evaluated strictly, using a “very well fit” evaluation option, or in a more relaxed way, using the “somewhat fit” option. It is up to the developer to decide the type of ANN evaluation and the degree of dilution or intensification. For a more detailed description of FUSETRESYS please see [ 1 ].

4 RESULTS AND DISCUSSION

4.1 ANN analysis

Several experiments were performed. The following table 2 presents the structure of the four most effective Back Propagation (BP) multilayer (ML) neural networks. In all ANN models, the classical approach to overcoming the overfitting problem has been followed. More specifically, a set of validation data was provided to the algorithm in addition to the training data. The algorithm monitored the error with respect to this validation set, while using the training set to drive the gradient descent search. The number of weight-tuning iterations performed by the system was determined in each case by the criterion of lowest error over the validation set. Two copies of the weights are kept: one copy for training and another holding the best-performing weights thus far.

[...] made towards the development of modular ANN (MODANN) for the classification problem solution. The term MODANN refers to the “adaptive” mixtures of local experts (LOCEXP) as proposed by [ 29 ].

They consist of a group of BP ANN referred to as local experts competing to learn different aspects of a problem. A “gating ANN” controls the competition and learns to assign different parts of the data space to different networks.
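The way a gating network combines local experts can be illustrated with a forward pass only. This minimal sketch assumes the gating outputs are turned into probabilities with a softmax and that the final output is the probability-weighted sum of the expert outputs, which is the usual formulation of adaptive mixtures of local experts. The experts and the gate below are trivial stand-in functions, not trained BP networks, and all names are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Features = Sequence[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw gate scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_output(
    x: Features,
    experts: Sequence[Callable[[Features], float]],
    gate_scores: Callable[[Features], Sequence[float]],
) -> Tuple[float, List[float]]:
    """Weight each expert's output by its gating probability."""
    gates = softmax(gate_scores(x))
    outputs = [expert(x) for expert in experts]
    return sum(g * o for g, o in zip(gates, outputs)), gates


def expert_low(x: Features) -> float:   # stand-in for a trained local expert
    return 1.0


def expert_high(x: Features) -> float:  # stand-in for a trained local expert
    return 3.0


def gate(x: Features) -> List[float]:   # stand-in for the gating network
    return [x[0], -x[0]]


balanced, balanced_gates = mixture_output([0.0], [expert_low, expert_high], gate)
skewed, skewed_gates = mixture_output([10.0], [expert_low, expert_high], gate)
print(balanced, balanced_gates, skewed, skewed_gates)
```

The gating probabilities returned here play the same role as the gating probabilities reported for the optimal MODANN: they show which expert is responsible for which part of the input space.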

[Figure: horizontal axis "Code number for each evaluated record", ticks from 1 to 97.]

The LOCEXP have the same architecture but they can apply distinct learning rules or transfer functions. Also, the number of output processing elements of the gating network is equal to the number of LOCEXP used. The number of neurons in the hidden layer of the gating network should be larger than the number of output processing elements [ 17 ]. The above table 5 presents the structure and the architecture of the optimal MODANN that was developed for the medical classification problem examined here. The performance of the developed modular network is very satisfactory, with an R2 value of 0.9434 and a FUSETRESYS-produced average PADECOV of 0.9733 (using the triangular membership function) in the testing process on the previously unseen testing data set.

The following figure 2 depicts the gating probabilities for the optimal MODANN. The above Table 6 presents a small sample of the 101 distinct PADECOV values produced by FUSETRESYS.

Also, the Einstein T-Norm was applied to determine the overall degree of convergence of the ANN. The ML#2 ANN had a very high OVDECOV index with a value of 0.98299, whereas the ML#3 ANN and the MODANN #REF1 had OVDECOV indices as high as 0.97. The Drastic Product T-Norm was not applied in this research effort because the data in table 5 showed it to be unnecessary: there were no serious indications of extremely bad ANN performance in any of the testing examples.

5 CONCLUSIONS

The above research has obtained six ANNs with a good level of convergence and has shown that there exist at least four ANNs with high performance indices in the case of abdominal pain classification. Namely, the best ANNs are two ML BP ANNs, an RBF ANN and a MODANN using a referee gating network and two local experts. All of them have been described in the previous sections.

A very interesting part of the whole research effort is the application of an innovative ANN evaluation model called FUSETRESYS that uses fuzzy logic and fuzzy algebra proposed in [ 11 ].

The new evaluation scheme has produced individual convergence indices, namely PADECOV values, for the output of each single data record used in the testing phase. The worst PADECOV value equals 0.6666, which is the degree of membership of the data record in the FS “Actual output value equal to the desired value”. This worst case appears three times, in exactly the same data records, for the ML#2, ML#3 and #1REF ANNs, and it shows that the classification capacity of the developed networks is not bad even in the worst cases. This conclusion is strengthened by the fact that the second worst PADECOV index has a value of 0.833.

If an overall ANN validation is performed, the traditional evaluation instruments agree with FUSETRESYS that the most suitable ANN is the ML BP with code# 4, whereas all of the other developed ANNs have almost equally good performance. The Einstein T-Norm produces a higher “good performance index” for the MODANN than the traditional methods.

As can be seen in table 7, the OVDECOV indices have very high values for the ML#2, REF#1 and ML#3 networks when a “Partly fit” validation is performed. There is significant differentiation when a very strict evaluation is done under the linguistic “Very well fit”. The OVDECOV indices fall from 0.99 to 0.75 for ML#2, from 0.99 to 0.65 for #REF and from 0.99 to 0.71 for ML#3 respectively. This is a very useful approach and it shows the actual power of FUSETRESYS, because it reveals how the average convergence degree of the three ANNs differs when stricter validation methods are applied. Thus ANNs that are fed the same testing data records and appear to perform more or less equally are clearly differentiated when stricter convergence validation methods are applied.

The proposed ANN architecture handles appendicitis prediction quite satisfactorily, based on both the results presented above and the opinion of the pediatric surgeons who used these ANNs in their everyday routine clinical practice.

The innovative ANN evaluation model that was applied successfully in this research effort will be used extensively in the future, in an integrated effort to check its validity under various perspectives.

ACKNOWLEDGEMENTS

We would like to thank the pediatric surgeons of the Pediatric Surgeon Department of the Medical School of Democritus University of Thrace for their contribution in providing the medical records.

Using Genetic Programming to Learn Models Containing Temporal Relations from Spatio-Temporal Data

Andrew Bennett and Derek Magee 1

Abstract. In this paper we describe a novel technique for learning predictive models from non-deterministic spatio-temporal data. Our technique learns a set of sub-models that model different, typically independent, aspects of the data. By using temporal relations, and implicit feature selection, based on the use of 1st order logic expressions, we make the sub-models general, and robust to irrelevant variations in the data. We use Allen’s intervals [ 1 ], plus a set of four novel temporal state relations, which relate temporal intervals to the current time. These are added to the system as background knowledge in the form of functions. To combine the sub-models into a single model a context chooser is used. This probabilistically picks the most appropriate set of sub-models to predict in a certain context, and allows the system to predict in non-deterministic situations. The models are learnt using an evolutionary technique called Genetic Programming. The method has been applied to learning the rules of snap and uno by observation, and to predicting a person’s course through a network of CCTV cameras.

1 Introduction

Learning predictive models from spatio-temporal data is, in general, a hard problem. Events and activities can have variations in their spatial and temporal scope; include multiple (variable numbers of) objects; can overlap temporally with other events and activities; and happen in a non-deterministic manner. A model for predicting spatio-temporal events must support this complexity. Our novel technique learns a set of sub-models that model different, typically independent, aspects of data. The sub-models can, in addition to object properties, use temporal relations to describe the scene, and implicit feature selection, based on the use of 1st order logic expressions, to make them robust to irrelevant variations in the data. To combine the sub-models into a single model a context chooser is used. This picks the most appropriate set of sub-models to predict in a certain context, and allows the system to predict in non-deterministic situations.

Using the combination of sub-models and the context chooser also reduces the complexity of the model search space, and allows the system to learn a global sub-model that matches most of the dataset, and then learn simple sub-models to cover the cases where the global sub-model does not work.

This approach extends our previous work [ 2 ] by allowing a qualitative, as well as a Markovian, representation of time. This is done by replacing the step-wise Markovian view with temporal relations like Allen’s intervals [ 1 ], and a set of four additional relations that relate the temporal state of objects to the current time. We use Genetic Programming to learn the models, and present an improved fitness function. The system has been successfully tested on handcrafted snap and uno datasets, and on learning the structure of a set of mock CCTV cameras from video.

1 University of Leeds, UK, email: {andrewb,drm}@comp.leeds.ac.uk

There has been much previous work on learning from spatio-temporal domains. Traditional methods usually require a fixed dimensionality vector, existing with canonical ordering / constant meaning, to represent the world. Constructing this vector often requires knowledge of the domain, making these methods hard to use in a problem domain where the structure of the domain is variable, and not known a priori. One approach to modelling data of variable dimensionality is to take statistics of a variable size set [ 8 ]. This produces a fixed set description; however, spatial relationship information is lost in this process. If this information is important within a domain, this leads to a poor model. Feature selection can be used to find the most relevant subset of the data, which then allows for a more general model to be built. However, the relevant subset may change from one context to another.

Temporal modelling approaches such as Markov chains, Hidden Markov Models (HMMs) and Variable Length Markov Models (VLMMs) [ 7 ] use a description based on graphs to model state transitions. These methods also usually need a fixed dimensionality vector with canonical ordering for each observation. There does not have to be a fixed dimensionality for every observation vector, as theoretically each observation vector can have a different number of dimensions. It is possible to optimise their structure by using local optimisation approaches based on information theory [ 3 ]. In VLMMs this optimisation acts as a kind of temporal feature selection, but as the input variables stay in the same fixed order, spatial feature selection is not performed.

Bayesian networks are a generalisation of probabilistic graph based reasoning methods like HMMs and VLMMs. Again these networks require a fixed input vector, but again their relational structure can be optimised by local search [ 12 ], genetic algorithms [ 5 ], or MCMC [ 6 ] usually based on information theoretic criteria.

An alternative to graph-based methods is to use (1st order) logical expressions. Feature selection is implicit in the formalism of these expressions. Logical expressions also make no assumptions about the ordering of variables, so there is no need to have them in a fixed ordering. Progol [ 14 ] and HR [ 4 ] are Inductive Logic Programming (ILP) methods. In general, ILP takes data and generates a set of logical expressions describing the structure of the data. Progol does this by iterative subsumption, using a deterministic search with the goal of data compression. HR uses a stochastic search with a number of specialist operators, which is similar to Genetic Programming, described below. These approaches suffer from a number of disadvantages. Firstly, logical expressions are deterministic, so it is hard for them to model non-deterministic situations. However, there has been much work on combining (1st order) logic and probability to solve this problem [ 16 ], [ 9 ]. Secondly, Progol’s search is depth bounded, which limits the size of problems it can work on, as explained in [ 15 ]. Thirdly, Progol’s fitness function is based only on how well the model compresses the data, and not on how well the model predicts the data. This can cause incorrect or invalid models to be produced.

Genetic Programming (GP) [ 10 ] is an evolutionary method, similar to genetic algorithms, for creating a program that models a dataset. In a similar way to HR, it takes a dataset, a set of terminals, and a set of functions, and uses a set of operators to generate a binary tree that models the data.

Qualitative representations can be used to describe spatio-temporal data in an abstract manner. Allen [ 1 ] describes a set of seven temporal relations to represent temporal interactions between objects.

Previous work on learning spatio-temporal models from video by Needham et al. [ 15 ] produced a system that could learn basic card games. It had three parts: an attention mechanism, unsupervised low-level learning, and high-level protocol learning. The attention mechanism uses a generic blob tracker that locates the position of the moving objects. From this a set of features, including colour, position and texture, is extracted. The data is clustered into groups. Using these clusters, new input data is assigned to its closest cluster prototype. A symbolic data stream is then created by combining the clustered data with time information. The symbolic stream is passed to Progol, which builds a model of the data. Once the model has been learnt it can be applied to new data. This allows the system to interact with the world.

Siskind [ 17 ] looked at learning event definitions from video. A raw video of a scene is converted into a polygon representation. This is then transformed into a force-dynamic model which shows how the objects in the scene are in contact with one another. Using this data, and-meets-and (AMA) logic formulae describing the events are learnt using a specific-to-general ILP approach. Work on learning from spatio-temporal data, such as these two approaches, has inspired our work.

The remainder of this paper takes the following form. The second section reviews our previous work on the architecture for the models. The following section extends this work to incorporate temporal relations into the sub-models. The section after that describes how these models are learnt by Genetic Programming. This is followed by an evaluation of our system, and the final section presents the conclusions and further work.

2 Architecture for Models of Spatio-Temporal Data

[Figure 1. Model architecture. Diagram labels: Data, Sub-models, Output, Context chooser, Overall output.]

An architecture to represent a model of spatio-temporal data, along with associated learning methods, is described in our previous work [ 2 ]. We use this architecture, as shown in Figure 1. It is broken down into two parts: the sub-models and the context chooser. The sub-models each model a separate part of the underlying process generating the data. Each sub-model contains two sections: a search section and an output section. The search section looks for a particular pattern in the dataset. A query language of our own design, with some similarity to SQL and Prolog, is used to describe the actual search, and a binary tree is used to represent it. The output section describes what is implied if the search returns true: a set of entities and relations, and their properties, that the sub-model predicts.

Figure 2 shows an example of a sub-model.

[Figure 2. Example sub-model. Diagram labels: search tree nodes &, =, =; Output node =; terminals Light1.colour(t−1), C1, Light2.colour(t−1), C0, Light3.colour(t).]

The context chooser is used to decide how to combine the sub-models in different situations. It takes as its input a boolean vector describing which sub-models have evaluated true and returned outputs, and uses a probability distribution to decide which ones will form the overall output. A context $S_n$ is defined as the set of sub-models $M$ producing an output in a given situation; for example, $S_n = \{M_1, M_2\}$ represents that $M_1$ and $M_2$ have search sections that have evaluated true at the same time. For each context a probability distribution over the possible combinations of model outputs for that context is defined, for example $P_n(M_1)$, $P_n(M_2)$, $P_n(M_1, M_2)$, where $\sum_j P_n(j) = 1$. This distribution is formed from the frequency of occurrence of each situation in the training data in the given context.
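A minimal sketch of such a context chooser is given here, assuming that a context is identified by the set of sub-models whose search sections evaluated true and that the distribution is estimated purely from training counts. The class name `ContextChooser`, its methods, and the sub-model identifiers are hypothetical and only illustrate the idea; they are not taken from the system itself.

```python
from collections import Counter, defaultdict


class ContextChooser:
    """Frequency-based distribution over output combinations, kept per context."""

    def __init__(self):
        # Hypothetical layout: context (frozenset of sub-model ids that fired)
        # -> Counter over the combinations that formed the overall output.
        self.counts = defaultdict(Counter)

    def train(self, fired, chosen):
        """Record that, when `fired` evaluated true, `chosen` formed the output."""
        self.counts[frozenset(fired)][frozenset(chosen)] += 1

    def distribution(self, fired):
        """Return P_n(combination) for this context (empty if never seen)."""
        counter = self.counts.get(frozenset(fired))
        if not counter:
            return {}
        total = sum(counter.values())
        return {combo: n / total for combo, n in counter.items()}


chooser = ContextChooser()
for chosen in (["M1"], ["M1"], ["M2"], ["M1", "M2"]):
    chooser.train(["M1", "M2"], chosen)
dist = chooser.distribution(["M1", "M2"])  # probabilities sum to 1
```

Only contexts that actually occur in training are stored, so the dictionary stays sparse even when the number of sub-models is large.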

This can be implemented as a sparse hash table.

3 Incorporating Temporal Relations into Sub-models

To evaluate the sub-models, history data from the world is required.

The search section of the sub-model uses data pointers to reference particular data items in the history. The search section of the sub-model is then evaluated with respect to this data. If the search section evaluates true, then the output section is implied. In our previous work [ 2 ] each data pointer could only reference fixed quantified time points in the history, as shown in Figure 2. The use of this step-wise markovian representation of time implies an exact ordering of the events. When multiple independent events are happening simultaneously this representation will fail, and an alternative method of representing temporal ordering is necessary. To describe temporal ordering in the data we use a combination of Allen’s intervals [ 1 ] and four novel temporal state relations. Allen’s intervals describe temporal relations between objects. There are seven relations: meets, starts, finishes, during, before, overlaps, and equal to. Along with describing temporal relations between objects in the history, we need to describe how the objects relate to the current time. An object goes through a series of temporal states, based on how its start and end times relate to the current time; these are described in Figure 3. Firstly the object is entering the world: its end time is unknown, but its start time is the same as the current time. Secondly the object exists in the world: again the end time is unknown, but its start time is less than the current time. Thirdly the object is leaving the world: its end time is equal to the current time. Finally the object has left the world: both its start and end times are less than the current time.

[Figure 3. Temporal states of an object relative to the current time.]

Entering: Current_time = start
Existing: Current_time > start
Leaving: Current_time = end AND Current_time > start
Left: Current_time > end AND Current_time > start
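The temporal state relations and Allen's intervals can be written as plain functions over start and end times. The sketch below is an illustration only: the function names, the use of `None` for an unknown end time, and the integer time stamps are assumptions, and intervals are assumed to satisfy start < end.

```python
from typing import Optional, Tuple


def temporal_state(start: int, end: Optional[int], now: int) -> Optional[str]:
    """Classify an object relative to the current time (end=None: still unknown)."""
    if now < start:
        return None  # assumption: the object has not been observed yet
    if end is not None:
        if now == end and now > start:
            return "leaving"
        if now > end:
            return "left"
    # End time unknown, or not yet reached at the current time.
    return "entering" if now == start else "existing"


def allen_relation(a: Tuple[int, int], b: Tuple[int, int]) -> Optional[str]:
    """Return which of the seven relations holds from interval a to interval b."""
    (s1, e1), (s2, e2) = a, b
    if e1 < s2:
        return "before"
    if e1 == s2:
        return "meets"
    if s1 == s2 and e1 == e2:
        return "equal"
    if s1 == s2 and e1 < e2:
        return "starts"
    if e1 == e2 and s1 > s2:
        return "finishes"
    if s1 > s2 and e1 < e2:
        return "during"
    if s1 < s2 < e1 < e2:
        return "overlaps"
    return None  # only an inverse relation holds; swap the arguments
```

For example, `temporal_state(5, None, 5)` gives "entering", and `allen_relation((0, 3), (3, 5))` gives "meets".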

Both Allen’s intervals and our additional temporal state relations are represented in the system as functions of the data that appear in the search section of the sub-models. These relations do not appear in the data; only the temporal range of individual objects occurs in the data. As the data pointers can be used over the entire history, it is quite likely that a sub-model will evaluate on many different parts of the history. To resolve this issue we use only the result that includes the most recent data. The justification for this is that the sub-model will already have output this information at a previous time in the other situations.

4 Learning the Models from Data

Our previous work [ 2 ] showed that it was intractable to find the set of optimal sub-models by exhaustive search for all but the simplest problems. The search space is complex, so a stochastic search method was chosen as an alternative. We use Genetic Programming [ 10 ], which has already been used successfully for pattern recognition tasks [ 11 ].

Genetic Programming (GP) [ 10 ] evolves a population of programs until a program with the desired behaviour is found. It is a type of genetic algorithm, but the programs are stored as binary trees, and not as fixed length strings. Functions are used for the internal nodes, and terminals (for example constants and variables) are used for the leaf nodes. In order for the population to evolve, a fitness function (in our case a predictive accuracy score) must be defined. This score is used by the GP system to decide which programs in the current generation to use to produce the next generation, and which ones to throw away. To initialise the system, a set of randomly generated programs must be created. Each then receives a score from the fitness function. Operators including crossover, mutation and reproduction use the programs from the current generation to create a new generation. Crossover takes two programs and randomly picks a sub-tree on each program; these two sub-trees are swapped over, creating two new programs. Mutation takes one program, randomly picks a sub-tree on it, and replaces it with a randomly generated sub-tree. Reproduction copies a program exactly as it is into the new generation. The programs in the new generation are then scored based on how well the data is predicted, and the process is repeated. The GP system stops when a certain fitness score is reached, or a certain number of generations has passed.
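The crossover and mutation operators described above can be sketched on binary trees as follows. This is a generic illustration, not the implementation used in our system: programs are assumed to be nested tuples `(function, left, right)` with string terminals, and the function and terminal sets (`and`, `or`, `x`, `y`, `z`) are placeholders.

```python
import random

# A program is either a terminal (a string) or a tuple (function, left, right).


def subtree_paths(tree, path=()):
    """List the paths (tuples of child indices) to every node in the tree."""
    found = [path]
    if isinstance(tree, tuple):
        for i in (1, 2):
            found += subtree_paths(tree[i], path + (i,))
    return found


def get_subtree(tree, path):
    """Follow a path of child indices down from the root."""
    for i in path:
        tree = tree[i]
    return tree


def replace_subtree(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    children = list(tree)
    children[path[0]] = replace_subtree(tree[path[0]], path[1:], new)
    return tuple(children)


def crossover(a, b, rng):
    """Swap one randomly chosen sub-tree of a with one of b."""
    pa = rng.choice(subtree_paths(a))
    pb = rng.choice(subtree_paths(b))
    return (replace_subtree(a, pa, get_subtree(b, pb)),
            replace_subtree(b, pb, get_subtree(a, pa)))


def random_tree(rng, depth, functions=("and", "or"), terminals=("x", "y", "z")):
    """Grow a random tree of at most the given depth (placeholder symbol sets)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(terminals)
    return (rng.choice(functions),
            random_tree(rng, depth - 1, functions, terminals),
            random_tree(rng, depth - 1, functions, terminals))


def mutate(tree, rng, max_depth=2):
    """Replace one randomly chosen sub-tree with a freshly generated one."""
    path = rng.choice(subtree_paths(tree))
    return replace_subtree(tree, path, random_tree(rng, max_depth))
```

Because crossover only swaps sub-trees, the total number of nodes across the two parents equals the total across the two children, which is a convenient sanity check when testing an implementation.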

In our implementation of GP we assume that a program is a model containing a context chooser, and a set of sub-models. To initialise the population we generate a set of models just containing one randomly generated sub-model. The sub-model is produced using Koza’s ramped half and half method [ 10 ]. We apply a hierarchical structure to our sub-models in a similar manner to [ 13 ], to try and cut down the search space, and to make finding a solution more efficient.

A set of operators is then used to evolve the population. There are two kinds of operators: those that optimise which sub-models are used in the model, and those that optimise the sub-models themselves. A technique called tournament selection [ 10 ] is used to pick a model from the population. Tournament selection picks n models at random from the population and returns the one with the lowest score; for our experiments we set n to be 5. The operators used to optimise which sub-models are used in the model are shown below:

Reproduction: A set number of models are picked via tournament selection and copied directly into the new population.

Adding in a sub-model from another model: Two models are picked by tournament selection. A sub-model from the first picked model is randomly selected, and added to the second chosen model.

Replacing a sub-model: Again two models are picked by tournament selection, and a sub-model from the first chosen model is then replaced by a sub-model randomly selected from the second chosen model.

Removing a sub-model: A model is picked by tournament selection, and a randomly selected sub-model is removed from it.

The only operator used to optimise the sub-models themselves is crossover. In crossover two models are picked using tournament selection. A sub-model from each model is then randomly selected, and standard crossover [ 10 ] is performed on these sub-models.
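The selection step and the model-level operators can be sketched as below, assuming a model is simply a list of sub-models and that scores are held in a parallel list. The function names are hypothetical, and the guard that keeps at least one sub-model in a model is an assumption rather than something stated in the text.

```python
import random


def tournament(population, scores, rng, n=5):
    """Pick n models at random and return the one with the lowest score."""
    picks = rng.sample(range(len(population)), n)
    return population[min(picks, key=lambda i: scores[i])]


def add_sub_model(first, second, rng):
    """Add a randomly selected sub-model of the first model to the second."""
    return list(second) + [rng.choice(first)]


def replace_sub_model(first, second, rng):
    """Replace a random sub-model of the first model with one from the second."""
    out = list(first)
    out[rng.randrange(len(out))] = rng.choice(second)
    return out


def remove_sub_model(model, rng):
    """Remove a randomly selected sub-model from the model."""
    if len(model) <= 1:
        return list(model)  # assumption: never leave a model empty
    out = list(model)
    del out[rng.randrange(len(out))]
    return out
```

Each operator returns a new list, so the parents picked by tournament selection remain available unchanged for reproduction.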

To score the models a fixed length window is randomly moved over the dataset. At each generation two random locations are picked: one for training, and one for testing. In the training phase the probability distribution used in the context chooser is calculated. In the testing phase the fitness of a model (m) is evaluated over a windowed section of the dataset (w). For each position in the window the model is given a set of history data (h), calculated from the window, and is queried to produce a prediction. This produces a set of possible outputs (o), and a set of corresponding output likelihoods (ol). The similarity (C) of each output with the actual output (r) is computed using the FindBestMatch function, as shown in Equation 1. This function takes the set of actual outputs and the set of model outputs, and first pads them out with blank data so that they are the same size. Then for each item in the actual output set, a unique match in the model output set is found. For each of the matches a comparison is made between the two objects. The comparison looks at how similar each of the properties of the two objects is. The comparisons are summed to produce a score that shows how good that set of matches is. An exhaustive search is then performed over all the possible combinations of matches to find the best (maximal) matching score. The result is then multiplied by its output likelihood. From this the best (maximal) output is found. This is then repeated over the rest of the window, and the results are summed and then normalised to produce (S), as shown in Equation 2. This fitness function is an improved version of the one described in our previous work [ 2 ], as it can be applied to non-deterministic datasets.

$$C(o, r) = \mathrm{FindBestMatch}(o, r) \qquad (1)$$

$$S(m, w) = \frac{1}{|w|} \sum_{i} \max_{n} \big( ol_n \cdot C(o_n, r_i) \big) \qquad (2)$$
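A sketch of this scoring scheme is shown below under several assumptions: objects are represented as dictionaries of properties, blank padding is represented by `None`, the property comparison is the fraction of properties that agree, and the matching score is divided by the number of objects to keep C in [0, 1]. None of these choices is stated in the text; they are only one plausible reading of Equations 1 and 2.

```python
from itertools import permutations


def compare(a, b):
    """Fraction of properties on which two objects agree; blanks score 0."""
    if a is None or b is None:
        return 0.0
    keys = set(a) | set(b)
    if not keys:
        return 1.0
    return sum(a.get(k) == b.get(k) for k in keys) / len(keys)


def find_best_match(output, actual):
    """Best one-to-one matching score between model output and actual output."""
    size = max(len(output), len(actual))
    if size == 0:
        return 1.0
    out = list(output) + [None] * (size - len(output))  # pad with blanks
    act = list(actual) + [None] * (size - len(actual))
    # Exhaustive search over all matchings (factorial cost, fine for small sets).
    best = max(sum(compare(o, r) for o, r in zip(perm, act))
               for perm in permutations(out))
    return best / size  # assumption: normalise to [0, 1]


def fitness(predictions, actuals):
    """S(m, w): predictions[i] is a list of (output, likelihood) pairs."""
    total = 0.0
    for candidates, actual in zip(predictions, actuals):
        total += max((lik * find_best_match(out, actual)
                      for out, lik in candidates), default=0.0)
    return total / len(actuals)
```

A window position at which the model produces no output contributes zero, so models that stay silent are penalised in the same way as models that predict wrongly.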

The system runs in two stages, and will stop running once it exceeds a maximum number of generations. Firstly the system is initialised in the manner described above, and then for five generations it works out the best set of sub-models to use in the models. To do this the system uses reproduction (10%), removing (10%), adding (40%), and replacement (40%). Next the system will optimise these models to find the best solution. It uses crossover (60%), reproduction (10%), removing (10%), adding (10%), and replacement (10%).

5 Evaluation

Our method was evaluated on three different datasets: handcrafted uno data, handcrafted snap data, and data from people walking through a network of mock CCTV cameras. More detail about these datasets is presented below.

A 10 minute video of people walking along a path containing a junction was filmed. This was then used to mock up a network of CCTV cameras. Figure 4 shows a frame from the video. Virtual motion detectors, representing CCTV cameras, were hand placed over the video as shown in Figure 4. Using frame differencing and morphological operations, the video was processed to determine the location of the motion. If the number of moved pixels in a region exceeded a fixed threshold then the virtual detector reported that motion had occurred at that location. Hysteresis on the motion detection is implemented as a two-state state machine (where the states are motion and no motion). The state machine requires a number of frames (normally 10) of stability to change state. The data produced is then placed in a datafile, with a motion event recorded for each state change from no motion to motion. This was used to create a training datafile containing 84 state changes and a test file containing 46 state changes.

The snap dataset was handcrafted, but its format was similar to that of the snap dataset used in [ 15 ]. The snap sequence is as follows: initially the computer sees a blank scene, then it hears the word “play”, and next two coloured cards are seen. They are either both put down at the same time, or put down one by one. If they are the same then the word “equals” is heard, otherwise “different” is heard. Then the cards are removed, again either one by one or at the same time. We ask the computer to learn only the sections where a human is speaking, as it would be impossible to accurately predict the next two cards because they are essentially random.

Three datasets were prepared: a non-noisy training set, a noisy training set, and a non-noisy test set. All the datasets contained around 50 rounds of snap. The noisy data was generated by adding 10% noise to the non-noisy training set. The noise took the form of removing cards, removing the play state, and changing the output state, for example making the output not equal when it should be equal.

The handcrafted uno dataset has a similar sequence to the snap dataset. Again the computer initially sees a blank scene. Then “play” is heard. Next two cards, each having one of three possible coloured shapes on it, are placed down either at the same time or one by one. If the two cards have the same coloured shape on them then “same” is heard; if they have shapes of the same colour then “colour” is heard; if they have the same shapes then “shape” is heard; and if the cards are different then “nothing” is heard. The cards are then removed either together or one by one.

Three datasets were created: a non-noisy training set, a noisy training set, and a non-noisy test set. Each one contained around 50 rounds of uno. Again, the noisy data was prepared by adding 10% noise to the non-noisy training data. The noise took the same form as in the noisy snap data. To test the system, five runs were allocated to each possible combination of dataset. For each run a different random number seed was used to initialise the system. The tests were run on a 2GHz machine with 8GB of memory.
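The hysteresis applied to the virtual motion detectors in the mock CCTV dataset can be illustrated with a small two-state machine. The sketch below assumes that the input is a per-frame count of moved pixels for one detector region and that an event is emitted on each confirmed change from no motion to motion; the function and parameter names are hypothetical.

```python
def motion_events(moved_pixels, threshold, stable_frames=10):
    """Return the frame indices at which the state changes to motion."""
    state = False  # debounced state: False = no motion, True = motion
    run = 0        # consecutive frames that disagree with the current state
    events = []
    for frame, count in enumerate(moved_pixels):
        observed = count > threshold
        if observed != state:
            run += 1
            if run >= stable_frames:
                # Enough stable frames: commit the state change.
                state = observed
                run = 0
                if state:
                    events.append(frame)
        else:
            run = 0
    return events
```

With `stable_frames=10`, a burst of motion lasting only three frames is ignored, while a burst lasting twelve frames produces one event, recorded at the frame where the tenth consecutive motion frame is seen.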

To evaluate how well the models have been learnt they were tested on a separate test set. Two metrics were used to evaluate the results: coverage and prediction accuracy. Coverage (C) scores whether the system can correctly predict the dataset (i.e. the probability of correct prediction is greater than 0%) and is the number of correct predictions (pc) divided by the dataset size (d), as shown in Equation 3.

Prediction accuracy (A) scores with what probability the correct prediction is made, and is the sum of the likelihoods of each correct prediction (pl) divided by the dataset size, as shown in Equation 4.

In non-deterministic scenarios this will not be 100%.

$$C = \frac{pc}{d} \qquad (3)$$

$$A = \frac{pl}{d} \qquad (4)$$
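As a worked illustration of Equations 3 and 4, the sketch below computes both metrics from a list holding, for each item in the test set, the likelihood the model assigned to the correct prediction (0.0 when the correct output was not predicted at all). The function name and this input representation are assumptions.

```python
def coverage_and_accuracy(correct_likelihoods):
    """Return (C, A) given the likelihood of the correct prediction per item."""
    d = len(correct_likelihoods)                       # dataset size
    pc = sum(1 for p in correct_likelihoods if p > 0)  # correct predictions
    pl = sum(correct_likelihoods)                      # summed likelihoods
    return pc / d, pl / d


coverage, accuracy = coverage_and_accuracy([1.0, 0.5, 0.0, 0.5])
```

Here three of the four items are predicted with non-zero probability, so coverage is 0.75, while the summed likelihood of 2.0 gives a prediction accuracy of 0.5.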

Both the snap datasets were tested with a population size of 4000, and the system was run for 65 generations, taking around 5 hours for each run. All the runs using the non-noisy datasets were successful. However, the models did not get 100% coverage because they failed to produce any output at the start of the test dataset, as there were insufficient items in the history. Figure 5 shows an example of this, as it will only evaluate once there are three cards in the history. Four of the results did not predict the first two items in the test dataset, and one of the results failed to predict only the first item. Two out of the five runs using the noisy snap dataset got an exact solution. The noise affected the models, causing the sub-models to model incorrect parts of the dataset. This was because some of the noise added to the noisy training set changed the outcomes for some rounds of snap; this then caused the system to model this noise, and to incorrectly predict the outcomes in the test set. Again, as with the non-noisy snap models, there were problems predicting the start of the test dataset.

The models themselves made use of both Allen’s intervals and the temporal state relations. Figure 5 shows one of the sub-models produced from the non-noisy snap training set. It shows the use of Allen’s intervals (the before relation), and the temporal state relations (the enter relation). Most of the models contained four sub-models.

The uno datasets were run with a population size of 6000 for 65 generations, taking around 7 hours for each run. One out of five runs on the non-noisy dataset managed to get the correct solution, but it did not get 100% coverage because it did not have enough history at the start of the test set to predict the initial items. The rest of the non-noisy results were very close to the solution, and probably needed more generations to find the exact solution. The models themselves were very similar to the models produced for the snap datasets. Both Allen’s intervals and the temporal state relations were used. None of the runs for the noisy dataset managed to produce an exact result, with the noise causing the sub-models to model incorrect parts of the dataset.

The runs using the path dataset used a population size of 2000, and the system was run for 65 generations, taking around 3 hours for each run. All the runs using the non-noisy dataset predicted well in the main section of the test dataset, but failed to predict well at the start of the test dataset, due to lack of history. Some of the runs also failed to predict infrequently occurring actions in the test set. In the runs using the noisy training set all the models learnt the frequently occurring actions, but they all started to learn some of the noise in the dataset, and this affected their scores on the test dataset. Both the non-noisy and noisy models used Allen’s intervals and the temporal state relations.

[Results figure or table; surviving labels: Snap Noise, Uno No Noise, Uno Noise, Path No Noise, Path Noise.]

7 Conclusions

We have extended the previous work of [ 2 ] and shown that it is possible, by the use of temporal relations, to use a qualitative, as well as a markovian, representation of time. This technique is important for a number of reasons. Firstly, it produces models that are robust to irrelevant variations in data. Secondly, it allows the system to learn from a dataset containing single actions, and then be able to predict from a dataset containing multiple overlapping actions.

In future work we will be looking into using spatial, as well as temporal, relations in the system. We are also looking into trying out quantitative relations, so that a relation will not work on objects that are either too close or too far away. We will also be looking into changing the output from a sub-model based on what data the search section has evaluated on. Finally, we will be looking at speed improvements to the system so that the run time can be reduced.

soning’, in International Symposium on Imprecise Probabilities and Their Applications, pp. 193–202, (2005).
[ 10 ] John Koza, Genetic Programming, MIT Press, 1992.
[ 11 ] John Koza, Genetic Programming II, MIT Press, 1994.
[ 12 ] Philippe Leray and Olivier François, ‘Bayesian network structural learning and incomplete data’, in Adaptive Knowledge Representation and Reasoning, (2005).
[ 13 ] David Montana, ‘Strongly typed genetic programming’, in Evolutionary Computation, (1995).
[ 14 ] S.H. Muggleton and J. Firth, ‘CProgol4.4: a tutorial introduction’, in Relational Data Mining, 160–188, Springer-Verlag, (2001).
[ 15 ] Chris Needham, Paulo Santos, Derek Magee, Vincent Devin, David Hogg, and Anthony Cohn, ‘Protocols from perceptual observations’, Artificial Intelligence, 167, 103–136, (2005).
[ 16 ] N. J. Nilsson, ‘Probabilistic logic’, Artificial Intelligence, 28, 71–87, (1986).
[ 17 ] Jeffrey Mark Siskind, ‘Grounding the lexical semantics of verbs in visual perception using force dynamics and event logic’, Artificial Intelligence Research, 15, 31–90, (2000).

Combining Intelligent Methods for Learner Modelling in Exploratory Learning Environments

Mihaela Cocea and George D. Magoulas 1

Abstract. Most existing learning environments work in well-structured domains by making use of or combining AI techniques in order to create and update a learner model, provide individual and/or collaboration support and perform learner diagnosis. In this paper we present an approach that exploits the synergy of case-based reasoning and soft computing for learner modelling in an ill-structured domain for exploratory learning. We present the architecture of the learner model and the knowledge formulation in terms of cases, and illustrate its application in an exploratory learning environment for mathematical generalisation.

1 INTRODUCTION

Several AI techniques have been proposed in intelligent learning environments, such as case-based reasoning [ 27 ], [ 10 ], bayesian networks [ 4 ], [ 6 ], neural networks [ 2 ], genetic and evolutionary algorithms [ 24 ], neuro-fuzzy systems [ 26 ], as well as synergistic approaches, such as genetic algorithms and case-based reasoning [ 13 ], hybrid rules integrating symbolic rules with neurocomputing [ 11 ], and expert systems with genetic algorithms [ 18 ].

Exploratory Learning Environments (ELEs) belong to a particular class of learning environments built on the principles of the constructivist paradigm for teaching and learning. ELEs place the emphasis on the opportunity to learn through free exploration and discovery rather than guided tutoring. This approach has proved to be beneficial for learners in terms of acquiring deep conceptual and structural knowledge. However, discovery learning without guidance and support appears to be less effective than learning environments that guide the learner step by step [ 16 ]. To this end, an understanding of the learner’s behaviour and knowledge construction is needed [ 22 ].

Most existing ELEs use simulations as a way of actively involving learners in the learning process (e.g. [ 28 ], [ 14 ]) and exploit cognitive tools [ 29 ] to support their learning. Few such systems model learner’s knowledge/skills; for example [ 4 ] and [ 6 ] use bayesian networks and [ 26 ] combines neural networks with fuzzy representation of knowledge. Another category of ELEs is closer to the constructivist approach by allowing the learner to construct their own models rather than explore a “predefined” one. Compared to conventional learning environments (even environments that use simulations), this type of ELE requires approaches to learner modelling that would be able to capture and model the useful interactions that take place as learners construct their models.

In this paper, we present an approach to learner modelling in ELEs (suitable for both exploring simulations and constructing models) that combines case-based reasoning with other AI techniques. The subsequent section briefly introduces the application domain, namely mathematical generalisation, and the ELE used, called ShapeBuilder, and discusses the challenges involved in performing learner modelling. Section 3 presents a conceptual framework for the learner modelling process and describes the case-based formulation. Section 4 illustrates the process with an example, while Section 5 concludes the paper and outlines future work.

1 The authors are with the London Knowledge Lab, Birkbeck College, University of London, UK; email: {mihaela;gmagoulas}@dcs.bbk.ac.uk

2 EXPLORATORY LEARNING FOR MATHEMATICAL GENERALISATION

Mathematical generalisation (MG) is associated with algebra, as “algebra is, in one sense, the language of generalisation of quantity. It provides experience of, and a language for, expressing generality, manipulating generality, and reasoning about generality” [ 20 ].

However, students do not associate algebra with generalisation, as the algebraic language is perceived as being separate from what it represents [ 15 ]. To address this problem the ShapeBuilder [ 8 ] system, which is an ELE under development in the context of the MiGen project 2, aims to facilitate the correspondence between the models, patterns and structures (visual representations) that the learners build, on the one hand, and their numeric, iconic and symbolic representations, on the other hand. ShapeBuilder allows the construction of different shapes [ 9 ], e.g. rectangles, L-shapes and T-shapes, and supports the three aforementioned types of representation: (a) numeric representations that include numbers (constants or variables) and expressions with numbers; (b) iconic representations which correspond to icon variables; (c) symbolic representations that are names or symbols given by users to variables or expressions. An icon variable has the value of a dimension of a shape (e.g. width, height) and can be obtained by double-clicking on the corresponding edge of the shape. It is represented as an icon of the shape with the corresponding edge highlighted (see Figure 1a).

Constants, variables and numeric expressions lead to specific constructions/models, while icon variables and expressions using them lead to general ones. Through the use of icon variables, ShapeBuilder encourages structured algebra thinking, connecting the visual with the abstract (algebraic) representation, as “each expression of generality expresses a way of seeing” [ 20 ] (see Figure 1b). It also uses the “messing up” metaphor [ 12 ] that consists of asking the learner to resize a construction and observe the consequences; the model will “mess up” only if it is not general (see Figures 1c and d), indicating learner’s lack of generalisation ability.

When attempting to model the learner in an ELE for such a wide domain as MG, several challenges arise. The main and widely acknowledged challenge is to balance freedom with control: learners should be given enough freedom so that they can actively engage in activities, but they should be offered enough guidance to ensure that the whole process reflects constructivist learning and leads to useful knowledge [ 21 ]. This and some other challenges are illustrated in Table 1 with examples from the domain of MG.

2 Funded by ESRC, UK, under TLRP e-Learning Phase-II (RES-139-25-0381); http://www.tlrp.org/proj/tel/tel_noss.html.

Table 1. Examples of learner modelling challenges from the domain of MG.

Example: When a learner is trying to produce a general representation, for how long should he be left alone to explore and when does guidance become necessary?

Example: Besides the learner’s knowledge of MG concepts (e.g. use of variables, consistency between representations, etc.), other aspects need to be modelled in order to support the learner during exploration: shapes constructed, relations between shapes, etc.

Example: In exploratory learning it is difficult to categorise actions or the learner’s explorations into “correct” and “incorrect”. Moreover, actions that might lead to incorrect outcomes, such as resizing, can be more valuable for constructivist learning than “correct” actions.

Example: Can consistency be inferred from the fact that a learner is checking the correspondence between various forms of representations? If so, is that always true? Are there any exceptions to this rule?

Example: As it is neither realistic nor feasible to include all possible outcomes (correct or incorrect) to model the domain of MG, only key information with educational value could be stored, such as strategies in solving a task. The challenge is how to represent and detect them.

3 A CONCEPTUAL FRAMEWORK FOR LEARNER MODELLING

Given the challenges mentioned in Table 1, a conventional learner modelling approach does not fit the purposes of ELEs. Due to the exploratory nature of the activities and the diversity of possible trajectories, flexibility in the representation of information and handling of uncertainty are two important aspects for effectively supporting the learning process. As case-based reasoning offers flexibility of information representation and soft computing techniques handle uncertainty, a combination of the two is used. Moreover, previous research has shown the benefits of combining case-based reasoning with neural networks [ 23 ] and fuzzy quantifiers [ 30 ]. In the following subsections, the architecture of the system, the AI components and their role are described.

The architecture of the “Intelligent” ShapeBuilder is represented in Figure 2. As the learner interacts with the system through the interface, the actions of the learner are stored in the Learner Model (LM) and they are passed to the Interactive Behaviour Analysis Module (IBAM), where they are processed in cooperation with the Knowledge Base (KB); the results are fed into the LM. The Feedback Module (FM) is informed by the LM and the KB and feeds back to the learner through the interface.

The KB includes two components (see Figure 2): a domain and a task model. The domain model includes high level learning outcomes related to the domain (e.g. using variables, structural reasoning, consistency, etc.) and considers that each learning outcome can be achieved by exploring several tasks. The task model includes different types of information: (a) strategies of approaching the task which could be correct, incorrect or partially correct; (b) outcomes of the exploratory process and solutions to specific questions associated with each (sub)task; (c) landmarks, i.e. relevant aspects or critical events occurring during the exploratory process; (d) contexts, i.e. reference to particular (sub)tasks.

The IBAM component combines case-based reasoning with soft computing in order to identify what learners are doing and be able to provide feedback as they explore a (sub)task. More specifically, as they are working in a specific subtask, which specifies a certain context, their actions are preprocessed, current cases are identified and matched to the cases from the Task Model (the case base). Prior to matching, local feature weighting [ 23 ] is applied in order to reflect the importance of the attributes in the current context.

In the FM component, multicriteria decision making [ 7 ] will be used to obtain priorities between several aspects that require feedback depending on the context.

Case-based Knowledge Representation

In case-based reasoning (CBR) [ 17 ] the knowledge is stored as cases, typically including the description of a problem and the corresponding solution. When a new problem is encountered, similar cases are

are higher than the corresponding ones for the case of the transformation of Eq. (3). Moreover, note that for the case of the first transformation and σ² = 0.5 a bit with value ‘0’ is more likely to be followed by a bit with the same value (probability equal to 0.55854); a phenomenon that does not occur at present. For the pattern ‘11’ the probability of encountering a zero immediately after it becomes 0.933909, 0.628256, and 0.717049, for σ² equal to 0.01, 0.1, and 0.5, respectively.

Finally, for the pattern ‘01’ the probability of zero after its appearance is 0.932387, 0.538762, and 0.568140 for σ² equal to 0.01, 0.1, and 0.5, respectively. The predictive power of the binary patterns ‘0’, ‘11’ (perfect predictors in the noise-free binary sequence) and ‘01’ (good predictor in the noise-free binary sequence), with respect to the value of the variance of the additive noise term, σ², is illustrated in Fig. 5. To generate Fig. 5, σ² assumed values in the interval [0, 0.5] with a step size of 10⁻³.

4 Conclusions

Despite the chaotic nature of the tent map and the resulting complexity of the binary sequences that were derived after the application of two threshold binary transformations, a large number of short-term predictors was detected. The reported experimental results indicate that the binary sequences generated through the variable threshold binary transformation are more predictable than those obtained through the fixed threshold transformation. This finding is clearer for values of the control parameter, r, close to its upper bound, 2. Indeed, for r = 1.999 all the patterns of length up to nine appear in the binary sequences obtained through the first transformation, suggesting that there is no perfect predictor. On the contrary, for the sequences generated through the second transformation with the same value of r, only three out of the four possible patterns of length two are encountered, suggesting that there is a perfect short-term predictor of length one. The inclusion of an additive Gaussian noise term with zero mean in the tent map equation eliminated all perfect predictors. However, for small values of the variance of the Gaussian noise, binary patterns with high predictive power were identified.

Future work on the subject will include the investigation of multiplicative noise, as well as the application of this methodology to real-world time series and in particular financial time series. It is worth noting that the second binary transformation is particularly meaningful in the study of financial time series as it corresponds to the direction of change of the next value relative to the present one.

Acknowledgments

This work was partially supported by the Hellenic Ministry of Education and the European Union under Research Program PYTHAGORAS-89203.

Improving the Accuracy of Neuro-Symbolic Rules with Case-Based Reasoning

Jim Prentzas1, Ioannis Hatzilygeroudis2 and Othon Michail2

Abstract. In this paper, we present an improved approach integrating rules, neural networks and cases, compared to a previous one. The main approach integrates neurules and cases.

Neurules are a kind of integrated rules that combine a symbolic (production rules) and a connectionist (adaline unit) representation. Each neurule is represented as an adaline unit.

The main characteristics of neurules are that they improve the performance of symbolic rules and, in contrast to other hybrid neuro-symbolic approaches, retain the modularity of production rules and, to a large degree, their naturalness. In the improved approach, several types of indices, instead of a single one, are assigned to cases according to the different roles they play in neurule-based reasoning. Thus, an enhanced knowledge representation scheme is derived, resulting in accuracy improvement. Experimental results demonstrate its effectiveness.

1 INTRODUCTION

In contrast to rule-based systems that solve problems from scratch, case-based systems use pre-stored situations (i.e., cases) to deal with similar new situations. Case-based reasoning offers some advantages compared to symbolic rules and other knowledge representation formalisms. Cases represent specific knowledge of the domain, are natural and usually easy to obtain [ 11 ], [ 12 ]. Incremental learning comes naturally to case-based reasoning. New cases can be inserted into a knowledge base without making changes to the preexisting knowledge. The more cases are available, the better the domain knowledge is represented. Therefore, the accuracy of a case-based system can be enhanced throughout its operation, as new cases become available. A negative aspect of cases compared to symbolic rules is that they do not provide concise representations of the incorporated knowledge. Also, it is not possible to represent heuristic knowledge. Furthermore, the time performance of the retrieval operations is not always satisfactory.

Approaches integrating rule-based and case-based reasoning have given interesting and effective knowledge representation schemes and are becoming more and more popular in various fields [ 3 ], [ 13 ], [ 14 ], [ 15 ], [ 17 ], [ 18 ], [ 19 ]. The objective of these efforts is to derive hybrid representations that augment the positive aspects of the integrated formalisms and simultaneously minimize their negative aspects. The complementary advantages and disadvantages of rule-based and case-based reasoning are a good justification for their possible combination. The bulk of the approaches combining rule-based and case-based reasoning follow the coupling models [ 17 ]. In these models, the problem-solving (or reasoning) process is decomposed into tasks (or stages) for which different representation formalisms (i.e., rules or cases) are applied.

However, a more interesting approach is one integrating more than two reasoning methods towards the same objective.

In [ 16 ] and [ 10 ], such an approach is introduced, effectively integrating three reasoning schemes: rules, neurocomputing and case-based reasoning. To this end, neurules and cases are combined. Neurules are a type of hybrid rules integrating symbolic rules with neurocomputing in a seamless way. Their main characteristic is that they retain the modularity of production rules and, to a large degree, their naturalness. In that approach, on the one hand, cases are used as exceptions to neurules, filling their gaps in representing domain knowledge and, on the other hand, neurules perform indexing of the cases, facilitating their retrieval. Finally, it results in accuracy improvement.

In this paper, we enhance the above approach by employing different types of indices for the cases according to different roles they play in neurule-based reasoning. In this way, an improved knowledge representation scheme is derived as various types of neurules’ gaps in representing domain knowledge are filled in by indexed cases. Experimental results demonstrate the effectiveness of the presented approach compared to our previous one.

The rest of the paper is organized as follows. Section 2 presents neurules, whereas Section 3 presents methods for constructing the indexing scheme of the case library. Section 4 describes the hybrid inference mechanism. Section 5 presents experimental results regarding accuracy of the inference process. Section 6 discusses related work. Finally, Section 7 concludes.

2 NEURULES

Neurules are a type of hybrid rules integrating symbolic rules with neurocomputing giving pre-eminence to the symbolic component. Neurocomputing is used within the symbolic framework to improve the performance of symbolic rules [ 7 ], [ 10 ]. In contrast to other hybrid approaches (e.g. [ 4 ], [ 5 ]), the constructed knowledge base retains the modularity of production rules, since it consists of autonomous units (neurules), and also retains their naturalness to a large degree, since neurules look much like symbolic rules [ 7 ], [ 8 ]. Also, the inference mechanism is a tightly integrated process, which results in more efficient inferences than those of symbolic rules [ 7 ], [ 10 ]. Explanations in the form of if-then rules can be produced [ 9 ], [ 10 ].

2.1 Syntax and Semantics

The form of a neurule is depicted in Fig. 1a. Each condition Ci is assigned a number sfi, called its significance factor. Moreover, each rule itself is assigned a number sf0, called its bias factor.

Internally, each neurule is considered as an adaline unit (Fig.1b). The inputs Ci (i=1,...,n) of the unit are the conditions of the rule. The weights of the unit are the significance factors of the neurule and its bias is the bias factor of the neurule. Each input takes a value from the following set of discrete values: [1 (true), 0 (false), 0.5 (unknown)]. This gives the opportunity to easily distinguish between the falsity and the absence of a condition in contrast to symbolic rules. The output D, which represents the conclusion (decision) of the rule, is calculated via the standard formulas:

$$ D = f(a), \qquad a = sf_0 + \sum_{i=1}^{n} sf_i C_i $$

where a is the activation value and f(x) the activation function, a threshold function. Hence, the output can take one of two values (‘-1’, ‘1’) representing failure and success of the rule respectively.
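As a minimal illustration of the formulas above, the following Python sketch evaluates a neurule as an adaline unit. The function name `neurule_output`, the example factors, and the convention that an activation of exactly zero counts as success are assumptions made for illustration; they are not taken from the paper.

```python
from typing import Sequence

TRUE, FALSE, UNKNOWN = 1.0, 0.0, 0.5  # input values a condition can take


def neurule_output(bias: float, sfs: Sequence[float], inputs: Sequence[float]) -> int:
    """Return 1 (success) or -1 (failure) for a single neurule."""
    if len(sfs) != len(inputs):
        raise ValueError("one input value is needed per condition")
    # a = sf_0 + sum_i sf_i * C_i
    activation = bias + sum(sf * c for sf, c in zip(sfs, inputs))
    # Threshold activation; treating a == 0 as success is an assumption
    return 1 if activation >= 0 else -1


# Hypothetical neurule: bias factor -2.0 and three conditions
print(neurule_output(-2.0, [1.5, 1.0, 0.8], [TRUE, TRUE, FALSE]))     # 1
print(neurule_output(-2.0, [1.5, 1.0, 0.8], [FALSE, UNKNOWN, TRUE]))  # -1
```

Because an unknown condition contributes half of its significance factor, it is distinguished from a false one, which contributes nothing.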

Fig. 1. (a) Form of a neurule (b) a neurule as an adaline unit

The general syntax of a condition Ci and the conclusion D is:

<condition> ::= <variable> <l-predicate> <value>
<conclusion> ::= <variable> <r-predicate> <value>

where <variable> denotes a variable, that is a symbol representing a concept in the domain, e.g. ‘sex’, ‘pain’ etc., in a medical domain. <l-predicate> denotes a symbolic or a numeric predicate. The symbolic predicates are {is, isnot} whereas the numeric predicates are {<, >, =}. <r-predicate> can only be a symbolic predicate. <value> denotes a value. It can be a symbol or a number. The significance factor of a condition represents the significance (weight) of the condition in drawing the conclusion(s). Table 1 (Section 3) presents two example neurules, from a medical diagnosis domain.

Neurules can be constructed either from symbolic rules, thus exploiting existing symbolic rule bases, or from empirical data (i.e., training examples) (see [ 7 ] and [ 8 ] respectively). An adaline unit is initially assigned to each possible conclusion.

Each unit is individually trained via the Least Mean Square (LMS) algorithm. When the training set is inseparable, special techniques are used. In that case, more than one neurule having the same conclusion are produced. The neurule-based inference engine performs a task of classification: based on the values of the condition variables and the weighted sums of the conditions, conclusions are reached. It gives pre-eminence to symbolic reasoning, based on a backward chaining strategy [ 7 ], [ 10 ]. As soon as the initial input data is given and put in the working memory, the output neurules are considered for evaluation. One of them is selected for evaluation. Selection is based on textual order. A neurule fires if the output of the corresponding adaline unit is computed to be ‘1’ after evaluation of its conditions. A neurule is said to be ‘blocked’ if the output of the corresponding adaline unit is computed to be ‘-1’ after evaluation of its conditions.
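The training step mentioned above can be pictured with a textbook LMS (Widrow-Hoff) update. The sketch below is not the authors' construction procedure: the learning rate, the number of epochs, the toy data and the function names are all assumptions, and the special techniques used for inseparable training sets are not covered.

```python
from typing import List, Sequence, Tuple


def train_lms(samples: Sequence[Tuple[Sequence[float], int]],
              lr: float = 0.05, epochs: int = 500) -> List[float]:
    """Return [bias, w_1, ..., w_n] learned with the LMS rule."""
    n = len(samples[0][0])
    weights = [0.0] * (n + 1)  # weights[0] plays the role of the bias factor
    for _ in range(epochs):
        for inputs, target in samples:
            activation = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
            error = target - activation  # LMS uses the raw activation, not the thresholded output
            weights[0] += lr * error
            for i, x in enumerate(inputs, start=1):
                weights[i] += lr * error * x
    return weights


def classify(weights: Sequence[float], inputs: Sequence[float]) -> int:
    """Threshold the weighted sum to obtain the unit's output."""
    a = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return 1 if a >= 0 else -1


# Toy, linearly separable data: the conclusion holds if either condition is true
data = [((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w = train_lms(data)
print([classify(w, x) for x, _ in data])  # [-1, 1, 1, 1]
```

The learned weights then serve as the significance factors and the learned bias as the bias factor of the unit.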

A condition evaluates to ‘true’ (‘1’), if it matches a fact in the working memory, that is there is a fact with the same variable, predicate and value. A condition evaluates to ‘unknown’, if there is a fact with the same variable, predicate and ‘unknown’ as its value. A condition cannot be evaluated if there is no fact in the working memory with the same variable.

In this case, either the user is asked to provide data for the variable, in case of an input variable, or an intermediate neurule with a conclusion containing the variable is examined, in case of an intermediate variable. A condition with an input variable evaluates to ‘false’ (‘0’), if there is a fact in the working memory with the same variable and predicate but a different value. A condition with an intermediate variable evaluates to ‘false’ if, in addition, there is no unevaluated intermediate neurule that has a conclusion with the same variable. Inference stops either when one or more output neurules are fired (success) or there is no further action (failure).
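The evaluation rules for conditions can be sketched as follows. This is a simplified illustration under stated assumptions: only conditions of the form ‘<variable> is <value>’ on input variables are handled, the working memory is modelled as a plain dictionary from variable to value, and the names `evaluate_condition` and `memory` are hypothetical.

```python
from typing import Dict, Optional, Tuple

TRUE, FALSE, UNKNOWN = 1.0, 0.0, 0.5


def evaluate_condition(condition: Tuple[str, str],
                       working_memory: Dict[str, str]) -> Optional[float]:
    """Evaluate a '<variable> is <value>' condition on an input variable."""
    variable, value = condition
    if variable not in working_memory:
        return None  # cannot be evaluated yet: the user would be asked for the value
    fact_value = working_memory[variable]
    if fact_value == "unknown":
        return UNKNOWN
    return TRUE if fact_value == value else FALSE


memory = {"pain": "continuous", "fever": "unknown"}
print(evaluate_condition(("pain", "continuous"), memory))  # 1.0
print(evaluate_condition(("pain", "night"), memory))       # 0.0
print(evaluate_condition(("fever", "high"), memory))       # 0.5
print(evaluate_condition(("sex", "male"), memory))         # None
```

Returning `None` keeps "not yet evaluable" separate from ‘unknown’, which is a legitimate input value of 0.5.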

During inference, a conclusion is rejected (or not drawn) when none of the neurules containing it fires. This happens when: (i) all neurules containing the conclusion have been examined and are blocked and/or (ii) a neurule containing an alternative conclusion for the specific variable fires instead. For instance, if all neurules containing the conclusion ‘disease-type is inflammation’ have been examined and are blocked, then this conclusion is rejected (or not drawn). If a neurule containing e.g. the alternative conclusion ‘disease-type is primarymalignant’ fires, then conclusion ‘disease-type is inflammation’ is rejected (or not drawn), no matter whether all neurules containing as conclusion ‘disease-type is inflammation’ have been examined (and are blocked) or not.

3 INDEXING

Indexing concerns the organization of the available cases so that combined neurule-based and case-based reasoning can be performed. Indexed cases fill in gaps in the domain knowledge representation by neurules and during inference may assist in reaching the right conclusion. To be more specific, cases may enhance neurule-based reasoning to avoid reasoning errors by handling the following situations:
(a) Examining whether a neurule misfires. If sufficient conditions of the neurule are satisfied so that it can fire, it should be examined whether the neurule misfires for the specific facts, thus producing an incorrect conclusion.
(b) Examining whether a specific conclusion was erroneously rejected (or not drawn).

In the approach in [ 10 ], the neurules contained in the neurule base were used to index cases representing their exceptions. A case constitutes an exception to a neurule if its attribute values satisfy sufficient conditions of the neurule (so that it can fire) but the neurule's conclusion contradicts the corresponding attribute value of the case. In this approach, various types of indices are assigned to cases. More specifically, indices are assigned to cases according to different roles they play in neurule-based reasoning and assist in filling in different types of gaps in the knowledge representation by neurules. Assigning different types of indices to cases can produce an effective approach combining symbolic rule-based with case-based reasoning [ 1 ].

In this new approach, a case may be indexed by neurules and by neurule base conclusions as well. In particular, a case may be indexed as: (a) False positive (FP), by a neurule whose conclusion is contradicting. Such cases, as in our previous approach, represent exceptions to neurules and may assist in avoiding neurule misfirings. (b) True positive (TP), by a neurule whose conclusion is endorsing. The attribute values of such a case satisfy sufficient conditions of the neurule (so that it can fire) and the neurule's conclusion agrees with the corresponding attribute value of the case. Such cases may assist in endorsing correct neurule firings. (c) False negative (FN), by a conclusion erroneously rejected (or not drawn) by neurules. Such cases may assist in reaching conclusions that ought to have been drawn by neurules (and were not drawn). If neurules with alternative conclusions containing this variable were fired instead, it may also assist in avoiding neurule misfirings. ‘False negative’ indices are associated with conclusions and not with specific neurules because there may be more than one neurule with the same conclusion in the neurule base.

The indexing process may take as input the following types of knowledge: (a) Available neurules and non-indexed cases. (b) Available symbolic rules and indexed cases. This type of knowledge concerns an available formalism of symbolic rules and indexed exception cases as the one presented in [ 6 ].

The availability of data determines which type of knowledge is provided as input to the indexing module. If an available formalism of symbolic rules and indexed cases is presented as input, the symbolic rules are converted to neurules using the ‘rules to neurules’ module. The produced neurules are associated with the exception cases of the corresponding symbolic rules [ 10 ]. Exception cases are indexed as ‘false positives’ by neurules. Furthermore, for each case ‘true positive’ and ‘false negative’ indices may be acquired using the same process as in type (a).

When available neurules and non-indexed cases are given as input to the indexing process, cases must be associated with neurules and neurule base conclusions. For each case, this information can be easily acquired as follows:

Until all intermediate and output attribute values of the case have been considered:
1. Perform neurule-based reasoning for the neurules based on the attribute values of the case.
2. If a neurule fires, check whether the value of its conclusion variable matches the corresponding attribute value of the case. If it does (doesn't), associate the case as a ‘true positive’ (‘false positive’) with this neurule.
3. Check all intermediate and final conclusions. Associate the case as a ‘false negative’ with each rejected (or not drawn) conclusion that ought to have been drawn based on the attribute values of the case.
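The steps above can be sketched in Python as follows. This is a simplified illustration, not the authors' indexing module: it assumes a single output variable, no intermediate neurules, conditions of the form ‘<variable> is <value>’, and that every neurule is evaluated against the case. The class `Neurule`, the functions `can_fire` and `index_case`, and the two toy neurules with their factors are hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple

Case = Dict[str, str]


class Neurule(NamedTuple):
    name: str
    bias: float
    factors: Dict[Tuple[str, str], float]  # (variable, value) -> significance factor
    conclusion: str                        # concluded value of the output variable


def can_fire(rule: Neurule, case: Case) -> bool:
    """True when the weighted sum of the satisfied conditions is non-negative."""
    a = rule.bias + sum(sf for (var, val), sf in rule.factors.items()
                        if case.get(var) == val)
    return a >= 0


def index_case(case: Case, rules: List[Neurule], output_var: str) -> List[Tuple[str, str]]:
    """Return (index type, neurule name or conclusion) pairs for one case."""
    indices = []
    drawn = set()
    for rule in rules:
        if can_fire(rule, case):
            drawn.add(rule.conclusion)
            kind = "TP" if rule.conclusion == case[output_var] else "FP"
            indices.append((kind, rule.name))
    if case[output_var] not in drawn:
        # The conclusion that ought to have been drawn was rejected or not drawn
        indices.append(("FN", case[output_var]))
    return indices


rules = [
    Neurule("R1", -1.0, {("pain", "continuous"): 1.5}, "inflammation"),
    Neurule("R2", -1.0, {("pain", "night"): 1.5}, "arthritis"),
]
print(index_case({"pain": "continuous", "disease": "inflammation"}, rules, "disease"))
# [('TP', 'R1')]
print(index_case({"pain": "continuous", "disease": "arthritis"}, rules, "disease"))
# [('FP', 'R1'), ('FN', 'arthritis')]
```

Note that ‘false negative’ entries are keyed by a conclusion rather than by a neurule name, mirroring the distinction made in the text.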

To illustrate how the indexing process works, we present the following example. Suppose that we have a neurule base containing the two neurules in Table 1 and the example cases shown in Table 2 (only the most important attributes of the cases are shown). The cases however, also possess other attributes (not shown in Table 2).

‘disease-type’ is the output attribute that corresponds to the neurules’ conclusion variable. Table 3 shows the types of indices associated with each case in Table 2 at the end of the indexing process.

To acquire indexing information, the input values corresponding to the attribute values of the cases are presented to the example neurules. Recall that when a neurule condition evaluates to ‘true’ it gets the value ‘1’, whereas when it evaluates to ‘false’ it gets the value ‘0’.

Table 2. Example cases

Case ID        C1          C2          C3         C4          C5          C6
patient-class  human21-35  human0-20   human0-20  human0-20   human21-35  human0-20
pain           continuous  continuous  night      continuous  continuous  continuous

For example, given the input case C2, the final weighted sum of neurule NR1 is: -23.9 + 10.6 + 10.5 + 8.8 = 6 > 0. Note that the first three conditions of NR1 evaluate to ‘true’ whereas the remaining four (i.e., ‘fever is medium’, ‘fever is no-fever’, ‘patient-class is human21-35’ and ‘ant-reaction is medium’) to ‘false’ (not contributing to the weighted sum). The fact that the final weighted sum is positive means that sufficient conditions of NR1 are satisfied so that it can fire.

Furthermore, the corresponding output attribute value of the case matches the conclusion of NR1 and therefore C2 is associated as ‘true positive’ with NR1.
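The weighted sum quoted above for case C2 and neurule NR1 can be checked with a few lines of Python. Only the bias factor and the three significance factors quoted in the text are used; the significance factors of the four false conditions are not needed because they are multiplied by 0. The variable names are illustrative.

```python
bias = -23.9                      # bias factor of NR1, as quoted in the text
true_factors = [10.6, 10.5, 8.8]  # factors of the three conditions that hold for C2

# Each true condition contributes sf * 1; each false condition contributes sf * 0
activation = bias + sum(sf * 1 for sf in true_factors)

print(round(activation, 1))  # 6.0
print(activation > 0)        # True, so NR1 can fire for C2
```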

Similarly, when the input values corresponding to the attribute values of cases C1 and C4 are given as input to the neurule base, sufficient conditions of neurules NR2 and NR1 respectively are satisfied so that they can fire and the corresponding output attribute case values match their conclusions. Furthermore, when the input values corresponding to the attribute values of case C5 are given as input to the neurule base, sufficient conditions of both neurules NR1 and NR2 are satisfied so that they can fire. However, the corresponding output attribute case values match the conclusion of NR2 and contradict the conclusion of NR1. In addition, conclusion ‘disease-type is inflammation’ cannot be drawn when the input values corresponding to the attribute values of case C3 are given as input because the only neurule with the corresponding conclusion (i.e., NR1) is blocked. A similar situation happens for case C6.

4 THE HYBRID INFERENCE MECHANISM

The inference mechanism combines neurule-based with case-based reasoning. The combined inference process mainly focuses on the neurules. The indexed cases are considered when: (a) sufficient conditions of a neurule are fulfilled so that it can fire, (b) all output or intermediate neurules with a specific conclusion variable are blocked and thus no final or intermediate conclusion containing this variable is drawn.

In case (a), firing of the neurule is suspended and case-based reasoning is performed for cases indexed as ‘false positives’ and ‘true positives’ by the neurule and cases indexed as ‘false negatives’ by alternative conclusions containing the neurule’s conclusion variable. Cases indexed as ‘true positives’ by the neurule endorse its firing whereas the other two sets of cases considered (i.e., ‘false positives’ and ‘false negatives’) prevent its firing. The results produced by case-based reasoning are evaluated in order to assess whether the neurule will fire or whether an alternative conclusion proposed by the retrieved case will be considered valid instead.

In case (b), the case-based module will focus on cases indexed as ‘false negatives’ by conclusions containing the specific (intermediate or output) variable.

The basic steps of the inference process are the following:
1. Perform neurule-based reasoning for the neurules.
2. If sufficient conditions of a neurule are fulfilled so that it can fire, then
   2.1. Perform case-based reasoning for the ‘false positive’ and ‘true positive’ cases indexed by the neurule and the ‘false negative’ cases associated with alternative conclusions containing the neurule’s conclusion variable.
   2.2. If no case is retrieved or the best matching case is indexed as ‘true positive’, the neurule fires and its conclusion is inserted into the working memory.
   2.3. If the best matching case is indexed as ‘false positive’ or ‘false negative’, insert the conclusion supported by the case into the working memory and mark the neurule as ‘blocked’.
3. If all intermediate neurules with a specific conclusion variable are blocked, then
   3.1. Examine all cases indexed as ‘false negatives’ by the corresponding intermediate conclusions, retrieve the best matching one and insert the conclusion supported by the retrieved case into the working memory.
4. If all output neurules with a specific conclusion variable are blocked, then
   4.1. Examine all cases indexed as ‘false negatives’ by the corresponding final conclusions, retrieve the best matching one and insert the conclusion supported by the retrieved case into the working memory.
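Steps 2.2 and 2.3 amount to a small decision rule, sketched below. The sketch assumes that the similarity of each candidate case to the input case has already been computed, and it breaks ties by taking the first candidate with maximal similarity. The class `IndexedCase`, the function `resolve_suspended_firing` and the example values are hypothetical.

```python
from typing import List, NamedTuple


class IndexedCase(NamedTuple):
    case_id: str
    index_type: str    # "TP", "FP" or "FN"
    conclusion: str    # conclusion supported by the case
    similarity: float  # similarity to the input case, assumed precomputed


def resolve_suspended_firing(rule_conclusion: str, candidates: List[IndexedCase]) -> str:
    """Return the conclusion to insert into the working memory."""
    if not candidates:
        return rule_conclusion  # step 2.2: no case retrieved, the neurule fires
    best = max(candidates, key=lambda c: c.similarity)
    if best.index_type == "TP":
        return rule_conclusion  # step 2.2: the best match endorses the firing
    return best.conclusion      # step 2.3: the neurule is blocked, the case decides


candidates = [
    IndexedCase("K1", "TP", "inflammation", 0.6),
    IndexedCase("K2", "FP", "chronic-inflammation", 0.9),
]
print(resolve_suspended_firing("inflammation", candidates))  # chronic-inflammation
print(resolve_suspended_firing("inflammation", []))          # inflammation
```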

The similarity measure between two cases ck and cl is calculated via a distance metric [ 1 ]. The best-matching case to the problem at hand is the one having the maximum similarity with (minimum distance from) the input case. If multiple stored cases have a similarity equal to the maximum one, a simple heuristic is used.
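The distance metric of [ 1 ] and the tie-breaking heuristic are not reproduced in this paper, so the following sketch substitutes a plain overlap distance (the number of attributes on which two cases disagree) and breaks ties in favour of the earliest stored case. Both choices, as well as the names `distance` and `best_match` and the toy library, are assumptions made purely to illustrate best-match retrieval.

```python
from typing import Dict, List, Sequence

Case = Dict[str, str]


def distance(a: Case, b: Case, attributes: Sequence[str]) -> int:
    """Overlap distance: number of attributes on which the two cases disagree."""
    return sum(1 for attr in attributes if a.get(attr) != b.get(attr))


def best_match(query: Case, library: List[Case], attributes: Sequence[str]) -> Case:
    """Stored case with minimum distance; ties go to the earliest stored case."""
    return min(library, key=lambda c: distance(query, c, attributes))


attributes = ["patient-class", "pain"]
library = [
    {"id": "K1", "patient-class": "human0-20", "pain": "night"},
    {"id": "K2", "patient-class": "human21-35", "pain": "continuous"},
]
query = {"patient-class": "human21-35", "pain": "night"}
print(distance(query, library[0], attributes))        # 1
print(best_match(query, library, attributes)["id"])   # K1 (tie with K2, earliest wins)
```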

We now present two simple inference examples concerning the combined neurule base (Table 1) and the indexed example cases (Tables 2 and 3). Suppose that during inference sufficient conditions of neurule NR1 are satisfied so that it can fire. Firing of NR1 is suspended and the case-based reasoning process focuses on the cases contained in the union of the following sets of indexed cases:
• the set of cases indexed as ‘true positives’ by NR1: {C2, C4},
• the set of cases indexed as ‘false positives’ by NR1: {C5} and
• the set of cases indexed as ‘false negatives’ by alternative conclusions containing variable ‘disease-type’ (i.e., ‘disease-type is chronic-inflammation’): {C6}.

So, in this example the case-based reasoning process focuses on the following set of indexed cases: {C2, C4} ∪ {C5} ∪ {C6} = {C2, C4, C5, C6}.
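The selection of candidate cases can be sketched with ordinary set operations. The index sets below are transcribed from the running example in the text; the dictionary layout and the function names are assumptions made for illustration.

```python
from typing import Dict, Set

# Index sets taken from the running example (NR1, NR2 and cases C1 to C6)
true_positives: Dict[str, Set[str]] = {"NR1": {"C2", "C4"}, "NR2": {"C1", "C5"}}
false_positives: Dict[str, Set[str]] = {"NR1": {"C5"}}
false_negatives: Dict[str, Set[str]] = {"inflammation": {"C3"},
                                        "chronic-inflammation": {"C6"}}
conclusion_of = {"NR1": "inflammation", "NR2": "chronic-inflammation"}


def cases_for_suspended_firing(rule: str) -> Set[str]:
    """TP and FP cases of the neurule plus FN cases of alternative conclusions."""
    alternatives = (cases for conclusion, cases in false_negatives.items()
                    if conclusion != conclusion_of[rule])
    return set().union(true_positives.get(rule, set()),
                       false_positives.get(rule, set()),
                       *alternatives)


def cases_when_all_blocked() -> Set[str]:
    """FN cases of every conclusion containing the blocked output variable."""
    return set().union(*false_negatives.values())


print(sorted(cases_for_suspended_firing("NR1")))  # ['C2', 'C4', 'C5', 'C6']
print(sorted(cases_when_all_blocked()))           # ['C3', 'C6']
```

Only the selected cases need to be compared with the input case, which is how the indexing scheme narrows retrieval to a part of the case library.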

Suppose now that during inference both output neurules in the example neurule base are blocked. The case-based reasoning process will focus on the cases contained in the union of the following sets of indexed cases:
• the set of cases indexed as ‘false negatives’ by conclusion ‘disease-type is inflammation’: {C3},
• the set of cases indexed as ‘false negatives’ by conclusion ‘disease-type is chronic-inflammation’: {C6}.

Therefore, in this example the case-based reasoning process focuses on the following set of indexed cases: {C3} ∪ {C6} = {C3, C6}.

5 EXPERIMENTAL RESULTS

In this section, we present experimental results using datasets acquired from [ 2 ]. Note that there are no intermediate conclusions in these datasets. The experimental results involve evaluation of the presented approach combining neurule-based and case-based reasoning and comparison with our previous approach [ 10 ]. 75% and 25% of each dataset were used as training and testing sets respectively. Each initial training set was used to create a combined neurule base and indexed case library. For this purpose, each initial training set was randomly split into two disjoint subsets, one used to create neurules and one used to create an indexed case library. More specifically, 2/3 of each initial training set was used to create neurules by employing the ‘patterns to neurules’ module [ 8 ] whereas the remaining 1/3 of each initial training set constituted non-indexed cases. Both types of knowledge (i.e., neurules and non-indexed cases) were given as input to the indexing construction module presented in this paper producing a combined neurule base and an indexed case library which will be referred to as NBRCBR. Neurules and non-indexed cases were also used to produce a combined neurule base and an indexed case library according to [ 10 ] which will be referred to as NBRCBR_PREV.
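To make the proportions concrete, the sketch below computes the subset sizes implied by the 75%/25% and 2/3-1/3 splits for the two dataset sizes reported with Table 4 (1728 and 12960 patterns). The paper does not state how fractional sizes would be rounded; both dataset sizes happen to divide evenly, so integer division is used here as an assumption. The function name `split_sizes` is hypothetical.

```python
from typing import Dict


def split_sizes(n_patterns: int) -> Dict[str, int]:
    """Sizes of the subsets implied by the splits described in the text."""
    train = n_patterns * 3 // 4    # 75% of the dataset for training
    test = n_patterns - train      # remaining 25% for testing
    for_neurules = train * 2 // 3  # 2/3 of the training set produces neurules
    for_cases = train - for_neurules  # remaining 1/3 become non-indexed cases
    return {"train": train, "test": test, "neurules": for_neurules, "cases": for_cases}


print(split_sizes(1728))   # {'train': 1296, 'test': 432, 'neurules': 864, 'cases': 432}
print(split_sizes(12960))  # {'train': 9720, 'test': 3240, 'neurules': 6480, 'cases': 3240}
```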

Inferences were run for both NBRCBR and NBRCBR_PREV using the testing sets. Inferences from NBRCBR_PREV were performed using the inference mechanism combining neurule-based and case-based reasoning as described in [ 10 ].

Inferences from NBRCBR were performed according to the inference mechanism described in this paper. No test case was stored in the case libraries.

Table 4 presents the experimental results regarding inferences from NBRCBR and NBRCBR_PREV. It presents results regarding classification accuracy of the integrated approaches and the percentage of test cases resulting in neurule-based reasoning errors that were successfully handled by case-based reasoning. Column ‘% FPs handled’ refers to the percentage of test cases resulting in neurule misfirings (i.e., ‘false positives’) that were successfully handled by case-based reasoning. Column ‘% FNs handled’ refers to the percentage of test cases resulting in having all output neurules blocked (i.e., ‘false negatives’) that were successfully handled by case-based reasoning. ‘False negative’ test cases are handled in NBRCBR_PREV by retrieving the best-matching case from the whole library of indexed cases.

Table 4. Experimental results for NBRCBR and NBRCBR_PREV. Datasets: Car (1728 patterns), Nursery (12960 patterns).

As can be seen from the table, the presented approach results in improved classification accuracy. Furthermore, in inferences from NBRCBR the percentages of both ‘false positive’ and ‘false negative’ test cases successfully handled are greater than the corresponding percentages in inferences from NBRCBR_PREV. Results also show that there is still room for improvement.

We also tested a nearest neighbor approach working alone on these two datasets (75% of each dataset used as case library and 25% used as testing set). We used the similarity measure presented in Section 4. The approach classified the input case to the conclusion supported by the best-matching case retrieved from the case library. Classification accuracy for the Car and Nursery datasets is 90.45% and 96.67% respectively. So, both integrated approaches perform better. This is due to the fact that the indexing schemes assist in focusing on specific parts of the case library.

In this paper, we present an approach integrating neurule-based and case-based reasoning that improves a previous hybrid approach [ 10 ]. Neurules are a type of hybrid rules integrating symbolic rules with neurocomputing. In contrast to other neuro-symbolic approaches, neurules retain the naturalness and modularity of symbolic rules. Integration of neurules and cases is done in order to improve the accuracy of the inference mechanism. Cases are indexed according to the roles they can play during neurule-based inference. More specifically, they are associated as ‘true positives’ and ‘false positives’ with neurules and as ‘false negatives’ with neurule base conclusions.

The presented approach integrates three types of knowledge representation schemes: symbolic rules, neural networks and case-based reasoning. Most hybrid intelligent systems implemented in the past integrate two intelligent technologies, e.g. neural networks and expert systems, neural networks and fuzzy logic, genetic algorithms and neural networks, etc. A new development that should receive interest in the future is the integration of more than two intelligent technologies, facilitating the solution of complex problems and exploiting multiple types of data sources.

Combinations of Case-Based Reasoning with Other Intelligent Methods

Jim Prentzas1 and Ioannis Hatzilygeroudis2

Abstract. Case-based reasoning is a popular approach used in intelligent systems. Whenever a new case has to be dealt with, the most similar cases are retrieved from the case base and their encompassed knowledge is exploited in the current situation.

Combinations of case-based reasoning with other intelligent methods have been explored, deriving effective knowledge representation schemes. Although some types of combinations have been extensively explored, other types have not been thoroughly investigated. In this paper, we briefly outline popular case-based reasoning combinations. More specifically, we focus on combinations of case-based reasoning with rule-based reasoning, soft computing and ontologies. We illustrate basic types of such combinations and discuss future directions.

1 INTRODUCTION

Case-based representations store a large set of previous cases with their solutions in the case base, using them whenever a similar new case has to be dealt with [ 19 ], [ 22 ]. Whenever a new input case comes in, a case-based system performs inference in four phases known as the case-based reasoning (CBR) cycle [ 1 ]: (i) retrieve, (ii) reuse, (iii) revise and (iv) retain. The retrieval phase retrieves from the case base the most relevant stored case(s) to the new case. Indexing schemes and similarity metrics are used for this purpose. In the reuse phase, a solution for the new case is created based on the retrieved most relevant case(s). The revise phase validates the correctness of the proposed solution, perhaps with the intervention of the user. Finally, the retain phase decides whether the knowledge learned from the solution of the new case is important enough to be incorporated into the system.
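The four phases of the CBR cycle can be sketched as a single function. This is only an illustration under stated assumptions: reuse simply copies the retrieved solution, revision is delegated to a caller-supplied function (standing in for the user or another validator), and a new case is retained only when the proposed solution had to be corrected. The names `cbr_cycle`, `overlap` and the toy case base are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Problem = Dict[str, str]
Case = Tuple[Problem, str]  # (problem description, solution)


def cbr_cycle(problem: Problem,
              case_base: List[Case],
              similarity: Callable[[Problem, Problem], float],
              revise: Callable[[Problem, str], str]) -> str:
    # (i) Retrieve the most similar stored case
    _, retrieved_solution = max(case_base, key=lambda c: similarity(problem, c[0]))
    # (ii) Reuse: the retrieved solution is adopted unchanged (assumption)
    proposed = retrieved_solution
    # (iii) Revise: an external check confirms or corrects the proposal
    final = revise(problem, proposed)
    # (iv) Retain: store the new case only if a correction was needed (assumption)
    if final != proposed:
        case_base.append((problem, final))
    return final


def overlap(a: Problem, b: Problem) -> float:
    """Number of attributes with equal values in both problem descriptions."""
    return float(sum(1 for k in a if a[k] == b.get(k)))


base: List[Case] = [({"pain": "night"}, "A"), ({"pain": "continuous"}, "B")]
print(cbr_cycle({"pain": "continuous"}, base, overlap, lambda p, s: s))  # B
print(len(base))                                                         # 2
print(cbr_cycle({"pain": "night"}, base, overlap, lambda p, s: "C"))     # C
print(len(base))                                                         # 3
```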

CBR can be effectively combined with other intelligent methods [ 25 ], [ 31 ]. Two main trends for CBR combinations can be discerned. The first trend involves embedded approaches in which the primary intelligent method (usually CBR) embeds one or more other intelligent methods to assist its internal online and offline tasks. The second combination trend involves approaches in which the problem solving process can be decomposed into tasks for which different representation formalisms are required or available. In such situations, a CBR system as a whole (with its possible internal modules) is integrated ‘externally’ with other intelligent systems to create an improved overall system.

Popular CBR combinations involve rule-based reasoning (RBR), model-based reasoning (MBR) and soft computing methods. CBR has also been combined with other intelligent methods (e.g. ontologies). In certain CBR combinations both combination trends have been followed. In other combinations one of the two trends is mostly explored.

In this paper, we briefly discuss aspects involving CBR combinations. We focus on intelligent methods with which CBR is usually combined. Our purpose is not to present an extensive survey of developed CBR combinations but to present their key aspects.

3 COMBINATIONS OF CBR

Combinations of CBR with other intelligent methods have been explored for more effective knowledge representation and problem solving. CBR can be combined with various intelligent methods. However, CBR is usually combined with RBR, MBR and soft computing methods.

To categorize CBR combinations one could use Medsker’s general categorization scheme for integrated intelligent systems [ 26 ]. Medsker distinguishes five main combination models: standalone, transformational, loose coupling, tight coupling and fully integrated models. The distinction between those models is based on the degree of coupling between the integrated components. Underlying categories for some of these models are also defined. The main types of underlying categories for loose and tight coupling models involve pre-processing, post-processing and co-processing models as well as embedded processing (for tight coupling models only). Not all of these combination models and/or their underlying categories have been thoroughly explored in the case of CBR combinations.

The types of combination models that have been applied to CBR combinations depend on the nature of the other intelligent methods combined with CBR. Some combination models are difficult to apply in certain CBR combinations. For instance, it is difficult to apply the fully integrated model in combinations of RBR with CBR. Obviously, the standalone model can be applied to combinations of CBR with any other method.

Generally speaking, coupling models are the most common CBR combination models. More specifically, embedded coupling approaches constitute perhaps the most popular trend. Most of the combinations following this trend use other intelligent methods to assist various CBR tasks. CBR is a generic methodology for building knowledge-based systems and its internal reasoning tasks can be implemented using a number of techniques as long as the guiding CBR principles are followed [ 36 ]. The reverse approach, that is, embedding case-based modules into intelligent systems employing other representations to assist in their internal tasks, does not seem to be popular, with the exception of combinations with genetic algorithms. In combinations of CBR with RBR and MBR, various coupling approaches have also been investigated besides embedded approaches [ 31 ]. In coupling combinations of CBR with soft computing methods, embedded approaches seem to be the most thoroughly investigated.

In the following, we discuss the main issues involving combinations of CBR with RBR, fuzzy logic, neural networks, genetic algorithms and ontologies.

3.1 Combinations of CBR with RBR

Various types of coupling models involving combinations of CBR and RBR have been investigated, i.e. sequential processing, co-processing and embedded processing [ 31 ].

In sequential processing, information (produced by reasoning) necessarily passes sequentially through some or all of the combined modules to produce the final result [ 33 ], [ 11 ].

In co-processing approaches, the combined modules closely interact in producing the final result. Such systems can be divided into two types: cooperation-oriented, which emphasize cooperation, and reconciliation-oriented, which emphasize reconciliation. In the former type, the combined components cooperate with each other (usually by interleaving their reasoning steps) [ 27 ], [ 32 ]. In the latter, each component produces its own conclusion, possibly differing from the conclusion of the other component, and thus a reconciliation process is necessary [ 14 ].

In embedded processing, CBR systems employ one or more RBR modules to perform tasks of their CBR cycle (e.g. retrieval and adaptation). Such approaches are quite common in CBR, especially for adaptation. RBR systems embedding CBR modules do not seem to exist.

3.2 Combinations of CBR with Fuzzy Logic

CBR can be combined with fuzzy logic in fruitful ways in order to handle imprecision. A usual approach is the incorporation of fuzzy logic into a CBR system in order to improve CBR aspects [ 4 ], [ 29 ], [ 35 ], [ 9 ]. Such combinations have been extensively explored, as imprecision and uncertainty are inherent in various CBR tasks. Fuzzy terms may be used in case representation, enabling a flexible encoding of case features that encompasses imprecise and uncertain information. Fuzzy logic may also prove very useful in indexing and retrieval. Fuzzy indexing enables multiple indexing of a case on a single feature with different degrees of membership [ 35 ]. Fuzzy similarity assessment and matching methods can produce more accurate results. Fuzzy clustering and classification methods can also be applied in case retrieval. In addition, fuzzy adaptation rules can be employed in case adaptation.
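Fuzzy indexing of a case on a single feature can be sketched as follows. The triangular membership functions, the three fuzzy terms and their breakpoints, and the max-min similarity measure are illustrative assumptions, not the definitions used in the cited works.

```python
def triangular(x, a, b, c):
    """Membership of x in a triangular fuzzy set with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical fuzzy terms for one numeric case feature.
TERMS = {
    "low": (-10.0, 0.0, 20.0),
    "medium": (10.0, 20.0, 30.0),
    "high": (20.0, 40.0, 50.0),
}


def fuzzy_index(value, terms=TERMS):
    """Index a feature value under every term it belongs to, with its degree."""
    degrees = {name: triangular(value, *abc) for name, abc in terms.items()}
    return {name: d for name, d in degrees.items() if d > 0.0}


def fuzzy_similarity(v1, v2, terms=TERMS):
    """Max-min overlap of the two membership vectors (one possible measure)."""
    i1, i2 = fuzzy_index(v1, terms), fuzzy_index(v2, terms)
    shared = i1.keys() & i2.keys()
    return max((min(i1[t], i2[t]) for t in shared), default=0.0)
```

With these terms a value of 15 is indexed under both "low" and "medium", so a case carrying it can be retrieved through either index, each with its own degree of membership.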

The works concerning combination of RBR with CBR [ 31 ] could potentially be improved with use of fuzzy rules.

Investigation of coupling approaches in combinations of CBR with fuzzy systems besides embedded ones could be fruitful.

3.3 Combinations of CBR with Neural Networks

Neural networks are usually employed by CBR to perform tasks such as indexing, retrieval and adaptation. In this way, appealing characteristics of neural networks such as parallelism, robustness, adaptability, generalization and the ability to cope with incomplete input data are exploited [ 10 ], [ 35 ]. Because different types of neural networks have been developed (e.g. back propagation neural networks, radial basis function networks, Self-Organizing Map networks, ART networks), different types of neural capabilities for classification and clustering can be exploited. Certain CBR approaches have employed different types of neural networks for the various internal CBR tasks (e.g. [ 12 ], [ 34 ]). Knowledge extracted from neural networks could also be exploited by CBR [ 10 ], [ 35 ]. An interesting direction could involve non-embedded coupling approaches combining CBR with neural networks.

3.4 Combinations of CBR with Genetic Algorithms

Usual combinations of CBR with genetic algorithms (GAs) involve the use of GAs to optimize one or more aspects of a CBR system. On the other hand, CBR can be exploited to enhance GAs. Other types of combinations of CBR with GAs can also be implemented.

GAs can be used within CBR to enhance indexing and retrieval. GAs have been used to assign case feature weights enhancing similarity assessment [ 39 ], [ 8 ], to perform feature selection [ 18 ] and generally to select relevant indices for evolving environments. GAs have also been used to retrieve multiple similar cases [ 38 ]. If k nearest neighbor retrieval is applied, genetic algorithms can be used to find the optimal k parameter in order to improve the retrieval accuracy [ 2 ].
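As an illustration of GA-based feature weighting for retrieval, the following Python sketch evolves a weight vector for weighted nearest-neighbour retrieval. It is a toy genetic algorithm written for this purpose and not the method of [ 39 ] or [ 8 ]: the leave-one-out accuracy used as fitness, the truncation selection, one-point crossover, Gaussian mutation and all parameter values are assumptions.

```python
import random


def weighted_distance(a, b, w):
    """Squared Euclidean distance with one non-negative weight per feature."""
    return sum(wi * (x - y) ** 2 for wi, x, y in zip(w, a, b))


def loo_accuracy(cases, w):
    """Fitness: leave-one-out accuracy of weighted 1-NN retrieval."""
    hits = 0
    for i, (features, label) in enumerate(cases):
        others = cases[:i] + cases[i + 1:]
        nearest = min(others, key=lambda c: weighted_distance(features, c[0], w))
        hits += nearest[1] == label
    return hits / len(cases)


def evolve_weights(cases, pop_size=20, generations=30, seed=0):
    """Evolve feature weights; the best individual always survives (elitism)."""
    rng = random.Random(seed)
    n = len(cases[0][0])
    # Start from uniform weights plus random individuals.
    pop = [[1.0] * n] + [[rng.random() for _ in range(n)]
                         for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=lambda w: loo_accuracy(cases, w), reverse=True)
        elite = pop[: pop_size // 2]              # truncation selection
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            cut = rng.randrange(1, n) if n > 1 else 0
            child = p1[:cut] + p2[cut:]           # one-point crossover
            j = rng.randrange(n)                  # mutate one gene, clip at 0
            child[j] = max(0.0, child[j] + rng.gauss(0.0, 0.2))
            children.append(child)
        pop = elite + children
    return max(pop, key=lambda w: loo_accuracy(cases, w))
```

Because the uniform weight vector is part of the initial population and the best individuals are carried over unchanged, the returned weights are never worse than unweighted retrieval on the fitness used here. The same loop could evolve the k parameter of k-nearest-neighbour retrieval by adding it to the genome.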

Furthermore, GAs can be used to perform instance selection i.e., finding the representative cases in a case base and determining a reduced subset of a case base. In this way, time performance is improved by reducing search space and accuracy can be improved through elimination of noisy and useless cases [ 2 ].

Additionally, GAs have been used to enhance case adaptation [ 16 ], [ 17 ]. Genetic algorithms can also optimize case representation, e.g. by performing case feature discretization [ 18 ] and removing irrelevant features. Such optimizations improve accuracy, search time and storage requirements. It is also quite usual to simultaneously optimize more than one CBR aspect with GAs (e.g. [ 2 ], [ 18 ]).

On the other hand, CBR can be employed to enhance GAs. CBR can be applied to GAs by creating cases to track the history of a search. This case base can contribute to the understanding of how a solution was reached, why a solution works, and what the search space looks like. It could thus be used to design highly tailored search strategies for future use [ 23 ]. Such an approach could therefore be used to explain the results of the genetic algorithm and for knowledge extraction.

Moreover, similar stored cases can also be incorporated into a genetic algorithm to reduce convergence time and improve solution accuracy. GAs normally initialize their starting population at random. Instead, relevant stored cases can be used as part of the initial population of solutions. Additionally, relevant stored cases can be periodically injected into the pool of chromosomes while the genetic algorithm runs [ 24 ], [ 7 ]. In certain approaches, CBR is exploited by GAs for both knowledge extraction and case injection [ 30 ].

3.5 Combinations of CBR with Ontologies

Ontologies facilitate knowledge sharing and reuse. They can provide an explicit conceptualization describing data semantics and a shared and common understanding of the domain knowledge that can be communicated among agents and application systems [ 6 ]. Ontologies play a crucial role in enabling the processing and sharing of knowledge between programs on the Web [ 21 ]. Intelligent Decision Support Systems in the Semantic Web framework should be able to handle, integrate with and reason from distributed data and information on the Web [ 3 ].

Therefore, ontologies can be combined with CBR in various ways. Ontologies can be used by a CBR system to represent the input problem [ 20 ], to enhance similarity assessment [ 13 ], case representation, case abstraction and case adaptation [ 3 ]. Ontologies may perform all such CBR tasks [ 37 ].

3.6 Combinations of CBR with Multiple Intelligent Methods

The previous sections focused on combinations of CBR with one other individual intelligent method. However, intelligent systems have been developed that combine CBR with multiple other intelligent methods. Such multi-integrated paradigms usually follow a coupling model.

Obviously, a CBR system may employ multiple intelligent methods (e.g. rules and various soft computing methods) to perform its internal tasks [ 36 ]. Typical examples of approaches employing multiple soft computing methods within the CBR cycle are presented in [ 12 ] and [ 34 ]. In [ 12 ] all four phases of the CBR cycle employ soft computing methods. The methods employed are a self-organizing neural network for retrieval, a radial basis neural network for reuse, fuzzy systems for revise and all soft computing methods for retain. In [ 34 ] fuzzy logic, supervised and unsupervised neural networks and a genetic algorithm are employed for case representation, indexing, retrieval and adaptation.

More interesting approaches concern multi-integrated systems not following the embedded approach. Typical multi-integrated approaches of this kind involve combinations of CBR, RBR and MBR (e.g. [ 28 ]). Such approaches seem to be quite effective, because combinations of CBR with RBR and MBR individually have been thoroughly investigated. Quite often such systems have been implemented to deal with deficiencies of earlier systems combining CBR with only one of the other two intelligent methods (e.g. RBR or MBR alone). Multi-integrated CBR approaches, besides those involving RBR/MBR, could be developed. For instance, ontologies could constitute an interesting candidate method that could be combined with CBR and another intelligent method in order to facilitate knowledge sharing and reuse among the integrated system components themselves [ 5 ] and among integrated systems. Such a combination could be useful in Web-based systems that need to share knowledge. Fruitful approaches of this kind could involve combinations of CBR, ontologies and RBR/MBR. For instance, in [ 6 ] an approach combining CBR, RBR and an ontology is presented.

Systems combining CBR with certain types of neuro-symbolic or neuro-fuzzy approaches, in which the neuro-symbolic (neuro-fuzzy) module fully integrates the neural and symbolic (fuzzy) approach, could also be considered multi-integrated paradigms. Such modules could be used within CBR instead of plain neural or fuzzy components. Non-embedded coupling approaches can be applied as well. For instance, in [ 15 ] a neuro-symbolic method is combined with CBR according to the reconciliation coupling approach.

In this paper, we discuss key aspects involving combinations of CBR with other intelligent methods. Such combinations are becoming increasingly popular due to the fact that in many application domains a vast amount of case data is available.

Such combined approaches have managed to solve problems in application domains where a case-based module needs the assistance and/or completion of other intelligent modules in order to produce effective results. This trend is very likely to carry on in the following years.

Future directions in combinations of CBR with other intelligent methods could involve a number of aspects. Main such aspects involve: (a) combinations of CBR with soft computing methods, (b) combinations of CBR with fuzzy rules, (c) combinations of CBR with ontologies and (d) combinations of CBR with neuro-symbolic and neuro-fuzzy approaches.

Combinations of CBR with soft computing methods not following an embedded coupling approach could be an interesting future research direction. At present there seems to be little interest in pursuing this direction, since the main interest has been focused on employing soft computing methods within CBR. A non-embedded direction in the combinations of CBR with soft computing could be pursued as thoroughly as in the case of combinations of CBR with RBR/MBR. A further step in this direction could involve non-embedded approaches combining CBR with multiple soft computing methods or combinations of CBR, soft computing and other intelligent methods (e.g. RBR, MBR or ontologies).

Combinations of CBR with fuzzy rule-based systems could be based on work combining CBR with RBR, that is, on the investigation of various coupling approaches.

The increasing interest in Web-based intelligent systems and future advances in the Semantic Web are likely to provide an impetus to approaches combining CBR with ontologies. This trend is likely to involve multi-integrated approaches combining CBR, ontologies and other intelligent methods.

Finally, a direction that may be useful to pursue involves non-embedded coupling approaches combining CBR with neuro-symbolic and neuro-fuzzy modules. Few such approaches have been developed.

REFERENCES

Combining Argumentation and Hybrid Evolutionary Systems in a Portfolio Construction Application

Nikolaos Spanoudakis1 and Konstantina Pendaraki2 and Grigorios Beligiannis2

Abstract. In this paper we present an application for the construction of mutual fund portfolios. It is based on a combination of intelligent methods, namely an argumentation based decision making framework and a forecasting algorithm combining Genetic Algorithms (GA), MultiModel Partitioning (MMP) theory and Extended Kalman Filters (EKF). The argumentation framework is employed in order to develop mutual fund performance models and to select a small set of mutual funds, which will compose the final portfolio. The forecasting algorithm is employed in order to forecast the market status (inflating or deflating) for the next investment period. The knowledge engineering approach and application development steps are also discussed.

1 INTRODUCTION

Portfolio management [ 8 ] is concerned with constructing a portfolio of securities (e.g., stock, bonds, mutual funds [ 13 ], etc.) that maximizes the investor’s utility. In a previous study [ 14 ], we constructed mutual fund (MF) portfolios using an argumentation based decision making framework. We developed rules that characterize the market and the policies of different investor types using evaluation criteria of fund performance and risk. We also defined strategies for resolving conflicts over these rules. Furthermore, the developed application can be used for a set of different investment policy scenarios and supports the investor/portfolio manager in composing efficient MF portfolios that meet his investment preferences. The traditional portfolio theories ([ 8 ], [ 11 ], [ 12 ]) were based on unidimensional approaches that did not fit the multidimensional nature of risk ([ 3 ]), and they did not capture the complexity present in the data set. In [ 14 ], this problem was addressed through argumentation, which demonstrated a high level of adaptability in the decisions of the portfolio manager or investor when the environment changes and the characteristics of the funds are multidimensional.

Our study showed that, when the market context is taken into account, the results are better if the status of the market in the following investment period can be forecast. In order to achieve this goal we employed a hybrid system that combines Genetic Algorithms (GA), MultiModel Partitioning (MMP) theory and the Extended Kalman Filter (EKF). A general description of this algorithm and its application to linear and non-linear data is given in [ 2 ], while the specific version used in this contribution is presented in [ 1 ], where its successful application to non-linear data is also presented. This algorithm captured our attention because it had been successfully used in the past for accurately predicting the evolution of stock values in the Greek market (its application to economic data is presented in [ 2 ]).

1 Technical University of Crete, Greece, email: nikos@science.tuc.gr
2 University of Ioannina, Greece, email: {dpendara, gbeligia}@cc.uoi.gr

Moreover, there is a lot of work on hybrid evolutionary algorithms, and their application to many difficult problems has shown very promising results [ 4 ]. Predicting the behavior of the financial market is an open problem and many solutions have been proposed. However, there is no known algorithm able to identify all kinds of behaviors effectively. Many traditional methods have also been applied to the same problem and the results obtained were not very satisfactory. There are two main difficulties in this problem: first, the search space is huge and, second, it comprises many local optima.

In this contribution, we present the whole application resulting from the combination of argumentation with hybrid evolutionary systems along with the respective results.

The rest of the paper is organized as follows: Section two presents an overview of the concepts and application domain knowledge. Section three outlines the main features of the proposed argumentation based decision-making framework and the developed argumentation theory. The forecasting hybrid evolutionary system is presented in section four, followed by section five, which presents the developed application and discusses the obtained empirical results. Finally, section six summarizes the main findings of this research.

2 DOMAIN KNOWLEDGE

This section describes the criteria (or variables) used for creating portfolios and the knowledge on how to use these criteria in order to construct a portfolio.

The data used in this study is provided by the Association of Greek Institutional Investors and consists of daily data of domestic equity mutual funds (MFs) over the period January 2000 to December 2005.

The proposed framework is based on five fundamental variables. The return of the funds is the actual value of return of an investment, defined by the difference between the nominal return and the rate of inflation. This variable is based on the net price of a fund. At this point, it is very important to mention that transaction costs such as management commission are included in the net price. Front-end commission and redemption commission fluctuate depending on the MF class and in most cases are very low. The standard deviation is used to measure the variability of the fund’s daily returns, thus representing the total risk of the fund. The beta coefficient (β) is a measure of the fund’s risk in relation to the capital risk. The Sharpe index [ 13 ] is a useful measure of performance and is used to measure the expected return of a fund per unit of risk, defined by the standard deviation.

The Treynor index [ 15 ] is similar to the Sharpe index except that performance is measured as the risk premium per unit of systematic risk (beta coefficient) rather than of total risk.
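The risk and performance variables described above can be computed along the following lines. This is a sketch using the textbook forms of the measures, which may differ in detail from the computation used in the study; the `risk_free` rate, the use of a market return series for beta and the sample standard deviation are assumptions.

```python
from statistics import mean, stdev


def beta(fund_returns, market_returns):
    """Beta coefficient: covariance of fund and market returns divided by
    the variance of the market returns."""
    mf, mm = mean(fund_returns), mean(market_returns)
    cov = sum((f - mf) * (m - mm) for f, m in zip(fund_returns, market_returns))
    var = sum((m - mm) ** 2 for m in market_returns)
    return cov / var


def sharpe_index(fund_returns, risk_free):
    """Excess return per unit of total risk (sample standard deviation)."""
    return (mean(fund_returns) - risk_free) / stdev(fund_returns)


def treynor_index(fund_returns, market_returns, risk_free):
    """Excess return (risk premium) per unit of systematic risk (beta)."""
    return (mean(fund_returns) - risk_free) / beta(fund_returns, market_returns)
```

The only difference between the two indices in this sketch is the denominator: total risk for Sharpe, systematic risk for Treynor.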

On the basis of the argumentation framework for the selection of a small set of MFs, which will compose the final multiportfolios, the examined funds are clustered into three groups for each criterion for each year. For example, we have funds with high, medium and low performance (return), and likewise for the other criteria.

The aforementioned performance and risk variables visualize the characteristics of the capital market (bull or bear) and the type of the investor according to his investment policy (aggressive or moderate). Further information is represented through variables that describe the general conditions of the market and the investor policy (selection of portfolios with high performance per unit of risk).

The general conditions of the market are characterized through the development of funds which have high performance levels (high return). Regarding the market context, in a bull market, funds are selected if they have high systematic or total risk. On the other hand, in a bear market, we select funds with low systematic and total risk. An aggressive investor places his capital in funds with high performance and high systematic risk.

Accordingly, a moderate investor selects funds with high performance and low or medium systematic risk. Some types of investors select portfolios with high performance per unit of risk.

Such portfolios are characterized by a high Sharpe ratio and a high Treynor ratio.

In this section we first present the argumentation framework that we used and then describe the domain knowledge modeling based on this framework.

3.1

Autonomous agents, be they artificial or human, need to make decisions under complex preference policies that take into account different factors. In general, these policies have a dynamic nature and are influenced by the particular state of the environment in which the agent finds himself. The agent's decision process needs to be able to synthesize different aspects of his preference policy and to adapt to new input from the current environment.

Such agents are the mutual fund managers.

In order to address requirements like the above, Kakas and Moraitis ([ 6 ]) proposed an argumentation based framework to support an agent's self deliberation process for drawing conclusions under a given policy.

Argumentation can be abstractly defined as the principled interaction of different, potentially conflicting arguments, for the sake of arriving at a consistent conclusion (see e.g. [ 10 ]). The nature of the “conclusion” can be anything, ranging from a proposition to believe, to a goal to try to achieve, to a value to try to promote. Perhaps the most crucial aspect of argumentation is the interaction between arguments. This means that argumentation can give us the means for allowing an agent to reconcile conflicting information within itself, for reconciling its informational state with new perceptions from the environment, and for reconciling conflicting information between multiple agents through communication. A single agent may use argumentation techniques to perform its individual reasoning because it needs to make decisions under complex preference policies in a highly dynamic environment (see e.g. [ 6 ]). This is the case considered in this research.

In the following paragraphs we describe the theoretical framework that we adopted:

Definition 1. A theory is a pair (T, P) whose sentences are formulae in the background monotonic logic (L, ⊢ ) of the form L←L1,…,Ln, where L, L1, …, Ln are positive or negative ground literals. For rules in P the head L refers to an (irreflexive) higher priority relation, i.e. L has the general form L = h_p(rule1, rule2).

The derivability relation, ⊢ , of the background logic is given by the simple inference rule of modus ponens.

An argument for a literal L in a theory (T, P) is any subset, T, of this theory that derives L, T ⊢ L, under the background logic. A part of the theory T0 ⊂ T, is the background theory that is considered as a non defeasible part (the indisputable facts).

An argument attacks (or is a counter argument to) another when they derive a contrary conclusion. These are conflicting arguments. A conflicting argument (from T) is admissible if it counter-attacks all the arguments that attack it. It counter-attacks an argument if it takes along priority arguments (from P) and makes itself at least as strong as the counter-argument (we omit the relevant definitions from [ 6 ] due to limited space).

Definition 2. An agent’s argumentative policy theory is a theory T = ((T, T0), PR, PC) where T contains the argument rules in the form of definite Horn logic rules, PR contains priority rules which are also definite Horn rules with head h_p(r1, r2) s.t. r1, r2 ∈ T and all rules in PC are also priority rules with head h_p(R1, R2) s.t. R1, R2 ∈ PR ∪ PC. T0 contains auxiliary rules of the agent’s background knowledge.

Thus, in defining the decision maker’s theory we specify three levels. The first level (T) defines the (background theory) rules that refer directly to the subject domain, called the Object-level Decision Rules. In the second level we have the rules that define priorities over the first level rules for each role that the agent can assume or context that he can be in (including a default context).

Finally, the third level rules define priorities over the rules of the previous level (which context is more important) but also over the rules of this level in order to define specific contexts, where priorities change again.

3.2 The Decision Maker’s Argumentation Theory

Using the presented argumentation framework, we transformed the criteria for all MFs and the experts’ knowledge (§2) into background theory (facts) and rules of the first and second level. Then, we defined the strategies (or specific contexts) in the third level rules.

The goal of the knowledge base is to select some MFs in order to construct our portfolio. Therefore our rules have as their head the predicate selectFund/1 and its negation. We write rules supporting it or its negation and use argumentation for resolving conflicts. We introduce the hasInvestPolicy/2, preference/1 and market/1 predicates for defining the different contexts and roles.

For example, John, an aggressive investor is expressed with the predicate hasInvestPolicy(john, aggressive).

The knowledge base facts are the performance and risk variable values for each MF, the thresholds for each group of values for each year and the above mentioned predicates characterizing the investor and the market. The following rules are an example of the object-level rules (level 1 rules of the framework - T):

r1(Fund): selectFund(Fund) ← highR(Fund)
r2(Fund): ¬selectFund(Fund) ← highB(Fund)

The highR predicate denotes the classification of the MF as a high return fund and the highB predicate denotes the classification of the MF as a high risk fund. Thus, the r1 rule states that a high performance fund should be selected, while the r2 rule states that a high risk fund should not be selected. Such rules are created for the three groups of our performance and risk criteria.

Then, in the second level we assign priorities over the object level rules. The PR are the default context rules or level 2 rules.

These rules are added by experts and express their preferences in the form of priorities between the object level rules that should hold within defined contexts and roles. For example, the level 1 rules with signatures r1 and r2 are conflicting. In the default context the first one has priority, while the bear market context reverses this priority:

R1: h_p(r1(Fund),r2(Fund)) ← true
R2: h_p(r2(Fund),r1(Fund)) ← market(bear)

Rule R1 defines the priorities set for the default context, i.e. an investor selects a fund that has high return on investment (RoI) even if it has high risk. Rule R2 defines the priority for the bear market context, within which the fund selection process is cautious and does not select a high RoI fund if it has high risk.

Finally, in PC (level 3 rules) the decision maker defines his strategy and policy for integrating the rules of the different roles and contexts. When combining the aggressive investor role and the bear market context, for example, the final portfolio is their union, except that the aggressive investor would now accept high and medium risk MFs (instead of only high). The decision maker’s strategy sets preference rules between the rules of the previous level but also between rules at this level. Relating to the level 2 priorities, the bear market context’s priority of not buying a high risk MF, even if it has a high return, is given higher priority than that of the general context. Then, the specific context of an aggressive investor in a bear market defines that the bear market context preference is inverted. See the relevant priority rules:

C1: h_p(R2, R1) ← true
C2: h_p(R1, R2) ← hasInvestPolicy(Investor, aggressive)
C3: h_p(C2, C1) ← true

Thus, an aggressive investor in a bear market context would continue selecting high risk funds. In the latter case, the argument r1 takes along the priority arguments R1, C2 and C3 and becomes stronger than the conflicting r2 argument, which can only take along the R2 and C1 priority arguments; r1 is therefore the only admissible argument. Thus, the selectFund(Fund) predicate is true and the fund is inserted in the portfolio.
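The way the three levels of rules resolve the r1/r2 conflict can be traced with a small Python sketch. It hard-codes only the example rules R1, R2, C1, C2 and C3 and uses a simplified "beats" relation in place of the full admissibility semantics of [ 6 ]; the data layout and the function names are assumptions made for illustration.

```python
# Priority rules of the example: name -> (preferred, less_preferred, condition).
# A condition of None stands for "← true".
PRIORITY_RULES = {
    "R1": ("r1", "r2", None),
    "R2": ("r2", "r1", ("market", "bear")),
    "C1": ("R2", "R1", None),
    "C2": ("R1", "R2", ("hasInvestPolicy", "aggressive")),
    "C3": ("C2", "C1", None),
}


def active_supporters(winner, loser, facts):
    """Names of priority rules h_p(winner, loser) whose condition holds."""
    return [
        name
        for name, (hi, lo, cond) in PRIORITY_RULES.items()
        if hi == winner and lo == loser and (cond is None or cond in facts)
    ]


def beats(a, b, facts):
    """a beats b if some active rule prefers a over b and that rule in turn
    beats every active rule preferring b over a (checked one level up)."""
    pro = active_supporters(a, b, facts)
    con = active_supporters(b, a, facts)
    return any(all(beats(p, q, facts) for q in con) for p in pro)


def select_fund(high_return, high_risk, facts):
    """Return True/False for selectFund, or None if no object-level rule fires."""
    if high_return and high_risk:      # r1 and r2 conflict
        return beats("r1", "r2", facts)
    if high_return:                    # only r1 applies
        return True
    if high_risk:                      # only r2 applies
        return False
    return None
```

For a fund with both high return and high risk, the sketch selects the fund in the default context, rejects it when `("market", "bear")` is among the facts, and selects it again when `("hasInvestPolicy", "aggressive")` is added, mirroring the three cases discussed in the text.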

The problem with the above rules is that the facts market(bear) or (exclusive) market(bull) could not be safely determined for the next investment period. In the application version presented in [ 14 ] the market status was simply assumed to remain the same as at the time of the investment. This strategy, however, produced quite poor results when the market context changed in the next period.

4 FORECASTING THE STATUS OF THE FINANCIAL MARKET

One of the most prominent issues in the field of signal processing is the adaptive filtering problem, with unknown time-invariant or time-varying parameters. Selecting the correct order and estimating the parameters of a system model is a fundamental issue in linear and nonlinear prediction and system identification.

The problem of fitting an AutoRegressive Moving Average model with eXogenous input (ARMAX) or a Nonlinear AutoRegressive Moving Average model with eXogenous input (NARMAX) to a given time series has attracted much attention because it arises in a large variety of applications, such as time series prediction in economic and biomedical data, adaptive control, speech analysis and synthesis, neural networks, radar and sonar, fuzzy systems, and wavelets [ 5 ].

The forecasting algorithm used in this contribution is a generic applied evolutionary hybrid technique, which combines the effectiveness of adaptive multimodel partitioning filters with the robustness of GAs [ 1 ]. This method was first presented in [ 7 ].

Specifically, the a posteriori probability that a specific model, of a bank of the conditional models, is the true model, can be used as fitness function for the GA. In this way, the algorithm identifies the true model even in the case where it is not included in the filters’ bank. It is clear that the filter’s performance is considerably improved through the evolution of the population of the filters’ bank, since the algorithm can search the whole parameter space. The proposed hybrid evolutionary algorithm can be applied to linear and nonlinear data; is not restricted to the Gaussian case; does not require any knowledge of the model switching law; is practically implementable, computationally efficient and applicable to online/adaptive operation; and exhibits very satisfactory performance as indicated by simulation experiments [ 2 ]. The structure of the hybrid evolutionary system used is depicted in Figure 1.

The representation used for the genomes of the population of the GA is the following. We use a mapping that transforms a fixed dimensional internal representation to variable dimensional problem instances. Each genome consists of a vector x of real values xi ∈ ℜ, i = 1, ..., k, and a bit string b of binary digits bi ∈ {0,1}, i = 1, ..., k. Real values are summed up as long as the corresponding bits are equal. Obviously, k is an upper bound for the dimension of the resulting parameter vector. We use the first k/3 real values for the autoregressive part, the second k/3 real values for the moving average part, and the last k/3 real values for the exogenous input part. An example of this mapping is presented in Figure 2. For a more detailed description of this mapping refer to [ 2 ].
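One possible reading of this mapping is sketched below. It is an interpretation, not the implementation of [ 2 ]: it assumes that a new parameter starts whenever a bit differs from its predecessor, and that grouping is done separately inside each k/3 segment. The names decode_segment and decode_genome are hypothetical.

```python
from typing import List, Sequence, Tuple


def decode_segment(values: Sequence[float], bits: Sequence[int]) -> List[float]:
    """Sum consecutive real values while their bits stay equal."""
    params: List[float] = []
    for i, value in enumerate(values):
        if i > 0 and bits[i] == bits[i - 1]:
            params[-1] += value      # same bit as before: extend the current sum
        else:
            params.append(value)     # first gene or bit change: new parameter
    return params


def decode_genome(
    x: Sequence[float], b: Sequence[int]
) -> Tuple[List[float], List[float], List[float]]:
    """Split a genome of length k into AR, MA and exogenous parts (k/3 genes each)."""
    k = len(x)
    if k != len(b) or k % 3 != 0:
        raise ValueError("x and b must have the same length, divisible by 3")
    third = k // 3
    ar, ma, ex = (
        decode_segment(x[s:s + third], b[s:s + third])
        for s in (0, third, 2 * third)
    )
    return ar, ma, ex
```

With k = 6, a genome can therefore decode to anything between three and six parameters, which is how a fixed-length representation yields models of variable order.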

At first, an initial population of m genomes is created at random (each genome consists of a vector of real values and a bit string). As stated before, each vector of real values represents a possible value of the NARMAX model order and its parameters.

For each such population we apply an MMAF with EKFs and have as result the model-conditional probability density function (pdf) of each candidate model. This pdf is the fitness of each candidate model, namely the fitness of each genome of the population (Figure 3).

For a more detailed description of this hybrid evolutionary system refer to [ 2 ].

The reproduction operator we decided to use is the classic biased roulette wheel selection according to the fitness function value of each possible model order [ 9 ]. As far as crossover is concerned, we use the one-point crossover operator for the binary strings and the uniform crossover operator for the real values [ 9 ].

Finally, we use the flip mutation operator for the binary strings and the Gaussian mutation operator for the real values [ 9 ]. Every new generation of possible solutions iterates the same process as the old ones, and the whole process may be repeated for as many generations as we desire or until the fitness function reaches the value 1 (one), which is the maximum value it can have as a probability.
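The genetic operators named above can be sketched with the Python standard library as follows. This is a generic illustration of the textbook operators, not the authors' code; the mutation rate and the standard deviation sigma are hypothetical parameters that the text does not specify.

```python
import random
from typing import List, Sequence, Tuple


def roulette_select(population: Sequence, fitness: Sequence[float], rng: random.Random):
    """Pick one genome with probability proportional to its fitness."""
    return rng.choices(population, weights=fitness, k=1)[0]


def one_point_crossover(a: List[int], b: List[int], rng: random.Random) -> Tuple[List[int], List[int]]:
    """Swap the tails of two bit strings at a random cut point."""
    cut = rng.randint(1, len(a) - 1)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def uniform_crossover(a: List[float], b: List[float], rng: random.Random) -> Tuple[List[float], List[float]]:
    """Exchange each pair of real genes independently with probability 0.5."""
    child1, child2 = list(a), list(b)
    for i in range(len(a)):
        if rng.random() < 0.5:
            child1[i], child2[i] = child2[i], child1[i]
    return child1, child2


def flip_mutation(bits: List[int], rate: float, rng: random.Random) -> List[int]:
    """Flip each bit with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in bits]


def gaussian_mutation(values: List[float], rate: float, sigma: float, rng: random.Random) -> List[float]:
    """Add zero-mean Gaussian noise to each real gene with probability `rate`."""
    return [v + rng.gauss(0.0, sigma) if rng.random() < rate else v for v in values]
```

Passing an explicit random.Random instance keeps runs reproducible, which is convenient when comparing generations.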

In this contribution we apply a slightly different approach compared to the one presented in [ 2 ]. In [ 2 ], at the step of the algorithm where the estimate (output) x of each filter is calculated, the past values of x used to estimate the next value of x are always taken from the estimation file (the file of all past values of x estimated by the algorithm up to that point). All these values are used in each generation to estimate the next value of the estimation (output) vector x. The method presented in this contribution estimates x differently. At the step where the value of x for each filter is calculated, only a window of past values of x is used, which is shorter than the time series estimated up to that point. The number of past values used in each generation to estimate the next value of x equals n/2, where n is the total length of the time series to be estimated. Every new value of x estimated by the algorithm is added to this window and the oldest one is removed, so that the window keeps a length of n/2. The value n/2 was not selected arbitrarily. We conducted exhaustive experiments using many different values, and n/2 was the most effective one, that is, the one that resulted in the best prediction results.
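The fixed-length history described here behaves like a sliding window. A minimal sketch, assuming the estimates arrive one at a time, uses a bounded double-ended queue; make_history and the sample values are illustrative only.

```python
from collections import deque


def make_history(n: int) -> deque:
    """Buffer holding at most n/2 past estimates; the oldest is dropped first."""
    return deque(maxlen=n // 2)


history = make_history(n=8)
for estimate in [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]:
    history.append(estimate)   # appending to a full deque evicts the oldest value
```

After the loop the buffer holds only the four most recent estimates, which are the values a filter would see when computing its next estimate.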

Thus, the hybrid evolutionary system presented in Figure 1 is used to forecast the behavior of the financial market in relation to its current status. The market is characterized as a bull market if it is forecasted to rise in the next semester, or as a bear market if it is forecasted to fall. We used the return values of the Greek market index for each semester starting from year 1985 to the years of our sample data (2000 to 2005). The algorithm performed very well, considering that it could forecast the next semester market behavior with a success rate of 85.71% (12 out of 14 right predictions).

5 THE PORTFOLIO CONSTRUCTION APPLICATION

In this section we first present the system architecture, i.e. the combination method for the argumentation decision making sub-system and the hybrid forecasting sub-system that resulted in a coherent application. Then we present the results of this combination.

System Architecture

The portfolio generation application is a Java program creating a human-machine interface and managing its modules, namely the decision making module, which is a Prolog rule base (executed in SWI-Prolog¹) using the Gorgias² framework, and the forecasting module, which is a Matlab³ implementation of the forecasting hybrid system (see Figure 4).

The application connects to the SWI-Prolog module using the provided Java interface (JPL), which allows inserting facts into an existing rule base and running it to reach goals. The goals can be captured and returned to the Java program. The application connects to Matlab by executing it in a system shell. The Matlab program writes the results of the algorithm to a MySQL⁴ database using SQL (Structured Query Language). The application first executes the forecasting module, then updates the database, using JDBC (Java DataBase Connectivity interface) technology, with the investor profile (selected roles) and, finally, queries the decision making module, setting as goal the funds to select for participation in the final portfolio. Thus, after the execution of the forecasting module the predicate market/1 is determined as bull or bear and inserted as a fact in the rule base before the decision making process is launched. The reader can see in Figure 5 a screenshot of the integrated system.

For evaluating our results we defined scenarios for all years for which we had available data (2000-2005) and for all combinations of contexts. That resulted in the two investor types combined with the market status, plus the two investor types combined with the high performance option, plus the market status combined with the high performance option, altogether five different scenarios run for six years each. Each one of the examined scenarios refers to different investment choices and leads to the selection of a different number and combination of MFs.

¹ SWI-Prolog offers a comprehensive Free Software Prolog environment, http://www.swi-prolog.org

² Gorgias is an open source general argumentation framework that combines the ideas of preference reasoning and abduction, http://www.cs.ucy.ac.cy/~nkd/gorgias/

³ MATLAB® is a high-level language and interactive environment for performing computationally intensive tasks, http://www.mathworks.com/products/matlab

⁴ MySQL is an open source database, http://www.mysql.com

Figure 5: A screenshot for portfolio generation for a scenario of a moderate investor in a bull market context

In Table 1 the reader can inspect the average return on investment (RoI) for the six years for all different contexts. The reader should notice that the table contains two RoI columns. The first (“Previous RoI”) depicts the results before changing the system, as they appeared in [ 14 ]. The second presents the results of upgrading the application by combining it with the hybrid evolutionary forecasting sub-system and by fixing the participation of the selected funds in the final portfolio. The latter modification is out of the scope of this paper, but the reader can clearly see that it has greatly influenced the performance of all scenarios.

Table 1, however, shows the added value of this contribution, as the market context has become the most profitable in the “New RoI” column (8.17% RoI), while in the “Previous RoI” column it was one of the worst cases (3.72% RoI). Consequently, the specific contexts containing the market context have better results.

Moreover, Table 1 also shows the added value of our approach, as the reader can compare our results with the return on investment (RASE) of the General Index of the Athens Stock Exchange (ASE-GI). According to the results of this table, the average return of the constructed portfolios is higher than that of the market index for all contexts except two. The two cases where the constructed portfolios did not beat the market index are the moderate simple context and the moderate-market specific context. This may be due to the fact that in these two contexts we have an investor who wishes to earn more without taking into account any amount of risk in relation to the variability which characterizes the conditions of the market during the examined period. This fact makes it very difficult to implement investment strategies that can help a fund manager outperform a passive investment policy.

Furthermore, we notice that in some specific contexts the results are more satisfying than the results obtained by simple contexts, while in others there is little or no difference. This means that by using effective strategies in the third preference rules layer the decision maker can optimize the combined contexts. Specifically, the aggressive-high performance specific context provides better results than both the simple contexts aggressive and high performance (the ones that it combines) and the general context. The moderate-high performance specific context’s returns on investment are equal to the higher simple context’s returns (high performance) while the aggressive-market specific context returns are closer to the higher simple context’s returns (market).

Finally, in Figure 6, we present the RoI of all contexts separately for each year. This view is also useful, as it shows that for two years, 2003 and 2004, RASE was greater than the RoI of all our contexts. This shows that our application, for the time being, performs better for medium term to long term investments, i.e. those that range over five years.

The objective of this paper was to present an artificial intelligence based application for the MF portfolio generation problem that combines two different intelligent methods, argumentation based decision making and a hybrid system that combines Genetic Algorithms (GA), MultiModel Partitioning (MMP) theory and the Extended Kalman Filter (EKF).

We described in detail how we developed our argumentation theory and how we combined it with the hybrid system to determine an important fact for the decision making process, i.e. the status of the financial market in the next investment period.

The developed application allows a decision maker (fund manager) to construct multi-portfolios of MFs under different, possibly conflicting contexts. Moreover, for medium to long term investments, the returns on investment of the constructed portfolios are better than those of the General Index of the Athens Stock Exchange, while the best results are those that involve the forecasting of the financial market.

Our future work will be to develop a new rule base for the problem of determining when to construct a new portfolio for a specific investor. We will also make the application web-based so that it can get on-line financial data available from the internet for computing the decision variables and for allowing the investors to insert their profiles by filling on-line forms. Finally, we will continue evaluating our application as new data become available for years after 2005. Our aim is to be able to guarantee a better RoI than that of the ASE.

Elena I Teodorescu and Miltos Petridis¹

Abstract. This paper presents an investigation into applying Case-Based Reasoning to Multiple Heterogeneous Case Bases using agents. The adaptive CBR process and the architecture of the system are presented. A case study is presented to illustrate and evaluate the approach. The process of creating and maintaining the dynamic data structures is discussed. The similarity metrics employed by the system are used to support the process of optimisation of the collaboration between the agents, which is based on the use of a blackboard architecture. The blackboard architecture is shown to support the efficient collaboration between the agents to achieve an efficient overall CBR solution, while using case-based reasoning methods to allow the overall system to adapt and “learn” new collaborative strategies for achieving the aims of the overall CBR problem solving process.

1 Introduction

Case-based reasoning (CBR) is now an established artificial intelligence paradigm. Given a case-base of prior experiences, a CBR system solves new problems by retrieving cases from the case-base and adapting their solutions to comply with the new requirements [ 1 ].

Multiple Case Based Reasoning (MCBR) is used to retrieve solutions for a new problem from more than one case-base. Methods for managing sharing of standardized case bases have been studied in research on distributed CBR (e.g. [ 13 ]), as have methods for facilitating large-scale case distribution [ 10 ]. Leake and Sooriamurthi propose a new strategy for MCBR: an agent selectively supplements its own case-base as needed, by dispatching problems to external case-bases and using cross-case-base adaptation to adjust their solutions for inter-case-base differences [ 4, 5, 6, 13 ].

In many problems in modern organisations, the knowledge encapsulated by cases is contained in multiple case bases, reflecting the fragmented way in which organisations capture and organise knowledge. The traditional approach is to merge all case bases into a central case base that can be used for the CBR process. However, this approach brings with it three challenges:

• Moving cases into a central case base potentially separates them from their context and makes maintenance more difficult.

• Various case bases can use different semantics. There is therefore a need to maintain various ontologies and mappings across the case bases.

• The knowledge content “value” of individual cases can be related to their origin. This can be lost when merging into a central case base.

Keeping the cases distributed in the form of a Heterogeneous Multiple Case Based Reasoning system (HMCBR) may have a number of advantages such as increased maintainability and competence and the contextualisation of the cases. Past research at Greenwich [ 2 ][ 3 ] has shown the need to combine knowledge encoded in cases from various heterogeneous sources to achieve a competent, seamless CBR system.

¹ Department of Computing Science, University of Greenwich, Park Row, London SE10 9LS, email: {E.I.Teodorescu, M.Petridis}@gre.ac.uk

Ontanon and Plaza [ 7 ] looked at a way to “improve the overall performance of the multiple case systems and of the individual CBR agents without compromising the agent’s autonomy”. They present [ 8 ] a framework for collaboration among agents that use CBR and strategies for case bartering (case trading by CBR agents). Nevertheless, they do not focus on the possibility of cases having different structures and the impact this will have on applying CBR to heterogeneous case bases. Leake [ 5 ] states that “An important issue beyond the scope of this paper is how to establish correspondences between case representations, if the representations used by different case-bases differ.”

Given several case bases as the search domain, it is very likely that they have different structures. Ideally, accessing Multiple Case Bases should not require a change to their data structures. In order for an MCBR system to effectively use case-bases that may have been developed in different ways, for different tasks or task environments, methods are needed to adjust retrieved cases for local needs.

Leake and Sooriamurthi [ 4 ] proposed a theoretical “cross-case-base adaptation” which would adapt suggested solutions from one case base to apply to the needs of another. They are currently exploring sampling methods for comparing case-base characteristics in order to select appropriate cross-case-base adaptation strategies.

In order to enable effective solution retrieval across autonomous case bases with differing structures, it is essential to have access to, and a good understanding of, each of the different case base structures involved. This would make it possible to identify the commonalities, equivalences and specific characteristics of every case base associated with the system.

2.1 The process of adaptive CBR

Instead of trying to adapt the suggested solutions from one case base to the needs of another, the approach investigated in this study will be to create a “dynamic structure” of a general case. This dynamic structure would be modified every time a new case base with a new structure is added.

The process of adaptive CBR, within the architecture of the HMCBR System (Figure 1), will incorporate a number of steps.

Firstly, in order for the system to work with a particular case base, it will need to know the structure of that case base. Every newly added case base will therefore have to publish its structure to a Registry System. The published structures are required to have their own data dictionaries attached to enable the creation of a dynamic Data Dictionary.

Fig. 1. The Architecture of the HMCBR System

The published structure will be retrieved by the Dynamic CB System and used to adapt the local dynamic structure to accommodate any new elements and map existing ones.

When the dynamic structure reflects all participating case bases, a case query can be submitted. The system would then reformulate the target case structure into each provider’s case base structure.

The target case structure will be a subset of the dynamic structure.

The reformulated cases are submitted to each provider and solution cases are retrieved using KNN techniques [ 1 ]. The structures of these solutions will be translated into the dynamic structure, thus creating a dynamic case base. Finally, the system will apply the classical CBR process to the dynamic case base.
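The retrieval steps above can be sketched as follows. This is a simplified illustration under several assumptions: cases are flat dictionaries of numeric attributes, each provider exposes a plain attribute-name mapping, and nearest neighbours are ranked by squared Euclidean distance. The provider layout and all function names are hypothetical.

```python
from typing import Dict, List

Case = Dict[str, float]


def reformulate(query: Case, mapping: Dict[str, str]) -> Case:
    """Rename dynamic-structure attributes to a provider's local names."""
    return {mapping[attr]: value for attr, value in query.items() if attr in mapping}


def translate_back(case: Case, mapping: Dict[str, str]) -> Case:
    """Rename a provider's local attributes to dynamic-structure names."""
    inverse = {local: dynamic for dynamic, local in mapping.items()}
    return {inverse[attr]: value for attr, value in case.items() if attr in inverse}


def knn(query: Case, cases: List[Case], k: int) -> List[Case]:
    """Return the k cases closest to the query (squared Euclidean distance)."""
    def distance(case: Case) -> float:
        return sum((case[a] - v) ** 2 for a, v in query.items() if a in case)
    return sorted(cases, key=distance)[:k]


def retrieve(query: Case, providers: Dict[str, dict], k: int = 1) -> List[Case]:
    """Build the dynamic case base from every provider's nearest cases."""
    dynamic_case_base: List[Case] = []
    for provider in providers.values():
        local_query = reformulate(query, provider["mapping"])
        for case in knn(local_query, provider["cases"], k):
            dynamic_case_base.append(translate_back(case, provider["mapping"]))
    return dynamic_case_base
```

The classical CBR process would then run on the returned list, which is already expressed in the attribute names of the dynamic structure.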

The whole process is intended to provide a transparent view of the CBR process across the heterogeneous system.

2.2 Case Study

This case study requires searching for a property from three estate agencies without amalgamating their case base structures.

Let us suppose that the estate agencies have different case base structures (figure 2).

A possible buyer should be able to search for a property and get all the suitable solutions from all three agencies. A search should retrieve the best matches from all case bases as if it was dealing with a single case base in a way transparent to the buyer.

Fig. 2. Three different Case Base Structures: Case Base Structure 1 (CBS1), Case Base Structure 2 (CBS2) and Case Base Structure 3 (CBS3)

Creating and maintaining a dynamic structure makes the self-adaptive multi case base reasoning system possible. By adding a new case base to the existing ones, new attributes are added to a global dynamic structure and new relations linked to these attributes are established.

              CBS1.type
DCBS.name     Apartment   Studio   ...
House         0           0        1
Flat          1           0.8      0

Fig. 3. Data Dictionary includes relations between some of the attributes.

A data dictionary is required to keep all the metadata for the dynamic structure. This data dictionary would have multiple functions: It records the location and the name of every attribute from the Case Base Structures (CBS) and how these are translated into the Dynamic Case Base Structure (DCBS). It also stores the type and any default value for every single attribute.

The Data Dictionary will reflect any relationships between the Dynamic Case Base Structure attributes. These relationships can be mathematical relationships or look-up tables (figure 3).

We will use the presented case study to show how a dynamic structure is created and how it is continuously changed by adding new case bases to the search domain.

Let us suppose that our general structure (the initial state of the Dynamic Structure, containing a few main attributes of a property) is already built (see figure 4). The structure has a basic Data Dictionary attached, mainly containing the data types of the existing attributes.

We will show how this initial structure will be dynamically changed by consecutively adding the three agents to the search domain.

Adding Case Base Structure 1 to the system implies mapping the attributes ParkingSpace, Area and Type into the Dynamic Structure (these attributes already exist in the initial structure) and also adding more attributes to it (i.e. NoOfRooms, NoOfBathrooms, GardenLength, GardenWidth).

Data Dictionary
Size: Double
NoOfBedrooms: Integer
Location: String
ParkingSpace: double

Name:
          house   flat
house     1       0
flat      0       1

Fig. 4. Initial state of the Dynamic Structure and Data Dictionary

The Data Dictionary will reflect the mapping of attributes:

CBS1.ParkingSpace = DCBS.ParkingSpace; CBS1.Area = DCBS.Location

CBS1.type= DCBS.name

The following attributes will be added to the dynamic data dictionary:

NoOfRooms: integer;

GardenLength: double; GardenWidth: double

Any other relevant relationships such as look-up tables for defining mappings between the values of attribute Type of CBS1 and the values of the attribute Name of the dynamic structure will be captured.
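A minimal sketch of how such a data dictionary entry could translate a CBS1 case into the dynamic structure is given below. The attribute mappings follow the text; the look-up degrees for Apartment and Studio are illustrative values, and the rule of picking the DCBS name with the highest degree is an assumption, as are the names ATTRIBUTE_MAP, NAME_LOOKUP and to_dynamic.

```python
from typing import Dict

# CBS1 attribute -> DCBS attribute (attributes not listed keep their own name).
ATTRIBUTE_MAP: Dict[str, str] = {
    "ParkingSpace": "ParkingSpace",
    "Area": "Location",
    "Type": "Name",
}

# Look-up table: CBS1.Type value -> degree of match with each DCBS.Name value.
# The degrees are illustrative only.
NAME_LOOKUP: Dict[str, Dict[str, float]] = {
    "Apartment": {"Flat": 1.0, "House": 0.0},
    "Studio": {"Flat": 0.8, "House": 0.0},
}


def to_dynamic(case: dict) -> dict:
    """Translate a CBS1 case into dynamic-structure attribute names and values."""
    translated = {}
    for attr, value in case.items():
        target = ATTRIBUTE_MAP.get(attr, attr)
        if target == "Name":
            degrees = NAME_LOOKUP[value]
            value = max(degrees, key=degrees.get)   # best-matching DCBS name
        translated[target] = value
    return translated
```

Attributes that CBS1 introduces, such as NoOfRooms, pass through unchanged because they are added to the dynamic structure under their own names.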

Case Base Structure 2 will add another attribute, GardenSize, to the Dynamic Structure and the data dictionary will record mapping of attributes:

CBS2.Name = DCBS.Name; CBS2.Location = DCBS.Location; CBS2.NoOfBedrooms = DCBS.NoOfBedrooms

The mathematical relationships are recorded:

DCBS.GardenSize = DCBS.GardenLength * DCBS.GardenWidth

Functions can be applied, for example to keep the same metric system:

DCBS.GardenSize = CBS2.GardenSizeInFeet / (3.281)^2

The Data Dictionary would also include a look-up table showing the conversion of values of CBS2.Name to values of DCBS.Name.
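The two relationships recorded for GardenSize amount to simple arithmetic, sketched here. The sketch assumes that GardenSizeInFeet holds an area in square feet and that the dynamic structure stores areas in square metres; the function names are hypothetical.

```python
FEET_PER_METRE = 3.281


def garden_size_from_sq_feet(size_sq_feet: float) -> float:
    """DCBS.GardenSize = CBS2.GardenSizeInFeet / (3.281)^2, in square metres."""
    return size_sq_feet / FEET_PER_METRE ** 2


def garden_size_from_dimensions(length_m: float, width_m: float) -> float:
    """DCBS.GardenSize = DCBS.GardenLength * DCBS.GardenWidth."""
    return length_m * width_m
```

Keeping both functions in the data dictionary lets cases from CBS1 (length and width) and CBS2 (a single area) be compared on the same attribute.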

Attention has to be paid to the meanings of the names of the attributes. For example, while the attribute “Type” in CBS1 and the attribute “Name” in CBS2 have the same meaning (they would both be translated as “Name” in DCBS, with values found in a look-up table), the attribute “Name” from CBS3 does not have the same meaning as the one from CBS2. It is actually translated into DCBS.Location (similar to CBS2.Location).

Fig. 5. Adapted Dynamic Structure after CBS3 was added

By adding the third estate agent case base to the search domain, the dynamic structure will grow even more (see figure 5) and the Data Dictionary will reflect it by adding the attributes DCBS.Garage and DCBS.View.

The following attributes are mapped:

CBS3.Name = DCBS.Location; CBS3.Description = DCBS.Name

CBS3.GardenSizeInMeters = DCBS.GardenSize

Another look-up table can be created and added to the Data Dictionary to record the relationship between the Garage and ParkingSpace. Figure 6 shows the state of the Dynamic data Dictionary after CBS1, CBS2 and CBS3 are added.

Dynamic Data Dictionary
CBS1.Area = DCBS.Location
NoOfRooms: integer
CBS1.type = DCBS.name
DCBS.GardenSize: double
DCBS.GardenSize = CBS2.GardenSizeInFeet
DCBS.GardenSize = DCBS.GardenLength * DCBS.GardenWidth
...
CBS3.Name = DCBS.Location
CBS3.GardenSizeInMetres = DCBS.GardenSize

                Garage   ParkingSpace
Garage          1        0.7
ParkingSpace    0.7      1

Fig. 6. Adapted Dynamic Data Dictionary after CBS1, CBS2 and CBS3 are added

4 Optimising the agent collaboration process

In order to optimise the process of collaboration between the agents to achieve an efficient solution from the overall CBR process when applied across the heterogeneous case bases, an overall similarity metric is required. Additionally, an overall process to enable collaboration between the agents is necessary, based on a flexible architecture.

4.1 Defining an overall similarity metric

The overall similarity metric between a target and a source case can be defined as:

[equation not legible in the source]

where $\sigma$ is the overall similarity, $\sigma_{CB_y}$ is the similarity from case base provider $CB_y$, $C_T$ is the target case, $C_S$ is the source case, and the remaining term is the weighting for case base provider y for case $C_T$.

To allow for defining locally optimised similarity metrics for different providers, the following metric can be defined:

[equation not legible in the source]

where the two terms are the weighting from case base provider $CB_y$ for attribute x and the local similarity metric for provider $CB_y$ for attribute x.

This extended similarity metric takes into account the level of trust that the HMCBR system attributes to the competence of each case base provider. The level of trust is determined by applying CBR to the case-base of the history of queries. Additionally, it allows the trust in particular providers to be adjusted for different “regions” in the case base, allowing case base providers to be “specialised” in particular types of domain knowledge. Finally, the extended metric allows for different ways of defining similarity based on possible particularities pertaining to individual case base providers.
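Since the formulas themselves are not legible in this text, the following sketch only illustrates one plausible shape of such a metric. It assumes, purely for illustration, that a provider's similarity is a normalised weighted sum of per-attribute similarities and that the overall similarity scales it by the provider's trust weighting for the target case. Both forms and all names are assumptions, not the authors' definitions.

```python
from typing import Dict


def local_similarity(weights: Dict[str, float], sims: Dict[str, float]) -> float:
    """Assumed form: weighted mean of per-attribute similarities for one provider."""
    total = sum(weights.values())
    return sum(weights[x] * sims[x] for x in weights) / total


def overall_similarity(provider_trust: float,
                       weights: Dict[str, float],
                       sims: Dict[str, float]) -> float:
    """Assumed form: provider trust for the target case times provider similarity."""
    return provider_trust * local_similarity(weights, sims)
```

With attribute weights of 2 for Location and 1 for GardenSize, attribute similarities of 1.0 and 0.25, and a trust of 0.5, this assumed metric yields 0.375, so a less trusted provider contributes lower-ranked cases.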

Let us assume that in our case study the third estate agent is specialised in city apartments. After a few searches for countryside houses with gardens, reasoning can be applied to the History case-base. Results will show that, for this particular query, the estate agent’s level of trust is not high, i.e. fewer solutions from this particular case base will be added to the Dynamic case-base.

A global level of trust of a provider’s case-base can be calculated taking into consideration the results of all the previous enquiries for that provider.

4.2 An architecture and process to support effective collaboration between case base agents

The architecture of the HMCBR system shown in figure 1 contains the dynamic CB system, which incorporates a blackboard architecture. Blackboards have been used very effectively in the past for the construction of hybrid and agent based AI systems [ 11 ], [ 12 ].

The dynamic CB system is where the process for agent collaboration is controlled. It is based on a blackboard architecture incorporating the blackboard containing the target and retrieved cases from various providers together with similarity calculations and rankings. The blackboard also contains a log of the solution process and the reconciliation strategy followed, thus representing the state of the overall CBR solution process at any point in this process. Figure 7 shows the structure of the dynamic CB module incorporating the blackboard architecture.


The blackboard manager manages the overall solution process, communicates with and keeps track of the CB agents, selects and implements a solution strategy, and monitors and evaluates the solutions achieved. Given a new target case, the blackboard manager decides on a strategy for finding similar cases from the CB providers. The blackboard system decides which CB providers to use, the number of cases to retrieve from each one, and other requirements, such as the requirement for diversity, similarity thresholds etc. The system then initialises the agents and assigns them a mission. On return, the results (cases) are mapped using the dynamic data dictionary and written to the blackboard. A “global” CBR process is used to decide on the retrieved cases. The system then selects the shortlisted cases after the reconciliation process and presents these to the user, together with links to their original forms for the user to explore. Finally, the system “reflects” on the process by updating the query history and confidence weights for each provider.
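The control loop described above can be sketched as follows. This is a heavily simplified illustration: agents are modelled as plain callables that return cases already expressed in the dynamic structure, and the trust update (a fixed-rate blend of the old trust and the shortlisted score) is an invented placeholder, since the text does not specify how confidence weights are adjusted. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Case = Dict[str, object]


@dataclass
class Blackboard:
    """Shared state: target case, scored retrieved cases and a process log."""
    target: Case
    retrieved: List[Tuple[str, Case, float]] = field(default_factory=list)
    log: List[str] = field(default_factory=list)


def solve(target: Case,
          agents: Dict[str, Callable[[Case], List[Case]]],
          similarity: Callable[[Case, Case], float],
          trust: Dict[str, float],
          shortlist_size: int = 2):
    """Query every agent, rank the results on the blackboard, then update trust."""
    board = Blackboard(target)
    for name, agent in agents.items():
        for case in agent(target):
            score = trust[name] * similarity(target, case)
            board.retrieved.append((name, case, score))
        board.log.append(f"queried {name}")
    board.retrieved.sort(key=lambda entry: entry[2], reverse=True)
    shortlist = board.retrieved[:shortlist_size]
    for name, _, score in shortlist:
        # Placeholder "reflection" step: blend old trust with the achieved score.
        trust[name] = 0.9 * trust[name] + 0.1 * score
    board.log.append("updated trust")
    return board, shortlist
```

Because the log and the scored cases live on the blackboard, the state of the solution process can be inspected at any point, which is the property the architecture relies on.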

The system described here has been implemented and tested on case bases from three different estate agents, all using different structures. Experiments with the system have shown that it can retrieve useful cases, combining cases from all case bases to provide a more efficient overall solution when compared to using the case bases separately or mapping them to one central case base. Additionally, the system has shown that it can provide a more diverse retrieved case population in both cases. A full scale evaluation of the system, including using a different application domain, is under way.

5 Conclusion

At a time of increasing web-based communication and sharing of knowledge between organisations and organisational units within enterprises, heterogeneous CBR applied to Multiple Case Bases seems to be the natural progression in this area of research.

The paper investigates an approach based on agents operating on different structures/views of the problem domain in a transparent and autonomous way. In this approach all data is kept locally by each case base provider in its native form. Agents can be dynamically added to the system, thus increasing the search domain and potentially the competence and vocabulary of the system.

This research proposes a new architecture for a self-adaptive MCBR system which involves the use of a dynamic structure based on the blackboard architecture. The Dynamic Structure reflects all participating case base provider structures. As new agents are added to the system, their case base structure is published and is used to adapt the Dynamic Structure accordingly.

The Dynamic Structure is used at runtime to translate search queries into the local structures of each agent. Each agent can then use the translated query to match it to its local cases and retrieve the best matches.

A Data Dictionary is created in order to manage the Dynamic Structure. This contains the metadata for the Dynamic Structure, such as mapping details of the case base provider’s structures to the Dynamic Structure, type information and relationships between attributes of the dynamic structure.

The dynamic case base system manages the overall process, including controlling the agents, reconciling and optimising the retrieved cases and feeding back into its strategy by continuously adjusting weights representing confidence levels in individual case base providers. A prototype system built to assess the efficiency of using a heterogeneous Multiple Case Based Reasoning system is currently being evaluated. Preliminary findings are encouraging.

Further work will concentrate on optimising the process of collaboration between the agents and on methods and strategies for the reconciliation of retrieved cases.

References

[1] Agre, 'KBS Maintenance as Learning Two-Tiered Domain Representation', in M.M. Veloso, A. Aamodt (Eds.): Case-Based Reasoning Research and Development, First International Conference, ICCBR-95, Proceedings, Lecture Notes in Computer Science, Vol. 1010, Springer-Verlag, 108-120, 1995.

[2] Asuncion, D.J. Newman, 'UCI Repository of Machine Learning Databases' [http://www.ics.uci.edu/~mlearn/MLRepository.html], Irvine, CA, University of California, School of Information and Computer Science, 2007.

[3] Cercone, An, Chan, 'Rule-Induction and Case-Based Reasoning: Hybrid Architectures Appear Advantageous', IEEE Transactions on Knowledge and Data Engineering, 11, 164-174, 1999.

[4] S. I. Gallant, Neural Network Learning and Expert Systems, MIT Press, 1993.

[5] A.Z. Ghalwash, 'A Recency Inference Engine for Connectionist Knowledge Bases', Applied Intelligence, 9, 201-215, 1998.

[6] A.R. Golding, P.S. Rosenbloom, 'Improving accuracy by combining rule-based and case-based reasoning', Artificial Intelligence, 87, 215-254, 1996.

[7] I. Hatzilygeroudis, J. Prentzas, 'Neurules: Improving the Performance of Symbolic Rules', International Journal on AI Tools, 9, 113-130, 2000.

[8] I. Hatzilygeroudis, J. Prentzas, 'Constructing Modular Hybrid Rule Bases for Expert Systems', International Journal on AI Tools, 10, 87-105, 2001.

[9] I. Hatzilygeroudis, J. Prentzas, 'An Efficient Hybrid Rule-Based Inference Engine with Explanation Capability', Proceedings of the 14th International FLAIRS Conference, AAAI Press, 227-231, 2001.

[10] I. Hatzilygeroudis, J. Prentzas, 'Integrating (Rules, Neural Networks) and Cases for Knowledge Representation and Reasoning in Expert Systems', Expert Systems with Applications, 27, 63-75, 2004.

[11] Kolodner, Case-Based Reasoning, Morgan Kaufmann Publishers, San Mateo, CA, 1993.

[12] D.B. Leake (ed.), Case-Based Reasoning: Experiences, Lessons & Future Directions, AAAI Press/MIT Press, 1996.

[13] M.R. Lee, 'An Exception Handling of Rule-Based Reasoning Using Case-Based Reasoning', Journal of Intelligent and Robotic Systems, 35, 327-338, 2002.

[14] C.R. Marling, Sqalli, E. Rissland, Munoz-Avila, D. Aha, 'Case-Based Reasoning Integrations', AI Magazine, 23, 69-86, 2002.

[15] Montani, R. Bellazzi, 'Supporting Decisions in Medical Applications: the Knowledge Management Perspective', International Journal of Medical Informatics, 68, 79-90, 2002.

[16] J. Prentzas, I. Hatzilygeroudis, 'Integrating Hybrid Rule-Based with Case-Based Reasoning', in S. Craw and A. Preece (Eds), Advances in Case-Based Reasoning, Proceedings of the European Conference on Case-Based Reasoning, ECCBR-2002, Lecture Notes in Artificial Intelligence, Vol. 2416, Springer-Verlag, 336-349, 2002.

[17]

Prentzas , I. Hatzilygeroudis, ' Categorizing Approaches Combining Rule-Based and Case-Based Reasoning', Expert Systems , 24 , 97 - 122 , ( 2007 ).

[18]

E.L.

Rissland ,

D.B.

Skalak , 'CABARET: Rule Interpretation in a Hybrid Architecture' , International Journal of Man-Machine Studies , 34 , 839 - 887 , ( 1991 ).

[19]

Rossille ,

J.-F.

Laurent , A . Burgun, ' Modeling a Decision Support System for Oncology using Rule-Based and Case-Based Reasoning Methodologies' , International Journal of Medical Informatics , 74 , 299 - 306 , ( 2005 ).

[20]

Vafaie , C. Cecere, 'CORMS AI: Decision Support System for Monitoring US Maritime Environment' , Proceedings of the 17th Innovative Applications of Artificial Intelligence Conference (IAAI) , AAAI Press, 1499 - 1507 , ( 2005 ).

3.3 Combinations of CBR with Neural Networks 3.4 Combinations of CBR with Genetic Algorithms

[1]

Aamodt , E. Plaza, ' Case-Based Reasoning: Foundational Issues, Methodological Variations, and System Approaches' , AI Communications , 7 , 39 - 59 , ( 2004 ).

[2]

Ahn , K. Kim, ' Global optimization of case-based reasoning for breast cytology diagnosis', Expert Systems with Applications (to appear).

[3] I. Bischindaritz , ' Memoire: A Framework for Semantic Interoperability of Case-Based Reasoning Systems in Biology and Medicine' , Artificial Intelligence in Medicine , 36 , 177 - 192 , ( 2006 ).

[4]

P.P.

Bonissone , R. Lopez de Mantaras, 'Fuzzy Case-Based Reasoning Systems', Handbook on Fuzzy Computing , vol. F 4.3 , Oxford University Press, 1998 .

[5]

Castillo ,

Armengol ,

Onaindia ,

Sebastia , J. GonzalezBoticario ,

Rodriguez ,

Fernandez ,

J.D.

Arias ,

Borrajo , ' SAMAP: An User Oriented Adaptive System for Planning Tourist Visits' , Expert Systems with Applications , 34 , 1318 - 1332 , ( 2008 ).

[6]

Ceccaroni ,

Cortes , M. Sanchez-Marre, 'OntoWEDSS : Augmenting Environmental Decision-Support Systems with Ontologies' , Environmental Modelling & Software , 19 , 785 - 797 , ( 2004 ).

[7]

P.-C.

Chang ,

J.-C.

Hsieh , C.-H. Liu, ' A Case-Injected Genetic Algorithm for Single Machine Scheduling Problems with Release Time' , International Journal on Production Economics , 103 , 551 - 564 , ( 2006 ).

[8]

P.-C.

Chang , C.-Y. Lai,

K. R.

Lai , ' A Hybrid System by Evolving Case-Based Reasoning with Genetic Algorithm in Wholesaler's Returning Book Forecasting' , Decision Support Systems , 42 , 1715 - 1729 , ( 2006 ).

[9]

Cheetam ,

S.C.K.

Shiu ,

R.O.

Weber , ' Soft Case-Based Reasoning' , Knowledge Engineering Review, 20 , 267 - 269 , ( 2006 ).

[10]

Chen , P. Burrell, ' Case-Based Reasoning System and Artificial Neural Networks: A Review', Neural Computing & Applications , 10 , 264 - 276 , ( 2001 ).

[11] R.-J. Dzeng , H.-Y. Lee, 'Critiquing Contractors' Scheduling by Integrating Rule-Based and Case-Based Reasoning' , Automation in Construction, 13 , 665 - 678 , ( 2004 ).

[12]

Fdez-Riverola ,

J.M.

Corchado , ' FSfRT: Forecasting System for Red Tides' , Applied Intelligence , 21 , 251 - 264 , ( 2004 ).

[13]

Gervas ,

Diaz-Agudo ,

Peinado , R. Hervas, ' Story Plot Generation Based on CBR', Knowledge-Based Systems , 18 , 235 - 242 , ( 2005 ).

[14]

A.R.

Golding ,

P.S.

Rosenbloom , ' Improving Accuracy by Combining Rule-Based and Case-Based Reasoning' , Artificial Intelligence , 87 , 215 - 254 , ( 1996 ).

[15]

[16]

B. W.

Huang ,

M. L.

Shih ,

N.-H.

Chiu ,

W. Y.

Hu , C. Chiu, ' Price information evaluation and prediction for broiler using adapted casebased reasoning approach', Expert Systems with Applications (to appear).

[17] Y.-K. Juan , S.-G. Shih, Y.-H. Perng , ' Decision Support for Housing Customization: A hybrid approach using case-based reasoning and genetic algorithm' , Expert Systems with Applications , 31 , 83 - 93 , ( 2006 ).

[18] K.-J. Kim , ' Toward Global Optimization of Case-Based Reasoning Systems for Financial Forecasting', Applied Intelligence , 21 , 239 - 249 , ( 2004 ).

[19]

J.L.

Kolodner , Case-Based Reasoning , Morgan Kaufmann, 1993 .

[20]

Byung Kwon , ' Meta Web Service: Building Web-based Open Decision Support System based on Web Services' , Expert Systems with Applications , 24 , 375 - 389 , ( 2003 ).

[21]

Kwon , M. Kim, ' MyMessage: Case-Based Reasoning and Multicriteria Decision Making Techniques for Intelligent ContextAware Message Filtering' , Expert Systems with Applications , 27 , 467 - 480 , ( 2004 ).

[22] D.B. Leake (Ed.), Case-Based Reasoning: Experiences , Lessons, and Future Directions, AAAI Press, 1996 .

[23]

S.J.

Louis , G. McGraw ,

R.O.

Wyckoff , ' Case-based reasoning assisted explanation of genetic algorithm results' , Journal of Experimental & Theoretical Artificial Intelligence , 5 , 21 - 37 , ( 1993 ).

[24]

S. J.

Louis , C. Miles, ' Playing to Learn: Case-Injected Genetic Algorithms for Learning to Play Computer Games' , IEEE Transactions on Evolutionary Computation , 9 , 669 - 681 , ( 2005 ).

[25]

Marling ,

Sqalli , E. Rissland,

Munoz-Avila , D. Aha, 'CaseBased Reasoning Integrations', AI Magazine , 23 , 69 - 86 , ( 2002 ).

[26]

L.R.

Medsker , Hybrid Intelligent Systems , Kluwer Academic Publishers, Second Printing, 1998 .

[27]

Montani , R. Bellazzi, ' Supporting Decisions in Medical Applications: the Knowledge Management Perspective' , International Journal of Medical Informatics , 68 , 79 - 90 , ( 2002 ).

[28]

Montani ,

Magni ,

Bellazzi ,

Larizza ,

A.V.

Roudsari ,

E.R.

Carson , ' Integrating model-based decision support in a multi-modal reasoning system for managing type 1 diabetic patients' , Artificial Intelligence in Medicine , 29 , 131 - 151 , ( 2003 ).

[29]

S. K.

Pal ,

S.C.K.

Shiu , Foundations of Soft Case-Based Reasoning , John Wiley, 2004 .

[30]

E.I.

Perez ,

C.A.

Coello Coello ,

Hernandez-Aguirre , 'Extraction and Reuse of Design Patterns from Genetic Algorithms using CaseBased Reasoning' , Soft Computing, 9 , 44 - 53 , ( 2005 ).

[31]

Prentzas , I. Hatzilygeroudis, ' Categorizing Approaches Combining Rule-Based and Case-Based Reasoning', Expert Systems , 24 , 97 - 122 , ( 2007 ).

[32]

E.L.

Rissland ,

D.B.

Skalak , 'CABARET: Rule Interpretation in a Hybrid Architecture' , International Journal of Man-Machine Studies , 34 , 839 - 887 , ( 1991 ).

[33]

Rossille ,

J.-F.

[34] K.M. Saridakis , A.J. Dentsoras , ' Case-DeSC: A System for CaseBased Design with Soft Computing Techniques' , Expert Systems with Applications , 32 , 641 - 657 , ( 2007 ).

[35]

S.C.K.

Shiu ,

S.K.

Pal , ' Case-Based Reasoning: Concepts , Features and Soft Computing', Applied Intelligence , 21 , 233 - 238 , ( 2004 ).

[36] I. Watson , ' Case-Based Reasoning is a Methodology not a Technology' , Knowledge-Based Systems , 12 , 303 - 308 , ( 1997 )

[37]

Wriggers ,

Siplivaya , I. Joukova , R. Slivin, ' Intelligent Support of Engineering Analysis using Ontology and Case-Based Reasoning' , Engineering Applications of Artificial Intelligence , 20 , 709 - 720 , ( 2007 ).

[38] H.-L. Yang , C.-S. Wang, ' Two stages of case-based reasoning - Integrating genetic algorithm with data mining mechanism', Expert Systems with Applications (to appear).

[39]

F.-C.

Yuan , C. Chiu, ' A Hierarchical Design of Case-Based Reasoning in the Balanced Scorecard Application', Expert Systems with Applications (to appear).

[1]

A. V.

Adamopoulos , Anninos, P. A. , Likothanassis , S .D. , Beligiannis , G. N. , Skarlas , L. V. , Demiris E. N. and Papadopoulos , P. , 2002 . Evolutionary Self-adaptive Multimodel Prediction Algorithms of the Fetal Magnetocardiogram , 14th Int. Conf. on Digital Signal Processing (DSP 2002) , Vol. II, 1- 3 July , Santorini, Greece, pp. 1149 - 1152 .

[2]

G. N.

Beligiannis ,

L. V.

Skarlas and

S. D.

Likothanassis . “

A Generic

Applied Evolutionary Hybrid Technique for Adaptive System Modeling and Information Mining” , IEEE Signal Processing Magazine , 21 ( 3 ), pp. 28 - 38 , 2004

[3]

Colson , and

Zeleny , “ Uncertain prospects ranking and portfolio analysis under the condition of partial information ” in Mathematical Systems in Economics 44 , 1979 .

[4]

Crina ,

Ajith , I. Hisao (Eds.), “ Hybrid Evolutionary Algorithms” , Studies in Computational Intelligence , Vol. 75 , 2007

[5]

Haykin , Adaptive Filter Theory. Englewood Cliffs, NJ: PrenticeHall Int., 1991

[6]

Kakas , and

Moraitis , “ Argumentation based decision making for autonomous agents” , Proc. of the second Int. Conf. on Autonomous Agents and Multi-Agent Systems (AAMAS03) , July 14-18 , Australia, 2003 .

[7]

S.K.

Katsikas ,

S.D.

Likothanassis ,

G.N.

Beligiannis ,

K.G.

Berketis and

D.A.

Fotakis , “ Evolutionary multimodel partitioning filters: A unified framework , ” IEEE Trans. Signal Processing , vol. 49 , no. 10 , pp. 2253 - 2261 , 2001

[8]

Markowitz , Portfolio Selection: Efficient Diversification of Investments, Wiley, New York, 1959 .

[9]

Michalewicz , Genetic Algorithms + Data Structures = Evolution Programs , 3rd ed. New York: Springer-Verlag, 1996 .

[10]

Rahwan ,

Moraitis and

Reed (Eds.) “ Argumentation in MultiAgent Systems”, Lecture Notes in Artificial Intelligence , 3366 , Springer-Verlag, Berlin, Germany, 2005

[11]

Ross , “ The Arbitrage Theory of Capital Asset Pricing” , Journal of Economic Theory , 6 , 1976 , pp. 341 - 360 .

[12]

W.F.

Sharpe , “ Capital asset prices: A theory of market equilibrium under conditions of risk” , Journal of Finance , 19 , 1964 , pp. 425 - 442 .

[13] W.F. , Sharpe , “ Mutual fund performance” , Journal of Business , 39 , 1966 , pp. 119 - 138 .

[14]

Spanoudakis and

Pendaraki , “ A Tool for Portfolio Generation Using an Argumentation Based Decision Making Framework” , in: Proceedings of the annual IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2007 ), Patras, Greece, October 29-31 , 2007

[15]

J.L.

Treynor , “ How to rate management of investment funds” , Harvard Business Review , 43 , 1965 , pp. 63 - 75 .

[1] Kolodner, J.: Case-based Reasoning , Morgan Kaufmann, 1993

[2] Knight

, Petridis

, Mileman

: Maintenance of a Case-Base for the Retrieval of Rotationally Symmetric Shapes for the Design of Metal Castings LNAI 1998 : Springer Verlag. Advances in Case-Based Reasoning, 5th European Workshop , Trento, Italy, Sept. 6- 9 . 2000 pp 418 - 430

[3] Knight

, Petridis

, Mejasson

, Norman

: A Intelligent design assistant (IDA): a CBR System for Materials Design Journal of Materials and Design 22 pp 163 - 170 , 2001

[4]

David

Leake & Raja Sooriamurthi: “Automatically Selecting Strategies for Multi-Case-Base Reasoning” - Proceedings of the Sixth European Conference on Case-Based (ECCBR) , Aberdeen, Scotland, September, 2002

[5]

David

Leake & Raja Sooriamurthi : “ Managing Multiple Case Bases: Dimensions and Issues” (Proceedings of the Fifteenth Florida Artificial Intelligence Research Society (FLAIRS) Conference , Pensacola, Florida, May 2002 ),

[6]

David

Leake & Raja Sooriamurthi: “ When Two Case Bases are Better Than One: Exploiting Multiple Case Bases” (Proceedings of the Fourth International Conference on Case-Based Reasoning (ICCBR) , Vancouver, Canada, July, 2001 ).

[7]

Enric

Plaza , Santiago Ontañón: Cooperative Multiagent Learning . Adaptive Agents and Multi-Agents Systems 2002

[8]

Santiago

Ontañón , Enric Plaza: A bartering approach to improve multiagent learning . AAMAS 2002

[9] Francisco

Martín , Enric Plaza, Josep Lluís Arcos: Knowledge and Experience Reuse Through Communication Among Competent (Peer) Agents . International Journal of Software Engineering and Knowledge Engineering 1999

[10] Hayes , C. , Doyle , M. , Cunningham , P. : Distributed CBR Using

XML

, Proceedings of the Fourth European Conference on Case-Based (ECCBR) , Dublin, Ireland, June 1998 [11]

Engelmore , T Morgan : Blackboard Systems , AddisonWesley, 1988

[12] Petridis

, Knight

, ( 2001 ) : “A blackboard architecture for a hybrid CBR system for scientific software ” in Proceedings of the Workshop Program at the 4th International Conference in Case-based Reasoning ICCBR01 , Vancouver 2001, Eds R. Weber, G. von Wagenheim , pp 190 - 195 , NCARAI, Naval Research Laboratory, Code 5510 Washington DC.

[13] Case Dispatching versus Case-Base Merging: when MCBR matters: David Leake and

Raja

Sooriamurthi . International Journal on Artificial Intelligence Tools: Architectures, Languages and Algorithms (IJAIT). Special issue on recent advances in techniques for intelligent systems . Vol 13 , No 1, 2004