=Paper=
{{Paper
|id=Vol-2086/AICS2017_paper_12
|storemode=property
|title=A Cognitive Learning Model that Combines Feature Formation and Event Prediction
|pdfUrl=https://ceur-ws.org/Vol-2086/AICS2017_paper_12.pdf
|volume=Vol-2086
|authors=Eman Awad,Fintan Costello
|dblpUrl=https://dblp.org/rec/conf/aics/AwadC17
}}
==A Cognitive Learning Model that Combines Feature Formation and Event Prediction==
Eman Awad and Fintan Costello

School of Computer Science and Informatics, University College Dublin, Belfield, Dublin 4, Ireland

eman.awad@ucdconnect.ie, fintan.costello@ucd.ie

Abstract. This paper presents a computational model that addresses two central aspects of cognitive learning: feature formation and prediction. Most current cognitive models fail to account for both of these aspects of learning; this model, based on frequentist probability theory, provides a unified account of feature formation and temporal prediction. We describe a computational model that learns categorical and temporal relationships between events to form features and make predictions about future events, with the aim of simulating human learning and prediction processes. With this novel learning mechanism, we aim to provide further insight into the complex cognitive process of human reasoning.

===1 Introduction===
Human learning is a complex process involving different cognitive sub-processes. Some studies have demonstrated that the formation of categories based on the probabilistic occurrence of often complex features is central to human learning and inference [8], [26]; humans form features from perceived events and learn the categorical relationships between them, which are then used to make predictions [22], [24], [25], [9], [13]. Other studies have demonstrated that learning temporal relationships between events (associations between features perceived across time) is also vital to human learning and prediction [21], [20], [19], [23], [1]. In this paper we present a computational model that integrates categorisation and temporal prediction (or conditioning) processes into a unified account of learning.

Although categorisation is key to the learning process, typical categorisation models are unable to realistically simulate the human temporal learning process as they only categorise static features [24], [25], [9], [13], [17], [22]. This is insufficient because humans continually observe a dynamic environment; a realistic model should therefore incorporate the ability to learn both temporal and categorical relationships.

Category formation and event prediction are both instances of probabilistic reasoning; there is evidence that probabilistic reasoning in humans follows frequentist probability theory [2], [3], a mathematical process involving the evaluation of past events in terms of their significance in predicting a future event. The model in this paper follows the approach taken by Costello and Watts [3], [2]. The experimental results in these papers repeatedly showed that the biases observed in people's probability estimates could be explained by a model in which people's reasoning follows frequentist probability theory but is subject to random noise. This evidence suggests that people reason in a manner consistent with frequentist probability theory, and it is this theory that we apply in our model.

The aim of this research is to create a model designed to form features, learn the categorical and temporal relationships between them, and predict future events following frequentist probability theory. The prediction mechanism in our model involves two stages: forming features and then calculating predictions. In the first stage, relationships between events or combinations of events are evaluated in a process referred to as the model's statistical evaluation process. The events that represent a reliable relationship are selected and formed into features. The temporal and categorical relationship between the constituent events of the features is also learned. This process determines which past events are significant in predicting future events.
In the second stage, the model combines predictions from relevant reliable features to calculate the overall predicted probability value for a given event.

The first section of this paper delineates the evaluation process used to form reliable features; a description of the prediction mechanism is also provided. The second section gives an overview of the model design. Finally, we test the model's performance to evaluate the prediction mechanism and present an analysis of the results.

===2 Reliability: Feature Formation===
Our model is underpinned by the theoretical assumption that the human learning process follows frequentist probability theory; humans evaluate past experiences based on the statistical reliability of previously learned features to inform their predictions about what may occur next in the environment [12], [18]. We argue that this evaluation process is based on statistical reasoning to form predictive features, which are then used in the prediction calculation. We illustrate this formation of features using the example in Fig. 1, where we aim to evaluate the reliability of the relationship between two events. The figure shows the occurrence of events A and S across time. After the first occurrence of A followed by S, an event N was created to represent the possibility of a relationship between A and S; the probability of N changes as it occurs again. For example, consider time t = 8s, with five events perceived in this period. Event A occurred three times in this period; in two of the three times, event A was followed by event S after one second. At t = 8s, the overall probability of S occurring is two out of eight: P(S) = 0.25. The question is whether the occurrence of S after A can be explained by the base rate occurrence of S alone, P(S) = 0.25. If it cannot be explained by P(S), we assume a reliable causal relationship between A and S. Otherwise, the relationship between the two events is not reliable and A does not predict S.

Fig. 1: (a) illustrates a series of events occurring across time. Time is measured in seconds. (b) shows the frequencies of events changing across time. At time 8s, the binomial result was higher than 0.05, indicating the unreliability of A in predicting S. However, at time 16s, the result was less than 0.05, indicating the reliability of feature A in predicting S. The filled circle represents the reliable predictor N, indicated here as (r).

Based on frequentist probability theory, the binomial probability function is used to determine the reliability of the relationship [11]. We calculate this using the following equation:

<math>Bin(k, x, p) = \binom{x}{k} p^{k} (1 - p)^{x-k} \qquad (1)</math>

where x represents the number of times event A occurred (whether or not it was followed by event S), k represents the number of times that the consequent event S followed the antecedent event A, and p is the base rate probability of the consequent event, P(S). This function represents the chance of drawing a sample of x events from a population where p = P(S), of which k are instances of S. Based on frequentist probability theory, if the binomial probability result Bin(k, x, p) is less than or equal to the statistical significance level of 0.05, the occurrence of S after A cannot be explained by the base rate occurrence of S, P(S) = 0.25, and we can then deduce that the relationship between A and S is causal and reliable. However, if the binomial result is higher than 0.05, there is insufficient evidence that the relationship is reliable at that time.

At time t = 8s in Fig. 1, the binomial result Bin(2, 3, 0.25) = 0.14 is higher than the significance level (0.05), indicating that there is not enough evidence of a causal relationship between A and S. As such, event A is not a reliable predictor of S at this time.

If we consider time t = 16s, event A occurred five times within this period; in four of these five times, A was followed by event S after one second. The question again is whether this occurrence of S after A can be explained by the base rate occurrence of S, P(S) = 0.25. Again, we calculate the binomial probability Bin(4, 5, 0.25) = 0.01, which is lower than the significance level (0.05), indicating that there is enough evidence that the relationship is reliable. Thus, at this time feature A is considered a reliable predictor of S; we designate this as (r) in Fig. 1. This example illustrates the formation of a reliable feature N, which indicates that if A occurs, there is a 4/5 chance of seeing S one second later. (In this example the time interval between the events is one second. This interval was chosen for illustrative purposes; the model uses various intervals, which can be larger than 1s.)

Fig. 2: An example of complex events occurring across time. Filled circles represent reliable predictors (n1 and n2), indicated here as (r).

Complex Features: The earlier example assumed a single event as the antecedent and a single event as the consequent. In reality, we can encounter more complicated events that have more than a single event as an antecedent. In the case of a complex feature, the complex event N holds a complex antecedent that includes two or more sub-events: event A as antecedent and another event B as its consequent. Together, the two sub-events A and B form the complex antecedent of N, and the consequent of that event is S (as shown in Fig. 2). To form complex features, we must first determine whether the complex event represents a reliable relationship. To do this, we use Eq. (1), where x represents the number of times the complex antecedent (A then B) occurred and k represents the number of times that the complex antecedent was followed by the consequent S.

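The reliability test of Eq. (1) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the names `binomial_probability`, `is_reliable` and `ALPHA` are assumptions introduced here, and the inputs are the counts quoted for the Fig. 1 example.

```python
from math import comb

ALPHA = 0.05  # significance level used in the text


def binomial_probability(k: int, x: int, p: float) -> float:
    """Eq. (1): chance of exactly k consequents in x antecedent occurrences at base rate p."""
    return comb(x, k) * p**k * (1 - p) ** (x - k)


def is_reliable(k: int, x: int, p: float, alpha: float = ALPHA) -> bool:
    """Treat a relationship as reliable when Eq. (1) is at or below alpha."""
    return binomial_probability(k, x, p) <= alpha


# Fig. 1 example with base rate P(S) = 0.25
at_8s = binomial_probability(2, 3, 0.25)   # A seen 3 times, followed by S twice
at_16s = binomial_probability(4, 5, 0.25)  # A seen 5 times, followed by S four times

print(round(at_8s, 2), is_reliable(2, 3, 0.25))    # 0.14 False
print(round(at_16s, 2), is_reliable(4, 5, 0.25))   # 0.01 True
```

The same function applies unchanged to a complex antecedent: only the meaning of x and k changes, as described above.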
In determining the relationship between complex features and S, there are three probabilities that we must test the occurrences against: (1) the base rate probability of S, P(S), exactly as defined previously for simple events; (2) the probability of event n1, i.e. S following the antecedent A alone, P(S|A); and (3) the probability of event n2, i.e. S following the antecedent B alone, P(S|B). In this case, the question is whether the occurrence of S after (A then B) can be explained by the base rate occurrence of S, P(S) = 0.28, or by the observed rate of occurrence of the simple features n1, P(S|A), and n2, P(S|B). If it cannot be explained by any of these probabilities, i.e. P(S), P(S|A) and P(S|B), we deduce that there is a reliable causal relationship between the complex antecedent and S. Otherwise, the relationship is not reliable and the complex antecedent (A and B together) does not specifically predict the occurrence of S.

As a working example, see Fig. 2. Here we consider three different features that potentially predict the occurrence of S: a simple feature n1 (which predicts that S will occur 2 seconds after the occurrence of A), a simple feature n2 (which predicts that S will occur 1 second after the occurrence of B), and a complex feature N (which predicts that S will occur 1 second after the occurrence of the complex antecedent 'A then B'). At time t = 21s, we can ask whether n1 and n2 are reliable features (whether they represent statistically reliable relationships between A and S or B and S) by using the binomial function as before, taking the base rate P(S) = 0.28. The binomial test results are Bin(6, 7, 0.28) = 0.001 and Bin(4, 5, 0.28) = 0.019 respectively, indicating that both n1 and n2 are reliable features: A reliably predicts S after 2 seconds (with probability 0.85) and, independently, B reliably predicts S after 1 second (with probability 0.8).

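The three-way test for a complex feature can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function name `complex_feature_is_reliable` and its parameters are hypothetical, and the counts are those quoted in this section for the complex antecedent (A then B).

```python
from math import comb


def binomial_probability(k, x, p):
    """Eq. (1) for k consequents in x antecedent occurrences at rate p."""
    return comb(x, k) * p**k * (1 - p) ** (x - k)


def complex_feature_is_reliable(k, x, base_rate, component_rates, alpha=0.05):
    """Reliable only if the k-of-x outcome is unlikely under every competing rate.

    The competing rates are the base rate P(S) and the rates of the simple
    features, e.g. P(S|A) and P(S|B).
    """
    rates = [base_rate, *component_rates]
    return all(binomial_probability(k, x, p) <= alpha for p in rates)


# Fig. 2 at t = 21s: (A then B) seen 5 times, followed by S 4 times.
fig2_case = complex_feature_is_reliable(4, 5, base_rate=0.28, component_rates=[0.85, 0.8])

# Alternative situation from the text: 5 of 5, with weaker simple features.
alt_case = complex_feature_is_reliable(5, 5, base_rate=0.16, component_rates=[0.5, 0.33])

print(fig2_case, alt_case)  # False True
```

Requiring every test to pass reflects the condition that the observed rate must not be explainable by the base rate or by either simple feature alone.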
When testing the complex feature N, which represents the complex antecedent (A and B) predicting the consequent S, we ask whether the observed rate of occurrence of S after this complex antecedent can be explained by the independent occurrence of the two predictors A and B. If the observed rate of occurrence of S after this complex feature cannot be explained by the independent occurrence of these two separate predictors, there is some further causal relationship between the specific complex event (A and B) and the occurrence of S, and the complex feature N is marked as a reliable predictor. At time t = 21s, feature n1 predicts S with probability P(S|A) = 0.85, and feature n2 predicts S with probability P(S|B) = 0.8. The complex antecedent (A then B) has occurred five times by time t = 21s, with four of those occurrences followed by S. This rate of occurrence is explained both by the predictor n1 (Bin(4, 5, 0.85) = 0.39) and separately by the predictor n2 (Bin(4, 5, 0.8) = 0.4). There is therefore insufficient evidence of a reliable relationship between the specific complex antecedent (A and B) and the occurrence of S; the integrated event N is not a reliable predictor of S (i.e. not a reliable feature).

In other situations, the complex feature N might be reliable. For example, consider a different situation, where the complex antecedent (A and B) occurred five times. On all of those occasions, S followed this complex antecedent after one second (i.e. the integrated event N). However, events A and B frequently occurred without being followed by S (A occurred ten times and B occurred fifteen times).
Given the base rate of the consequent and the probabilities of the individual events, P(S) = 0.16, P(S|A) = 0.5 and P(S|B) = 0.33 respectively, the binomial results for the complex event N are Bin(5, 5, 0.16) = 0.0001, Bin(5, 5, 0.5) = 0.031, and Bin(5, 5, 0.33) = 0.004, which are all lower than the statistical significance level (0.05). In this case, A alone does not predict S and B alone does not predict S. However, features A and B together are a reliable predictor of S.

===3 Predicted Probability===
The set of independent predictive features identified by the evaluation process discussed earlier is subsequently used in the prediction calculation. For the purpose of this discussion, we consider the example in Fig. 2, where the simple features n1 and n2 were both reliable predictors of S, and the integrated event N that connects these two predictors (A and B) was not a reliable predictor. The aim of the prediction calculation is to calculate the overall probability of S occurring next, after accounting for all recognised reliable predictors of this event. According to standard frequentist probability theory, if there is a set of independent features predicting an event, the overall predicted probability is calculated by OR-ing all the independent predictors using the following OR expression:

<math>Pr(S_{i,t}) = 1 - \prod (1 - p_i) \qquad (2)</math>

where <math>p_i</math> is the probability of event <math>S_i</math> occurring, as given by each independent predictor, at a given time t. In this example, as previously explained, the two predictors A and B are independent of each other because the integrated event N that connects them does not reliably predict S.

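The OR expression of Eq. (2) can be sketched as a one-line product. This is a minimal illustration; the function name `combine_predictions` is an assumption introduced here, and the inputs 6/7 and 4/5 are the rates of n1 and n2 in the Fig. 2 example.

```python
from math import prod


def combine_predictions(probabilities):
    """Eq. (2): 1 minus the product of (1 - p_i) over independent reliable predictors."""
    return 1.0 - prod(1.0 - p for p in probabilities)


# Fig. 2 example: n1 gives P(S|A) = 6/7 and n2 gives P(S|B) = 4/5.
overall = combine_predictions([6 / 7, 4 / 5])
print(round(overall, 2))  # 0.97
```

With no reliable predictors the product is empty and the function returns 0.0, which is one reasonable convention for "no basis for a prediction"; the source does not specify this case.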
Since they are independent, we can combine the probabilities of the independent predictors, P(S|A) and P(S|B), to calculate the overall prediction of S using Eq. (2) as follows:

<math>Pr(S) = 1 - \left[\left(1 - \tfrac{6}{7}\right)\left(1 - \tfrac{4}{5}\right)\right] = 0.97</math>

This indicates that the overall probability of S occurring after the reliable predictors (simple features n1 and n2, independently) at a given time is high. However, at some point in the future, event N may become reliable. In that case, the individual predictors (A followed by S and B followed by S) are no longer independent predictors of S and are therefore not used in the calculation for the prediction of S. The integrated feature N is used in the OR expression instead of n1 and n2 to calculate the overall prediction of S.

===4 Model Implementation===
We use frequentist probability theory to build a unified prediction model that forms features; the prediction mechanism in this model uses these features to make predictions. In our model, we divide the memory into short-term memory (STM) and long-term memory (LTM). The LTM holds a record of all the perceived events and the relationships between them. These events are stored in the form of nodes. Each node contains an antecedent (the earlier sub-event perceived), a consequent (the later sub-event perceived), the time interval between them (t), the total number of times the antecedent was followed by the consequent with the exact time interval t (the counter k), and the total number of times the antecedent was perceived (the counter x). The STM, on the other hand, holds only the most recent events. Given the limited capacity of STM in humans, we simulate this computational limitation by designing a limited STM in the model's memory. The STM is also responsible for updating the counters (x and k) for all related nodes when new events are perceived.

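A node as described above, together with the node selection used at prediction time, might be sketched as follows. This is a simplified illustration under several assumptions, not the authors' implementation: the class `Node`, the function `predict`, and the representation of the STM as a set of event labels are all hypothetical, complex nodes overriding their simple components is not modelled, and the STM capacity limit is omitted.

```python
from dataclasses import dataclass
from math import comb


@dataclass
class Node:
    antecedent: str   # earlier sub-event
    consequent: str   # later sub-event
    interval: int     # time gap t between them, in seconds
    k: int = 0        # times the antecedent was followed by the consequent after t
    x: int = 0        # times the antecedent was perceived

    def observe(self, followed: bool) -> None:
        """Update the counters after one occurrence of the antecedent."""
        self.x += 1
        if followed:
            self.k += 1

    @property
    def probability(self) -> float:
        """Observed rate k / x (0.0 before any observation)."""
        return self.k / self.x if self.x else 0.0

    def is_reliable(self, base_rate: float, alpha: float = 0.05) -> bool:
        """Binomial test of Eq. (1) against the consequent's base rate."""
        value = comb(self.x, self.k) * base_rate**self.k * (1 - base_rate) ** (self.x - self.k)
        return value <= alpha


def predict(nodes, stm, target, base_rate):
    """OR-combine (Eq. 2) reliable nodes for `target` whose antecedent is in the STM."""
    miss = 1.0
    for node in nodes:
        if node.consequent == target and node.antecedent in stm and node.is_reliable(base_rate):
            miss *= 1.0 - node.probability
    return 1.0 - miss


# Counts from the Fig. 2 example: n1 is 6 of 7, n2 is 4 of 5.
n1 = Node("A", "S", interval=2)
for i in range(7):
    n1.observe(followed=i < 6)
n2 = Node("B", "S", interval=1)
for i in range(5):
    n2.observe(followed=i < 4)

both = predict([n1, n2], stm={"A", "B"}, target="S", base_rate=0.28)
only_b = predict([n1, n2], stm={"B"}, target="S", base_rate=0.28)
print(round(both, 2), round(only_b, 2))  # 0.97 0.8
```

Keeping the counters on the node means reliability and predicted probability are always derived from the same k and x, so a node can move between reliable and unreliable as new events are observed.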
To calculate the prediction for any given event S, the model selects all nodes that have S as a consequent and whose antecedent occurs in the STM. The binomial function of Eq. (1) is applied to all the selected nodes to evaluate the relationship between each antecedent and S; this enables the model to identify all the independent predictors that reliably predict S. Finally, all the independent predictors are combined to calculate the overall predicted probability of S using Eq. (2).

===5 Evaluating the Model's Prediction Mechanism===
We developed a simulation tool, which we refer to as the Random Generator (RG), to test this model. RG generates test data simulating a real-world environment by producing a sequence of events with various randomly generated temporal relationships linking the occurrence of events together. The test data generated by the RG is used to evaluate our model's performance. In its initialisation phase, the RG tool assigns random probabilities to different temporal relationships between a large number of events and combinations of events. The next event is produced in a way that depends probabilistically on the sequence of prior events. The RG tool embodies a probabilistic context-sensitive grammar with long-range dependencies [14], [7]. Our model perceives the sequence of events generated by the RG tool and learns the relationships between these events. In this test, our model is challenged to identify the various patterns within the sequence of events generated by the RG tool. The learning mechanism in our model aims to learn and recognise reliable relationships, to form features, and to estimate the predicted probability of future occurrences. If the learning mechanism functions effectively, it should recognise and successfully learn the hidden relationships between these events.

We test the model by evaluating all the event predictions.
Those predictions are compared to the actual events that occurred after the antecedents. For example, if the model predicts that event <math>S_i</math> will occur with a probability of 0.2, <math>Pr(S_{i,t}) = 0.2</math>, we count all the times in the test that the model predicts an event with this probability. If the prediction matches what actually happens, i.e. <math>S_i</math> does occur in a proportion of 0.2 of those cases, then the model's prediction is considered to be accurate.

====5.1 Material====
The testing process involves two phases, the learning phase and the prediction phase. In the first phase, we run the model j times (where j = 10,000) to observe and learn the relationships between the events in the sequence generated by the RG tool. Events with reliable relationships between them are formed into a feature. In the second phase, we run the model N times (where N = 50,000) to calculate the prediction <math>Pr(S_{i,t})</math> at each time-step. To examine the accuracy of the prediction values, we calculate the proportion of the predicted events observed in the test data. First, we examine all previously perceived events and evaluate the prediction value given by the model for each of those events by counting the number of times that the predicted event actually happens. Subsequently, we compare the rate of occurrence of this real event (i.e. the proportion of its observed occurrence) with the prediction value <math>Pr(S_{i,t})</math> calculated by the model. The accuracy of the model's prediction mechanism can be assessed by comparing these two values.

====5.2 Method====
After observing the sequence of events generated by the RG tool, the model's prediction mechanism calculates the predicted probability for each event occurring at every time-step. This is represented by an overall predicted probability value <math>Pr(S_{i,t})</math> for each event <math>S_i</math> occurring at time t. We examine the value of <math>Pr(S_{i,t})</math> in different ranges.
We separate the <math>Pr(S_{i,t})</math> values into the ranges R specified in Table 1. We then count the total number of occurrences that are predicted to occur with a probability value <math>Pr(S_{i,t})</math> falling within a specific range R (<math>R_{min} \le Pr(S_{i,t}) \le R_{max}</math>), i.e. <math>\sum_{t=j}^{N} |Pr(S_{i,t}) \in R|</math>. We calculate this value for every range R (see Table 1). For each range R, we identify all these occurrences and count the total number of times the predicted event actually does occur, i.e. <math>\sum_{t=j}^{N} |Pr(S_{i,t}) \in R \cap S_{i,t}|</math>. Subsequently, we calculate the proportion of those cases in which the predicted event actually occurs, <math>O(S_{i,t}, R)</math>. To calculate this, we divide the total number of times that the event does occur in each range by the total number of times the event is predicted to occur in that range, as shown by the following formula:

<math>O(S_i, R) = \frac{\sum_{t=j}^{N} |Pr(S_{i,t}) \in R \cap S_{i,t}|}{\sum_{t=j}^{N} |Pr(S_{i,t}) \in R|} \qquad (3)</math>

where <math>Pr(S_{i,t})</math> is the predicted probability of event i at time t, j is the starting time (measured in seconds) and N is the ending time (measured in seconds). For example, for the range R = 0.20–0.25, the total number of times the event was predicted to occur with this probability (<math>0.20 \le Pr(S_{i,t}) \le 0.25</math>) was 3016, i.e. <math>\sum_{t=j}^{N} |Pr(S_{i,t}) \in R| = 3016</math>. Out of these 3016 cases, only 643 actually occurred, i.e. <math>\sum_{t=j}^{N} |Pr(S_{i,t}) \in R \cap S_{i,t}| = 643</math>; the proportion of occurrences for the predicted events <math>O(S_i, R)</math> was therefore 0.21.

We assess the accuracy of the model's prediction by comparing this proportion of occurrences for the predicted events, <math>O(S_i, R)</math>, to the range R it is supposed to fall within. If the model predicts that an event will occur with a probability of 0.15, and the proportion given by <math>O(S_i, R)</math> for the predicted events is close to 0.15 (e.g. 0.18), this indicates that the model is predicting the next event accurately. The proportion should fall within, or close to, the range R.

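The counting behind Eq. (3) can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function name `observed_proportion` and the (probability, occurred) pair format are assumptions, and the sample data is synthetic, constructed only to reproduce the counts 3016 and 643 quoted for the range 0.20–0.25.

```python
def observed_proportion(predictions, r_min, r_max):
    """Eq. (3): share of predictions with r_min <= Pr <= r_max whose event occurred.

    `predictions` is an iterable of (predicted_probability, occurred) pairs.
    Returns None when no prediction falls in the range.
    """
    in_range = [occurred for p, occurred in predictions if r_min <= p <= r_max]
    if not in_range:
        return None
    return sum(in_range) / len(in_range)


# Synthetic records matching the counts in the text for R = 0.20-0.25:
# 3016 predictions in the range, of which 643 occurred.
records = [(0.22, True)] * 643 + [(0.22, False)] * (3016 - 643)
records += [(0.40, True)] * 10  # outside the range, ignored by the filter

proportion = observed_proportion(records, 0.20, 0.25)
print(round(proportion, 2))  # 0.21
```

Returning `None` for an empty range avoids a division by zero and keeps "no predictions in this range" distinct from "predicted but never occurred".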
Since the accuracy assessment for the model's prediction requires a comparison between <math>O(S_{i,t}, R)</math> and the range R, we use the absolute difference as the measure of comparison between the two. We measure the absolute difference between the mid-point of the range, M(R), and the proportion of observations <math>O(S_{i,t}, R)</math>, i.e. <math>|M(R) - O(S_{i,t}, R)|</math>.

{| class="wikitable"
|+ Table 1
! Range R !! Predicted to occur in R !! Occurred in R !! Proportion O !! Absolute difference from M(R)
|-
| 0.05–0.10 || 1393 || 193 || 0.14 || 0.063
|-
| 0.10–0.15 || 2268 || 401 || 0.18 || 0.051
|-
| 0.15–0.20 || 2600 || 516 || 0.19 || 0.023
|-
| 0.20–0.25 || 3016 || 643 || 0.21 || 0.012
|-
| 0.25–0.30 || 3901 || 1094 || 0.28 || 0.005
|-
| 0.30–0.35 || 3341 || 1005 || 0.3 || 0.025
|-
| 0.35–0.40 || 1806 || 597 || 0.33 || 0.045
|-
| 0.40–0.45 || 788 || 272 || 0.35 || 0.080
|-
| 0.45–0.50 || 307 || 117 || 0.38 || 0.095
|-
| 0.50–0.55 || 171 || 64 || 0.37 || 0.151
|-
| 0.55–0.60 || 60 || 32 || 0.53 || 0.045
|-
| 0.60–0.65 || 25 || 13 || 0.52 || 0.105
|}

Table 1: Evaluating the Model's Prediction. This table illustrates the level of agreement between the model's predicted probability values Pr for each range R and the proportion of observations of the actual events, <math>O(S_{i,t}, R)</math>. The second column shows <math>\sum_{t=j}^{N} |Pr(S_{i,t}) \in R|</math>, the total number of occurrences that were predicted to occur in R. The third column shows <math>\sum_{t=j}^{N} |Pr(S_{i,t}) \in R \cap S_{i,t}|</math>, the total number of times that the event did occur in R. The fourth column shows the overall proportion of predicted events that were observed, <math>O(S_{i,t}, R)</math>. Finally, the last column is the accuracy measure for the model's prediction, the absolute difference <math>|M(R) - O(S_{i,t}, R)|</math>.

====5.3 Results====
We found that the level of agreement between the proportion of observations for predicted events <math>O(S_{i,t}, R)</math> and the corresponding range R was high; the overall correlation between <math>O(S_{i,t}, R)</math> and the mid-point of the range M(R) was very high (r = 0.97, p < 0.0001). In general, <math>O(S_{i,t}, R)</math> fell within the corresponding range R or was close to it. For example, for <math>Pr(S_{i,t})</math> values in the range 0.15–0.20 (<math>0.15 \le Pr(S_{i,t}) \le 0.20</math>), the corresponding proportion O = 0.19 fell within the same range as <math>Pr(S_{i,t})</math>. In some cases, the value of O fell just outside the corresponding range. For example, where the <math>Pr(S_{i,t})</math> value was in the range 0.05–0.10 (<math>0.05 \le Pr(S_{i,t}) \le 0.10</math>), the corresponding O = 0.14 fell slightly outside the range. However, even though this value does not fall exactly within the range, it is close to <math>R_{max}</math> (<math>R_{max} = 0.10</math>). Overall, our results suggest that the prediction mechanism in our model is effective, i.e. the model is able to predict accurately.

We observe that there is a relationship between the total number of occurrences in the range, <math>\sum_{t=j}^{N} |Pr(S_{i,t}) \in R|</math>, and the proportion of observations <math>O(S_{i,t}, R)</math>: the bigger the number of cases (i.e. the sample size), the closer the proportion of observations O was to the corresponding range. Our measure for assessing the accuracy of the model's prediction was <math>|M(R) - O(S_{i,t}, R)|</math>. With this, we can estimate how the accuracy of the model's prediction changes as the sample size varies. For example, for the range 0.50–0.55, the total number of occurrences was only 171. In this case, the absolute difference <math>|M(R) - O(S_{i,t}, R)|</math> was 0.151, indicating that the accuracy of the prediction is low. However, for the range 0.25–0.30, the total number of occurrences was 3901. In this case, the absolute difference <math>|M(R) - O(S_{i,t}, R)|</math> was 0.005, indicating that the accuracy of the prediction is high. We can deduce from this that the difference between O and R is smaller when the model is provided with a bigger sample size.

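The accuracy measure and the reported correlation can be recomputed from the counts in Table 1. The sketch below is illustrative only: the tuple layout of `TABLE_1` and the helper `pearson` are assumptions introduced here, and the proportions are recomputed from the counts rather than taken from the rounded column.

```python
from math import sqrt

# (r_min, r_max, predicted_in_range, occurred_in_range), copied from Table 1
TABLE_1 = [
    (0.05, 0.10, 1393, 193), (0.10, 0.15, 2268, 401), (0.15, 0.20, 2600, 516),
    (0.20, 0.25, 3016, 643), (0.25, 0.30, 3901, 1094), (0.30, 0.35, 3341, 1005),
    (0.35, 0.40, 1806, 597), (0.40, 0.45, 788, 272), (0.45, 0.50, 307, 117),
    (0.50, 0.55, 171, 64), (0.55, 0.60, 60, 32), (0.60, 0.65, 25, 13),
]


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)


midpoints = [(lo + hi) / 2 for lo, hi, _, _ in TABLE_1]          # M(R)
observed = [occurred / n for _, _, n, occurred in TABLE_1]       # O per range
abs_diff = [abs(m - o) for m, o in zip(midpoints, observed)]     # |M(R) - O|

r = pearson(midpoints, observed)
print(round(r, 2))            # 0.97
print(round(abs_diff[4], 3))  # 0.005 for range 0.25-0.30 (3901 cases)
print(round(abs_diff[9], 3))  # 0.151 for range 0.50-0.55 (171 cases)
```

The range with the most cases (0.25–0.30) also has the smallest absolute difference, which is consistent with the sample-size effect described above.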
This suggests that the model produces more accurate predictions when more information is available about the likelihood of occurrences, which is not surprising considering that this information is used in the prediction calculation <math>Pr(S_{i,t})</math>. We note that the low prediction accuracy observed in some ranges coincided with the ranges R that had a lower number of occurrences, i.e. the model had perceived those events only a small number of times. This indicates that the model predicts the probability accurately in general and that the anomalous results are to be expected given the small sample size.

In conclusion, our test demonstrated that the model learned the relationships between events effectively and is capable of calculating relatively accurate predictions based on these learned relationships.

===6 Conclusion===
In this paper, we propose an alternative approach to cognitive modelling. We design the learning mechanism so that it is able to identify reliable relationships between events, form features, and then categorise them in a way that simulates human learning as closely as possible. We also incorporate frequentist probability theory, which has been shown to be applicable to human prediction processes [2], [3], into our model's prediction mechanism.

There are various alternative models in the literature that are successful in categorising and making predictions over time. Artificial Neural Networks (ANNs), using deep learning algorithms, are currently the most commonly used models in the field of prediction. While the model we present here is an account of human learning and is not intended to compete with such machine learning algorithms, a brief comparison is worthwhile. The main difference between our model and ANNs is that ANNs are associative models [15], [10], [27] whereas ours is representational. Unlike ANNs, our model includes explicit features that can be extracted and easily used.
Moreover, as Gallistel and Gibbon highlighted [5], [4], [6], associative models create associations between two events that are frequently perceived together with a short time interval between them, but these associations do not necessarily represent reliable relationships between the events, as the relationships are not evaluated. In contrast, representational models represent the learned data and store it in an "if-then rule" format (if an event A occurs, then after a specific time interval t, event B will occur) [16]. In representational models such as ours, the model can distinguish between causal and coincidental relationships through statistical calculation. In addition, representational models can learn temporal relationships even with a large time interval between events, and so verify whether there is a causal relationship between them.

In summary, we argue that the prediction model presented in this paper is a more realistic simulation of human learning than ANNs. The simulation is more realistic because its evaluation process is designed using frequentist probability theory, which enables the model to form specific, meaningful features that are easily accessible and can be used for prediction; ANNs lack this process.

In future work, we plan to conduct further tests to improve the learning model and incorporate additional features. The experiments and tests will focus on the model's ability to process more complex data, to ensure that the prediction results are as close to human predictions as possible. We also plan to integrate the ability to predict actions as well as perceptions into our prediction mechanism, with the overall aim of enabling our model to make decisions and perform goal-driven actions.

===References===
1. Balsam, P.D., Drew, M.R., Gallistel, C.: Time and associative learning. Comparative Cognition & Behavior Reviews 5, 1 (2010)

2. Costello, F., Watts, P.: Surprisingly rational: probability theory plus noise explains biases in judgment. Psychological Review 121(3), 463 (2014)

3. Costello, F., Watts, P.: People's conditional probability judgments follow probability theory (plus noise). Cognitive Psychology 89, 106–133 (2016)

4. Gallistel, C.R., Gibbon, J.: Time, rate, and conditioning. Psychological Review 107(2), 289 (2000)

5. Gallistel, C.R.: The organization of learning. The MIT Press (1990)

6. Gallistel, C., Gibbon, J.: Computational versus associative models of simple conditioning. Current Directions in Psychological Science 10(4), 146–150 (2001)

7. Geman, S., Johnson, M.: Probabilistic grammars and their applications. International Encyclopedia of the Social & Behavioral Sciences 2002, 12075–12082 (2002)

8. Griffiths, T., Yuille, A.: A primer on probabilistic inference. The probabilistic mind: Prospects for Bayesian cognitive science pp. 33–57 (2008)

9. Hampton, J.A.: Similarity-based categorization: the development of prototype theory. Psychologica Belgica 35(2-5), 104–125 (1995b)

10. Haykin, S.S.: Neural networks and learning machines, vol. 3. Pearson, Upper Saddle River, NJ, USA (2009)

11. Hodges Jr, J.L., Lehmann, E.L.: Basic concepts of probability and statistics. SIAM (2005)

12. Holdershaw, J., Gendall, P.: Understanding and predicting human behaviour. In: ANZCA08 Conference, Power and Place. Wellington (2008)

13. Kruschke, J.K.: ALCOVE: An exemplar-based connectionist model of category learning. Psychological Review 99(1), 22–44 (1992)

14. Lafferty, J., Sleator, D., Temperley, D.: Grammatical trigrams: A probabilistic model of link grammar, vol. 56. School of Computer Science, Carnegie Mellon University (1992)

15. Michel, A.N., Farrell, J.A.: Associative memories via artificial neural networks. IEEE Control Systems Magazine 10(3), 6–17 (1990)

16. Nan, J.: A learning model that combines categorisation and conditioning (2013), www.summon.com, University College Dublin, School of Computer Science and Informatics

17. Nosofsky, R.M.: Attention, similarity, and the identification–categorization relationship. Journal of Experimental Psychology: General 115(1), 39 (1986)

18. Osberg, T.M., Shrauger, J.S.: Self-prediction: Exploring the parameters of accuracy. Journal of Personality and Social Psychology 51(5), 1044 (1986)

19. Pavlov, I.P.: Lectures on conditioned reflexes. Vol. II. Conditioned reflexes and psychiatry. (1941)

20. Pavlov, I.P., Anrep, G.V.: Conditioned reflexes. Courier Corporation (2003)

21. Pavlov, I.P., Gantt, W.A.H.: Lectures on conditioned reflexes. Liveright, New York (1928)

22. Rosch, E.: Principles of categorization. Concepts: core readings pp. 189–206 (1999)

23. Savastano, H.I., Miller, R.R.: Time as content in Pavlovian conditioning. Behavioural Processes 44(2), 147–162 (1998)

24. Skinner, B.F.: The behavior of organisms: An experimental analysis. (1938)

25. Skinner, B.F.: The behavior of organisms: An experimental analysis. BF Skinner Foundation (1990)

26. Zettlemoyer, L.S., Collins, M.: Learning to map sentences to logical form: Structured classification with probabilistic categorial grammars. arXiv preprint arXiv:1207.1420 (2012)

27. Zurada, J.M.: Introduction to artificial neural systems, vol. 8. West St. Paul (1992)