computers that understand human languages, and various forms of web agents. The failure, however, of computers to succeed at the task of creating a general-purpose thinking machine begins to shed some understanding on the “failures” of the imitation game itself. Specifically, the imitation game offers no hint of a definition of intelligent activity, nor does it offer specifications for building intelligent artifacts. Deeper issues remain that Turing did not address. What IS intelligence? What IS grounding (how may a human’s or computer’s statements be said to have “meaning”)? Finally, can humans understand their own intelligence in a manner sufficient to formalize or replicate it?

This paper considers these issues, especially the responses to the challenge of building intelligent artifacts that the modern artificial intelligence community has taken. In earlier venues (Luger 2012, Luger and Chakrabarti 2014) we presented general issues of artificial intelligence artifacts and the epistemological biases they embody. In this paper we view modern artificial intelligence, especially its commitment to stochastic models, and the epistemological stance that supports this approach. In the next section we present a constructivist rapprochement of the empiricist, rationalist, and pragmatist positions that supported early AI work, a rapprochement that addresses many of that work’s dualist assumptions. We also offer some preliminary conjectures about how a Bayesian model might be epistemologically plausible.

Modern AI: Probabilistic models

We view a constructivist and model-revising epistemology as a rapprochement between the empiricist, rationalist, and pragmatist viewpoints. The constructivist hypothesizes that all human understanding is the result of an interaction between energy patterns in the world and mental categories imposed on the world by an intelligent agent (Piaget 1954, 1970; von Glasersfeld 1978). Using Piaget’s terms, we humans assimilate external phenomena according to our current understanding and accommodate our understanding to phenomena that do not meet our prior expectations. Constructivists use the term schemata to describe the a priori structure used to mediate the experience of the external world. The term schemata is taken from the writing of the British psychologist Bartlett (1932), and its philosophical roots go back to Kant (1781/1964). On this viewpoint, observation is not passive and neutral but active and interpretative. There are many current psychologists and philosophers who support and expand this pragmatic and teleological account of human developmental activity (Glymour 2001, Gopnik et al. 2010, Gopnik 2011a, 2011b, Kushnir et al. 2010).

Perceived information, Kant’s a posteriori knowledge, rarely fits precisely into our preconceived and a priori schemata. From this tension to comprehend and act as an agent, the schema-based biases a subject uses to organize experience are strengthened, modified, or replaced. This accommodation in the context of unsuccessful interactions with the environment drives a process of cognitive equilibration. The constructivist epistemology is one of cognitive evolution and continuous model refinement. An important consequence of constructivism is that the interpretation of any perception-based situation involves the imposition of the observer’s (biased) concepts and categories on what is perceived. This constitutes an inductive bias (Luger 2009, Ch 16).

When Piaget (1970) proposed a constructivist approach to a child’s understanding of the external world, he called it a genetic epistemology. When encountering new phenomena, the lack of a comfortable fit of current schemata to the world “as it is” creates a cognitive tension. This tension drives a process of schema revision. Schema revision, Piaget’s accommodation, is the continued evolution of the agent’s understanding towards equilibration.

There is a blending here of empiricist and rationalist traditions, mediated by the pragmatist requirement of agent survival. As embodied, agents can comprehend nothing except that which first passes through their senses. As accommodating, agents survive through learning the general patterns of an external world. What is perceived is mediated by what is expected; what is expected is influenced by what is perceived: these two functions can only be understood in terms of each other. A Bayesian model-refinement representation offers an appropriate model for critical components of this constructivist model-revising epistemological stance (Luger et al. 2002, Luger 2012).

Interestingly enough, David Hume acknowledged the epistemic foundation of all human activity (including, of course, the construction of AI artifacts) in A Treatise of Human Nature (1739/2000) when he stated “All the sciences have a relation, greater or less, to human nature; and ... however wide any of them may seem to run from it, they still return back by one passage or another. Even Mathematics, Natural Philosophy, and Natural Religion, are in some measure dependent on the science of MAN; since they lie under the cognizance of men, and are judged of by their powers and faculties.”

Thus, we can ask why a constructivist epistemology might be useful in addressing the problem of building programs that are “intelligent”. How can an agent within an environment understand its own understanding of that situation? We believe that constructivism also addresses this problem of epistemological access.
For more than a century there has been a struggle in both philosophy and psychology between two factions: the positivist, who proposes to infer mental phenomena from observable physical behavior, and a more phenomenological approach, which allows the use of first-person reporting to enable access to cognitive phenomena. This factionalism exists because both modes of access to cognitive phenomena require some form of model construction and inference. In comparison to physical objects like chairs and doors, which often, naively, seem to be directly accessible, the mental states and dispositions of an agent seem to be particularly difficult to characterize. We contend that this dichotomy between direct access to physical phenomena and indirect access to mental phenomena is illusory. The constructivist analysis suggests that no experience of the external (or internal) world is possible without the use of some model or schema for organizing that experience. In scientific enquiry, as well as in our normal human cognitive experiences, this implies that all access to phenomena is through exploration, approximation, and continued model refinement.

Bayes’ theorem (1763) offers a plausible model of this constructivist rapprochement between the philosophical traditions we have just discussed. It is also an important modeling tool for much of modern AI, including AI programs for natural language understanding, robotics, and machine learning. With a high-level discussion of Bayes’ insights, we can describe the power of this approach. Consider the general form of Bayes’ relationship used to determine the probability of a particular hypothesis, hi, given a set of evidence E:

$$p(h_i \mid E) = \frac{p(E \mid h_i)\,p(h_i)}{\sum_{k=1}^{n} p(E \mid h_k)\,p(h_k)}$$

where:
p(hi|E) is the probability that a particular hypothesis, hi, is true given evidence E.
p(hi) is the probability that hi is true overall.
p(E|hi) is the probability of observing evidence E when hi is true.
n is the number of possible hypotheses.

With the general form of Bayes’ theorem we have a functional (and computational!) description (model) for a particular situation happening given a set of perceptual evidence clues. Epistemologically, we have created on the right-hand side of the equation a schema describing how prior accumulated knowledge of occurrences of phenomena can relate to the interpretation of a new situation, the left-hand side of the equation. This relationship can be seen as an example of Piaget’s assimilation, where encountered information fits (is interpreted by) the patterns created from prior experiences.

To describe further the pieces of Bayes’ formula: the probability of an hypothesis being true, given a set of evidence, is equal to the probability that the evidence is true given the hypothesis times the probability that the hypothesis occurs. This number is divided (normalized) by the probability of the evidence itself, p(E). This probability of the evidence is represented as the sum, over all hypotheses, of the probability of the evidence given each hypothesis times the probability of that hypothesis itself.

There are limitations to using Bayes’ theorem as just presented as an epistemological characterization of the phenomenon of interpreting new (a posteriori) data in the context of (prior) collected knowledge and experience. First, of course, is the fact that the epistemological subject is not a calculating machine. We simply don’t have all the prior (numerical) values for all the hypotheses and evidence that can fit a problem situation. In a complex situation such as medicine, where there can be hundreds of hypothesized diseases and thousands of symptoms, this calculation is intractable (Luger 2009, Chapter 5).

A second objection is that in most situations the pieces of evidence are NOT independent, given the set of hypotheses. This makes it unjustified to compute p(E|hi), and with it the denominator p(E) of Bayes’ rule as just presented, as a simple product of the probabilities of the individual pieces of evidence. When these dependencies are simply ignored and independence is assumed anyway, as we see shortly, the result is called naïve Bayes. More often, however, the rationalization of the probability of the occurrence of evidence across all hypotheses is seen as simply a normalizing factor, supporting the calculation of a realistic measure for the probability of the hypothesis given the evidence (the left side of Bayes’ equation). The same normalizing factor is utilized in determining the actual probability of any of the hi, given the evidence, and thus, as in most natural language processing applications, is usually ignored.

A final objection asserts that diagnostic reasoning is not about the calculation of probabilities; it is about determining the most likely explanation, given the accumulation of pieces of evidence. Humans are not doing real-time complex mathematical processing; rather, we are looking for the most coherent explanation or possible hypothesis, given the amassed data. Thus, a much more intuitive form of Bayes’ rule ignores this p(E) denominator entirely (as well as the associated assumption of evidence independence). The resulting formula determines the (unnormalized) likelihood of any hypothesis given the evidence as the product of the probability of the evidence given the hypothesis times the probability of the hypothesis itself, p(E|hi) p(hi). In most diagnostic situations we are asked to determine which of a set of hypotheses hi is most likely to be supported. We refer to this as determining the argmax across the set of hypotheses. Thus, if we wish to determine which of all the hi has the most support, we look for the largest p(E|hi) p(hi):

$$\arg\max_{h_i}\; p(E \mid h_i)\,p(h_i)$$
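The following minimal Python sketch makes the two forms of Bayes’ rule above concrete. It computes the full posterior p(hi|E), including the normalizing sum in the denominator, and then the argmax of the unnormalized product p(E|hi) p(hi). The hypothesis names h1 to h3 and all probabilities are hypothetical values chosen only for illustration.

```python
# Minimal sketch: full Bayes posterior vs. the argmax shortcut.
# Hypothesis names and all probabilities are hypothetical.

def posterior(priors, likelihoods):
    """Return p(h_i | E) for each hypothesis h_i.

    priors:      dict mapping h_i -> p(h_i)
    likelihoods: dict mapping h_i -> p(E | h_i)
    """
    # Denominator p(E) = sum over k of p(E | h_k) p(h_k)
    p_evidence = sum(likelihoods[h] * priors[h] for h in priors)
    return {h: likelihoods[h] * priors[h] / p_evidence for h in priors}


def argmax_hypothesis(priors, likelihoods):
    """Return the h_i with the largest p(E | h_i) p(h_i); p(E) is never computed."""
    return max(priors, key=lambda h: likelihoods[h] * priors[h])


priors = {"h1": 0.6, "h2": 0.3, "h3": 0.1}       # p(h_i)
likelihoods = {"h1": 0.1, "h2": 0.5, "h3": 0.9}  # p(E | h_i)

post = posterior(priors, likelihoods)
print({h: round(p, 3) for h, p in post.items()})  # {'h1': 0.2, 'h2': 0.5, 'h3': 0.3}
print(argmax_hypothesis(priors, likelihoods))      # h2
print(max(post, key=post.get))                     # h2
```

Because p(E) is the same for every hypothesis, dropping it rescales the scores without changing their ranking, which is why the argmax form suffices when only the most likely explanation is wanted.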
In a dynamic interpretation, as sets of evidence themselves change across time, we will call this argmax of hypotheses given a set of evidence at a particular time the greatest likelihood of that hypothesis at that time. We show this relationship, an extension of the Bayesian maximum a posteriori (or MAP) estimate, as a dynamic measure over time t:

$$gl(h_i \mid E_t) = \arg\max_{h_i}\; p(E_t \mid h_i)\,p(h_i)$$

This model is both intuitive and simple: the most likely interpretation of new data, given evidence E at time t, is a function of which interpretation is most likely to produce that evidence at time t and the probability of that interpretation itself occurring.

By the early 1990s, much of computation-based language understanding and generation was stochastic, including parsing, part-of-speech tagging, reference resolution, and discourse processing, usually using tools like greatest likelihood measures (Jurafsky and Martin 2009). Other areas of artificial intelligence, especially machine learning, became more Bayesian-based. In many ways these uses of stochastic technology for pattern recognition were another instantiation of the constructivist tradition, as collected sets of patterns were used to condition recognition of new patterns.

Judea Pearl’s (1988) proposal for use of Bayesian belief nets (BBNs), and his assumption of their links reflecting “causal” relationships (Pearl 2000), brought the use of Bayesian technology to an entirely new importance. First, the assumption that these networks are directed graphs (reflecting causal relationships) that disallow cycles (no entity can cause itself) brought a radical improvement to the computational costs of reasoning with BBNs (Luger 2009, Ch 9). Second, these same two assumptions made the BBN representation much more transparent as a representational tool that could capture causal relations. Finally, almost all the traditional powerful stochastic representations used in language work and machine learning, for example, the hidden Markov model in the form of a dynamic Bayesian network (DBN), could be readily integrated into this new representational formalism.

We next illustrate the Bayesian approach in two application domains. In the diagnosis of failures in discrete component semiconductors (Stern et al. 1997, Chakrabarti et al. 2005) we have an example of creating the greatest likelihood for hypotheses across expanding data sets. Consider the situation of Figure 1, presenting two failures of discrete component semiconductors. The failure type is called an “open”, or the break in a wire connecting components to others in the system. For the diagnostic expert, the presence of a break supports a number of alternative hypotheses. The search for the most likely explanation for a failure broadens the evidence search: How large is the break? Is there any discoloration related to the break? Were there any (perceptual) sounds or smells when it happened? What were the resulting conditions of the components of the system?

[Figure 1: panels (a) and (b)]
Figure 1. Two examples of discrete component semiconductors, each exhibiting the “open” or “connection broken” failure.

Driven by the data search supporting multiple possible hypotheses that can explain the “open”, the expert notes the bambooing effect in the disconnected wire, Figure 1a. This suggests a revised greatest likelihood hypothesis that explains the open as a break created by metal crystallization that was likely caused by a sequence of low-frequency high-current pulses. The greatest likelihood hypothesis for the open of Figure 1b, where the break is seen as balled, is melting due to excessive current.
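As a hedged illustration of gl(hi|Et) over an expanding evidence set, the sketch below scores three explanations of an “open”. The first two hypotheses echo the Figure 1 discussion; the third (“mechanical_stress”), the evidence names, and every number are hypothetical. It also assumes, in naïve Bayes fashion, that evidence items are conditionally independent given the hypothesis, which is exactly the simplification questioned in the second objection above.

```python
from math import prod

# Hypothetical priors p(h_i) over explanations of an "open" failure.
PRIORS = {
    "crystallization": 0.3,    # break from low-frequency high-current pulses
    "melting": 0.3,            # break from excessive current
    "mechanical_stress": 0.4,  # hypothetical extra alternative
}

# Hypothetical per-item likelihoods p(e | h_i).
LIKELIHOOD = {
    "open":     {"crystallization": 0.9,  "melting": 0.9,  "mechanical_stress": 0.9},
    "bambooed": {"crystallization": 0.8,  "melting": 0.05, "mechanical_stress": 0.1},
    "balled":   {"crystallization": 0.05, "melting": 0.85, "mechanical_stress": 0.05},
}


def gl(evidence_t):
    """Greatest likelihood hypothesis for the evidence E_t gathered so far.

    Naive-Bayes assumption: p(E_t | h_i) is the product of per-item likelihoods.
    """
    def score(h):
        return PRIORS[h] * prod(LIKELIHOOD[e][h] for e in evidence_t)
    return max(PRIORS, key=score)


stream = ["open", "bambooed"]  # evidence arriving over time
for t in range(1, len(stream) + 1):
    print(t, stream[:t], gl(stream[:t]))
# 1 ['open'] mechanical_stress
# 2 ['open', 'bambooed'] crystallization
```

With only the open observed, the hypothetical prior favors mechanical stress; once “bambooed” is added to Et the greatest likelihood hypothesis shifts to crystallization, and with “balled” instead (gl(["open", "balled"])) it shifts to melting.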
Both of these diagnostic scenarios have been implemented by an expert-system-like search through an hypothesis space (Stern et al. 1997) as well as reflected in a Bayesian belief net (Chakrabarti et al. 2005). Figure 2 presents a Bayesian belief net (BBN) capturing these and other related diagnostic situations.

The BBN, without new data, represents the a priori state of an expert’s knowledge of an application domain. In fact, these networks of causal relationships are usually carefully crafted through many hours working with human experts’ analysis of known failures. Thus, the BBN can be said to capture a priori expert knowledge implicit in a domain of interest. When new (a posteriori) data are given to the BBN, e.g., the wire is “bambooed”, the color of the copper wire is normal, etc., the belief network “infers” the most likely explanation, within its (a priori) model, given this new information. There are many inference rules for doing this (Luger 2009, Chapter 9). An important result of using the BBN technology is that as one hypothesis achieves its greatest likelihood, other related hypotheses are “explained away”, i.e., their likelihood measures decrease within the BBN.

Figure 2. A Bayesian belief network representing the causal relationships and data points implicit in the discrete component semiconductor domain. As data is “discovered” the (a priori) probabilistic hypotheses change and suggest further search for data.

This current example demonstrates how the most likely current hypothesis can be used to determine the best explanation, given a particular time and an hypothesis space. We next demonstrate how considering sets of hypotheses and data across time, using the most likely hypothesis at time t, can produce a greatest likelihood hypothesis:

$$gl(h_i \mid E_t) = \arg\max_{h_i}\; p(E_t \mid h_i)\,p(h_i)$$

In this model the most likely interpretation of new data, given evidence E at time t, is a function of which interpretation is most likely to produce that evidence at time t and the probability of that interpretation itself occurring. If we want to expand this to the next time period, t + 1, we need to describe how models can evolve across time.

As an example of argmax processing, Chakrabarti et al. (2005, 2007) analyze a continuous data stream from a set of distributed sensors. The running “health” of the transmission of a Navy helicopter rotor system is represented by a steady stream of sensor data. This data consists of temperature, vibration, pressure, and other measurements reflecting the state of the various components of the running transmission system. An example of this data can be seen in the top portion of Figure 3, where the continuous data stream is broken into discrete and partial time slices. A Fourier transform is then used to translate these signals into the frequency domain, as shown on the left side of the second row of Figure 3. These frequency readings were compared across time periods to diagnose the running health of the rotor system. The model used to diagnose rotor health is the auto-regressive hidden Markov model (A-RHMM) of Figure 4. The observable states of the system are made up of the sequences of the segmented signals in the frequency domain, while the hidden states are the imputed health states of the helicopter rotor system itself, as seen in the lower right of Figure 3.

The hidden Markov model (HMM) technology is an important stochastic technique that can be seen as a variant of a dynamic BBN. In the HMM, we attribute values to states of the network that are themselves not directly observable. For example, the HMM technique is widely used in the computer analysis of human speech, trying to determine the most likely word uttered, given a stream of acoustic signals (Jurafsky and Martin 2009). In our helicopter example, training this system on streams of normal transmission data allowed the system to make the correct greatest likelihood measure of failure when these signals change to indicate a possible breakdown. The US Navy supplied data to train the normal running system as well as data sets for transmissions that contained seeded faults. Thus, the hidden state St of the A-RHMM reflects the greatest likelihood hypothesis of the state of the rotor system, given the observed evidence Ot at any time t.
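The rotor diagnosis reads off the hidden health state St from the observations Ot. A minimal sketch of that decoding step is the Viterbi algorithm over a plain first-order discrete HMM. This is a simplification: it is not the auto-regressive A-RHMM of Chakrabarti et al., the Fourier step is assumed to have already happened, and the observation symbols (“low”, “medium”, “high”, standing in for discretized frequency-domain readings) as well as all start, transition, and emission probabilities are hypothetical.

```python
# Viterbi decoding over a plain discrete HMM (not the auto-regressive A-RHMM).
# States follow Figure 4; observation symbols and all probabilities are hypothetical.
STATES = ("safe", "unsafe", "faulty")

START = {"safe": 0.8, "unsafe": 0.15, "faulty": 0.05}
TRANS = {  # p(S_{t+1} | S_t)
    "safe":   {"safe": 0.90, "unsafe": 0.08, "faulty": 0.02},
    "unsafe": {"safe": 0.10, "unsafe": 0.70, "faulty": 0.20},
    "faulty": {"safe": 0.01, "unsafe": 0.09, "faulty": 0.90},
}
EMIT = {  # p(O_t | S_t), O_t = a discretized frequency-domain reading
    "safe":   {"low": 0.70, "medium": 0.25, "high": 0.05},
    "unsafe": {"low": 0.20, "medium": 0.60, "high": 0.20},
    "faulty": {"low": 0.05, "medium": 0.25, "high": 0.70},
}


def viterbi(observations):
    """Return the most likely hidden state sequence S_1..S_T for O_1..O_T."""
    # best[s]: probability of the most likely path that ends in state s
    best = {s: START[s] * EMIT[s][observations[0]] for s in STATES}
    back = []  # back[t][s]: best predecessor of state s at step t + 1
    for obs in observations[1:]:
        prev, best, ptr = best, {}, {}
        for s in STATES:
            # pick the predecessor q that maximizes the path probability into s
            p, r = max((prev[q] * TRANS[q][s], q) for q in STATES)
            best[s] = p * EMIT[s][obs]
            ptr[s] = r
        back.append(ptr)
    # Trace back from the most likely final state.
    state = max(best, key=best.get)
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


print(viterbi(["low", "low", "medium", "high", "high"]))
# ['safe', 'safe', 'unsafe', 'faulty', 'faulty']
```

In this toy run the decoded path moves from safe through unsafe to faulty as the hypothetical readings rise, mirroring how a greatest likelihood hidden state is assigned at each time t.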
Figure 3. Real-time data from the transmission system of a helicopter’s rotor. The top component of the figure presents the original data stream (left) and an enlarged time slice (right). The lower left figure is the result of the Fourier transform of the time slice data (transformed) into the frequency domain. The lower right figure represents the hidden states of the helicopter rotor system.

Figure 4. The data of Figure 3 is processed using an auto-regressive hidden Markov model. States Ot represent the observable values at time t. The St states represent the hidden “health” states of the rotor system, {safe, unsafe, faulty}, at time t.

Conclusion: An epistemological stance

Turing’s test for intelligence was agnostic both as to what a computer was composed of (vacuum tubes, transistors, or tinker toys) and as to the languages used to make it run. It simply required the responses of the machine to be roughly equivalent to the responses of humans in the same situations.

Modern AI research has proposed probabilistic representations and algorithms for the real-time integration of new (a posteriori) information into previously (a priori) learned patterns of information (Dempster 1968). Among these algorithms is loopy belief propagation (Pearl 1988, 2000), which captures a system of plausible beliefs constantly iterating towards equilibrium, or equilibration, as Piaget might describe it. A cognitive system can be in a priori equilibrium with its continuing states of learned diagnostic knowledge. When presented with novel information characterizing a new diagnostic situation, this a posteriori data perturbs the equilibrium. The cognitive system then infers, using prior and posterior components of the model, until it finds convergence or equilibrium, in the form of a particular greatest likelihood hypothesis.
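To make “iterating towards equilibrium” concrete, here is a minimal sketch of loopy belief propagation, in its sum-product message-passing form, on a hypothetical three-node cycle of binary variables. The potentials are invented for illustration and are not the authors’ diagnostic networks; the brute-force enumeration is included only to compare the converged beliefs against exact marginals. Messages are updated synchronously until they stop changing, which plays the role of the equilibrium described above.

```python
from itertools import product

# Hypothetical three-node cycle 0 - 1 - 2 - 0 over binary variables.
NODES = (0, 1, 2)
EDGES = ((0, 1), (1, 2), (2, 0))
PHI = {0: (0.7, 0.3), 1: (0.5, 0.5), 2: (0.4, 0.6)}  # unary potentials
NEIGHBORS = {i: [b if a == i else a for a, b in EDGES if i in (a, b)] for i in NODES}


def psi(a, b):
    """Pairwise potential that favors neighboring variables agreeing."""
    return 2.0 if a == b else 1.0


def normalize(v):
    total = sum(v)
    return tuple(x / total for x in v)


def loopy_bp(tol=1e-9, max_iters=100):
    """Synchronous sum-product message passing until messages stop changing."""
    # msgs[(i, j)][x_j]: message from node i to neighbor j
    msgs = {(i, j): (0.5, 0.5) for i in NODES for j in NEIGHBORS[i]}
    for iteration in range(1, max_iters + 1):
        new = {}
        for i, j in msgs:
            out = []
            for xj in (0, 1):
                total = 0.0
                for xi in (0, 1):
                    # product of messages into i from every neighbor except j
                    incoming = 1.0
                    for k in NEIGHBORS[i]:
                        if k != j:
                            incoming *= msgs[(k, i)][xi]
                    total += PHI[i][xi] * psi(xi, xj) * incoming
                out.append(total)
            new[(i, j)] = normalize(out)
        delta = max(abs(a - b) for key in msgs for a, b in zip(msgs[key], new[key]))
        msgs = new
        if delta < tol:  # equilibrium: messages no longer change
            break
    beliefs = {}
    for i in NODES:
        b = list(PHI[i])
        for k in NEIGHBORS[i]:
            b = [b[x] * msgs[(k, i)][x] for x in (0, 1)]
        beliefs[i] = normalize(b)
    return beliefs, iteration


def exact_marginals():
    """Brute-force marginals from all 2**3 joint assignments, for comparison."""
    weight = {}
    for xs in product((0, 1), repeat=len(NODES)):
        w = 1.0
        for i in NODES:
            w *= PHI[i][xs[i]]
        for a, b in EDGES:
            w *= psi(xs[a], xs[b])
        weight[xs] = w
    return {
        i: normalize([sum(w for xs, w in weight.items() if xs[i] == x) for x in (0, 1)])
        for i in NODES
    }


beliefs, iterations = loopy_bp()
exact = exact_marginals()
for i in NODES:
    print(i, round(beliefs[i][0], 3), round(exact[i][0], 3))
```

On a single loop like this one the converged beliefs come out close to, but not exactly equal to, the exact marginals; loopy belief propagation is an approximation whenever the graph has cycles.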
The claim of this paper is that stochastic methods offer a sufficient account of human intelligence in areas such as diagnostic reasoning. This includes the computation of a greatest likelihood measure of hypotheses, given new information and an expert’s a priori cognitive equilibrium. Further, we contend that the greatest likelihood calculation is cognitively plausible and offers an epistemological framework for understanding the phenomena of human diagnostic and prognostic reasoning.
References

Bartlett, F. (1932), Remembering, London: Cambridge University Press.
Bayes, T. (1763), ‘An Essay Towards Solving a Problem in the Doctrine of Chances’, Philosophical Transactions of the Royal Society of London, London: The Royal Society, pp 370-418.
Chakrabarti, C., Rammohan, R., and Luger, G. F. (2005), ‘A First-Order Stochastic Modeling Language for Diagnosis’, Proceedings of the 18th International Florida Artificial Intelligence Research Society Conference (FLAIRS-18), Palo Alto: AAAI Press.
Chakrabarti, C., Pless, D. J., Rammohan, R., and Luger, G. F. (2007), ‘Diagnosis Using a First-Order Stochastic Language That Learns’, Expert Systems with Applications, 32 (3), Amsterdam: Elsevier Press.
Dempster, A. P. (1968), ‘A Generalization of Bayesian Inference’, Journal of the Royal Statistical Society, Series B, 30: 1-38.
Glymour, C. (2001), The Mind’s Arrows: Bayes Nets and Graphical Causal Models in Psychology, Cambridge MA: MIT Press.
Gopnik, A., Glymour, C., Sobel, D. M., Schulz, L. E., Kushnir, T., and Danks, D. (2004), ‘A Theory of Causal Learning in Children: Causal Maps and Bayes Nets’, Psychological Review, 111 (1): 3-32.
Gopnik, A. (2011a), ‘A Unified Account of Abstract Structure and Conceptual Change: Probabilistic Models and Early Learning Mechanisms’, Commentary on Susan Carey, “The Origin of Concepts”, Behavioral and Brain Sciences, 34 (3): 126-129.
Gopnik, A. (2011b), ‘Probabilistic Models as Theories of Children’s Minds’, Behavioral and Brain Sciences, 34 (4): 200-201.
Hume, D. (1739/2000), A Treatise of Human Nature, D. F. Norton and M. J. Norton, eds, Oxford/New York: Oxford University Press.
Jurafsky, D. and Martin, J. H. (2009), Speech and Language Processing, Upper Saddle River NJ: Pearson Education.
Kant, I. (1781/1964), Immanuel Kant’s Critique of Pure Reason, N. K. Smith, translator, New York: St. Martin’s Press.
Kushnir, T., Gopnik, A., Lucas, C., and Schulz, L. (2010), ‘Inferring Hidden Causal Structure’, Cognitive Science, 34: 148-160.
Luger, G. F. (2009), Artificial Intelligence: Structures and Strategies for Complex Problem Solving, 6th edition, Boston: Addison-Wesley Pearson Education.
Luger, G. F. (2012), ‘Epistemology, Access, and Computational Models’, in The Complex Mind, McFarland, Stenning, and McGonigle-Chalmers, eds, London: Palgrave Macmillan.
Luger, G. F. and Chakrabarti, C. (2014), From Alan Turing to Modern AI: An Epistemological Stance (in submission), copies available from first author.
Luger, G. F., Lewis, J. A., and Stern, C. (2002), ‘Problem Solving as Model-Refinement: Towards a Constructivist Epistemology’, Brain, Behavior, and Evolution, 59: 87-100, Basel: Karger.
Pearl, J. (1988), Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Los Altos CA: Morgan Kaufmann.
Pearl, J. (2000), Causality, Cambridge UK: Cambridge University Press.
Peirce, C. S. (1958), Collected Papers 1931-1958, Cambridge MA: Harvard University Press.
Piaget, J. (1954), The Construction of Reality in the Child, New York: Basic Books.
Piaget, J. (1970), Structuralism, New York: Basic Books.
Simon, H. A. (1981), The Sciences of the Artificial (2nd ed), Cambridge MA: MIT Press.
Stern, C. R. and Luger, G. F. (1997), ‘Abduction and Abstraction in Diagnosis: A Schema Based Account’, in Android Epistemology, Ford et al., eds, Cambridge MA: MIT Press.
Turing, A. (1950), ‘Computing Machinery and Intelligence’, Mind, 59: 433-460.
von Glasersfeld, E. (1978), ‘An Introduction to Radical Constructivism’, in The Invented Reality, Watzlawick, ed., pp 17-40, New York: Norton.