=Paper=
{{Paper
|id=Vol-1144/paper13
|storemode=property
|title=Modern AI: Stochastic Models and an Epistemological Stance
|pdfUrl=https://ceur-ws.org/Vol-1144/paper13.pdf
|volume=Vol-1144
|dblpUrl=https://dblp.org/rec/conf/maics/LugerC14
}}
==Modern AI: Stochastic Models and an Epistemological Stance==
computers that understand human languages, and various forms of web agents.

The failure, however, of computers to succeed at the task of creating a general-purpose thinking machine begins to shed some understanding on the “failures” of the imitation game itself. Specifically, the imitation game offers no hint of a definition of intelligent activity nor does it offer specifications for building intelligent artifacts. Deeper issues remain that Turing did not address. What IS intelligence? What IS grounding (how may a human’s or computer’s statements be said to have “meaning”)? Finally, can humans understand their own intelligence in a manner sufficient to formalize or replicate it?

This paper considers these issues, especially the responses to the challenge of building intelligent artifacts that the modern artificial intelligence community has taken. In earlier venues (Luger 2012, Luger and Chakrabarti 2014) we presented general issues of artificial intelligence artifacts and the epistemological biases they embody. In this paper we view modern artificial intelligence, especially the commitment to stochastic models, and the epistemological stance that supports this approach. In the next section we present a constructivist rapprochement of the empiricist, rationalist, and pragmatist positions that supported early AI work and addresses many of its dualist assumptions. We also offer some preliminary conjectures about how a Bayesian model might be epistemologically plausible.

===Modern AI: Probabilistic models===

We view a constructivist and model-revising epistemology as a rapprochement between the empiricist, rationalist, and pragmatist viewpoints. The constructivist hypothesizes that all human understanding is the result of an interaction between energy patterns in the world and mental categories imposed on the world by an intelligent agent (Piaget 1954, 1970; von Glasersfeld 1978). Using Piaget’s terms, we humans assimilate external phenomena according to our current understanding and accommodate our understanding to phenomena that do not meet our prior expectations. Constructivists use the term schemata to describe the a priori structure used to mediate the experience of the external world. The term schemata is taken from the writing of the British psychologist Bartlett (1932) and its philosophical roots go back to Kant (1781/1964). On this viewpoint observation is not passive and neutral but active and interpretative. There are many current psychologists and philosophers that support and expand this pragmatic and teleological account of human developmental activity (Glymour 2001, Gopnik et al. 2010, Gopnik 2011a, 2011b, Kushnir et al. 2010).

Perceived information, Kant’s a posteriori knowledge, rarely fits precisely into our preconceived and a priori schemata. From this tension to comprehend and act as an agent, the schema-based biases a subject uses to organize experience are strengthened, modified, or replaced. This accommodation in the context of unsuccessful interactions with the environment drives a process of cognitive equilibration. The constructivist epistemology is one of cognitive evolution and continuous model refinement. An important consequence of constructivism is that the interpretation of any perception-based situation involves the imposition of the observer’s (biased) concepts and categories on what is perceived. This constitutes an inductive bias (Luger 2009, Ch 16).

When Piaget (1970) proposed a constructivist approach to a child’s understanding of the external world, he called it a genetic epistemology. When encountering new phenomena, the lack of a comfortable fit of current schemata to the world “as it is” creates a cognitive tension. This tension drives a process of schema revision. Schema revision, Piaget’s accommodation, is the continued evolution of the agent’s understanding towards equilibration.

There is a blending here of empiricist and rationalist traditions, mediated by the pragmatist requirement of agent survival. As embodied, agents can comprehend nothing except that which first passes through their senses. As accommodating, agents survive through learning the general patterns of an external world. What is perceived is mediated by what is expected; what is expected is influenced by what is perceived: these two functions can only be understood in terms of each other. A Bayesian model-refinement representation offers an appropriate model for critical components of this constructivist model-revising epistemological stance (Luger et al. 2002, Luger 2012). Interestingly enough, David Hume acknowledged the epistemic foundation of all human activity (including, of course, the construction of AI artifacts) in A Treatise of Human Nature (1739/2000) when he stated “All the sciences have a relation, greater or less, to human nature; and ... however wide any of them may seem to run from it, they still return back by one passage or another. Even Mathematics, Natural Philosophy, and Natural Religion, are in some measure dependent on the science of MAN; since they lie under the cognizance of men, and are judged of by their powers and faculties.”

Thus, we can ask why a constructivist epistemology might be useful in addressing the problem of building programs that are “intelligent”. How can an agent within an environment understand its own understanding of that situation? We believe that constructivism also addresses this problem of epistemological access. For more than a century there has been a struggle in both philosophy and
psychology between two factions: the positivist, who
proposes to infer mental phenomena from observable
physical behavior, and a more phenomenological approach which allows the use of first person reporting to enable access to cognitive phenomena. This factionalism exists because both modes of access to cognitive phenomena require some form of model construction and inference.

In comparison to physical objects like chairs and doors, which often, naively, seem to be directly accessible, the mental states and dispositions of an agent seem to be particularly difficult to characterize. We contend that this dichotomy between direct access to physical phenomena and indirect access to mental phenomena is illusory. The constructivist analysis suggests that no experience of the external (or internal) world is possible without the use of some model or schema for organizing that experience. In scientific enquiry, as well as in our normal human cognitive experiences, this implies that all access to phenomena is through exploration, approximation, and continued model refinement.

Bayes’ theorem (1763) offers a plausible model of this constructivist rapprochement between the philosophical traditions we have just discussed. It is also an important modeling tool for much of modern AI, including AI programs for natural language understanding, robotics, and machine learning. With a high-level discussion of Bayes’ insights, we can describe the power of this approach. Consider the general form of Bayes’ relationship used to determine the probability of a particular hypothesis, hi, given a set of evidence E:

p(hi|E) = p(E|hi) p(hi) / Σk=1..n p(E|hk) p(hk)

* p(hi|E) is the probability that a particular hypothesis, hi, is true given evidence E.
* p(hi) is the probability that hi is true overall.
* p(E|hi) is the probability of observing evidence E when hi is true.
* n is the number of possible hypotheses.

With the general form of Bayes’ theorem we have a functional (and computational!) description (model) for a particular situation happening given a set of perceptual evidence clues. Epistemologically, we have created on the right hand side of the equation a schema describing how prior accumulated knowledge of occurrences of phenomena can relate to the interpretation of a new situation, the left hand side of the equation. This relationship can be seen as an example of Piaget’s assimilation where encountered information fits (is interpreted by) the patterns created from prior experiences.

To describe further the pieces of Bayes’ formula: the probability of an hypothesis being true, given a set of evidence, is equal to the probability that the evidence is true given the hypothesis times the probability that the hypothesis occurs. This number is divided (normalized) by the probability of the evidence itself, p(E). This probability of evidence is represented as the sum, over all hypotheses, of the probability of the evidence given each hypothesis times the probability of that hypothesis itself.

There are limitations to using Bayes’ theorem as just presented as an epistemological characterization of the phenomenon of interpreting new (a posteriori) data in the context of (prior) collected knowledge and experience. First, of course, is the fact that the epistemological subject is not a calculating machine. We simply don’t have all the prior (numerical) values for all the hypotheses and evidence that can fit a problem situation. In a complex situation such as medicine, where there can be hundreds of hypothesized diseases and thousands of symptoms, this calculation is intractable (Luger 2009, Chapter 5).

A second objection is that in most situations the sets of evidence are NOT independent, given the set of hypotheses. This makes the calculation of p(E) in the denominator of Bayes’ rule as just presented unjustified. When this independence assumption is simply ignored, as we see shortly, the result is called naïve Bayes. More often, however, the rationalization of the probability of the occurrence of evidence across all hypotheses is seen as simply a normalizing factor, supporting the calculation of a realistic measure for the probability of the hypothesis given the evidence (the left side of Bayes’ equation). The same normalizing factor is utilized in determining the actual probability of any of the hi, given the evidence, and thus, as in most natural language processing applications, is usually ignored.

A final objection asserts that diagnostic reasoning is not about the calculation of probabilities; it is about determining the most likely explanation, given the accumulation of pieces of evidence. Humans are not doing real-time complex mathematical processing; rather we are looking for the most coherent explanation or possible hypothesis, given the amassed data. Thus, a much more intuitive form of Bayes’ rule ignores the p(E) denominator entirely (as well as the associated assumption of evidence independence). The resulting formula determines the likelihood of any hypothesis given the evidence as the product of the probability of the evidence given the hypothesis times the probability of the hypothesis itself, p(E|hi) p(hi). In most diagnostic situations we are asked to determine which of a set of hypotheses hi is most likely to be supported. We refer to this as determining the argmax across the set of hypotheses. Thus, if we wish to determine which of all the hi has the most support, we look for the largest p(E|hi) p(hi):

argmax(hi) p(E|hi) p(hi)
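To make these two calculations concrete, here is a minimal Python sketch with invented numbers for a two-hypothesis diagnostic situation; the hypothesis names and all probability values are illustrative assumptions, not data from the studies discussed below.

<pre>
# Toy Bayes-rule calculation: two hypothetical failure hypotheses and
# one piece of evidence. All numbers are invented for illustration.

priors = {"crystallization": 0.3, "melting": 0.7}        # p(h_i)
likelihoods = {"crystallization": 0.8, "melting": 0.1}   # p(E | h_i)

# Full Bayes rule: normalize by p(E) = sum over k of p(E | h_k) p(h_k).
p_E = sum(likelihoods[h] * priors[h] for h in priors)
posteriors = {h: likelihoods[h] * priors[h] / p_E for h in priors}
print(posteriors)   # {'crystallization': 0.774..., 'melting': 0.225...}

# The intuitive diagnostic form drops p(E): the ranking of hypotheses
# under p(E | h_i) p(h_i) is unchanged, so the argmax is the same.
best = max(priors, key=lambda h: likelihoods[h] * priors[h])
print(best)         # crystallization
</pre>

Dropping the denominator changes the numbers but not the ranking, which is why the argmax form suffices for diagnosis.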
In a dynamic interpretation, as sets of evidence themselves change across time, we will call this argmax of hypotheses given a set of evidence at a particular time the greatest likelihood of that hypothesis at that time. We show this relationship, an extension of the Bayesian maximum a posteriori (or MAP) estimate, as a dynamic measure over time t:

gl(hi|Et) = argmax(hi) p(Et|hi) p(hi)

This model is both intuitive and simple: the most likely interpretation of new data, given evidence E at time t, is a function of which interpretation is most likely to produce that evidence at time t and the probability of that interpretation itself occurring.
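Read as code, the dynamic measure is the same argmax re-applied as the evidence set changes. A minimal sketch, assuming a stream of per-time-step likelihood tables; the hypotheses and numbers are again invented:

<pre>
# Greatest likelihood over time: gl(h_i | E_t) = argmax over h_i of
# p(E_t | h_i) p(h_i), recomputed as the evidence set expands.
# The hypotheses and all probabilities are invented placeholders.

priors = {"crystallization": 0.3, "melting": 0.7}

likelihood_stream = [                            # p(E_t | h_i) at each t
    {"crystallization": 0.5, "melting": 0.5},    # t=0: break seen, uninformative
    {"crystallization": 0.8, "melting": 0.2},    # t=1: bambooing noted
    {"crystallization": 0.9, "melting": 0.05},   # t=2: no discoloration
]

def gl(likelihoods, priors):
    """Hypothesis maximizing p(E_t | h_i) p(h_i)."""
    return max(priors, key=lambda h: likelihoods[h] * priors[h])

for t, lik in enumerate(likelihood_stream):
    print(t, gl(lik, priors))    # melting at t=0, crystallization after
</pre>

Under these placeholder numbers the greatest likelihood hypothesis flips as the evidence set expands, which is the revision behavior described in the diagnosis example below.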
By the early 1990s, much of computation-based language understanding and generation was stochastic, including parsing, part-of-speech tagging, reference resolution, and discourse processing, usually using tools like greatest likelihood measures (Jurafsky and Martin 2009). Other areas of artificial intelligence, especially machine learning, became more Bayesian-based. In many ways these uses of stochastic technology for pattern recognition were another instantiation of the constructivist tradition, as collected sets of patterns were used to condition recognition of new patterns.

Judea Pearl’s (1988) proposal for the use of Bayesian belief nets (BBNs) and his assumption of their links reflecting “causal” relationships (Pearl 2000) brought the use of Bayesian technology to an entirely new importance. First, the assumption of these networks being directed graphs – reflecting causal relationships – and disallowing cycles – no entity can cause itself – brought a radical improvement to the computational costs of reasoning with BBNs (Luger 2009, Ch 9). Second, these same two assumptions made the BBN representation much more transparent as a representational tool that could capture causal relations. Finally, almost all the traditional powerful stochastic representations used in language work and machine learning, for example the hidden Markov model in the form of a dynamic Bayesian network (DBN), could be readily integrated into this new representational formalism.
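As a data structure, these assumptions reduce a belief net to a directed acyclic graph with one conditional probability table (CPT) per node. A minimal sketch of such a representation, using a hypothetical two-node fragment; the node names and probabilities are our own illustrative assumptions, not the diagnostic networks of Stern et al. or Chakrabarti et al.:

<pre>
# A BBN as a directed acyclic graph: each node stores its parents and a
# conditional probability table (CPT) indexed by parent values. This
# hypothetical two-node fragment is invented for illustration.

bbn = {
    "crystallization": {"parents": [], "cpt": {(): 0.3}},
    "bambooed": {"parents": ["crystallization"],
                 "cpt": {(True,): 0.8, (False,): 0.1}},
}

def prob(node, value, assignment):
    """p(node = value | its parents' values in the assignment)."""
    key = tuple(assignment[p] for p in bbn[node]["parents"])
    p_true = bbn[node]["cpt"][key]
    return p_true if value else 1.0 - p_true

# Chain rule over the DAG: the joint is the product of local terms.
a = {"crystallization": True, "bambooed": True}
joint = prob("crystallization", True, a) * prob("bambooed", True, a)
print(joint)   # 0.3 * 0.8 = 0.24
</pre>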
We next illustrate the Bayesian approach in two application domains. In the diagnosis of failures in discrete component semiconductors (Stern et al. 1997, Chakrabarti et al. 2005) we have an example of creating the greatest likelihood for hypotheses across expanding data sets. Consider the situation of Figure 1, presenting two failures of discrete component semiconductors. The failure type is called an “open”, or the break in a wire connecting components to others in the system. For the diagnostic expert, the presence of a break supports a number of alternative hypotheses. The search for the most likely explanation for a failure broadens the evidence search: How large is the break? Is there any discoloration related to the break? Were there any (perceptual) sounds or smells when it happened? What were the resulting conditions of the components of the system?

Figure 1. Two examples of discrete component semiconductors, each exhibiting the “open” or “connection broken” failure.

Driven by the data search supporting multiple possible hypotheses that can explain the “open”, the expert notes the bambooing effect in the disconnected wire, Figure 1a. This suggests a revised greatest likelihood hypothesis that explains the open as a break created by metal crystallization that was likely caused by a sequence of low-frequency high-current pulses. The greatest likelihood hypothesis for the open of the example of Figure 1b, where the break is seen as balled, is melting due to excessive current.
Both of these diagnostic scenarios have been implemented by an expert system-like search through an hypothesis space (Stern et al. 1997) as well as reflected in a Bayesian belief net (Chakrabarti et al. 2005). Figure 2 presents a Bayesian belief net (BBN) capturing these and other related diagnostic situations.

The BBN, without new data, represents the a priori state of an expert’s knowledge of an application domain. In fact, these networks of causal relationships are usually carefully crafted through many hours working with human experts’ analysis of known failures. Thus, the BBN can be said to capture a priori expert knowledge implicit in a domain of interest. When new (a posteriori) data are given to the BBN, e.g., the wire is “bambooed”, the color of the copper wire is normal, etc., the belief network “infers” the most likely explanation, within its (a priori) model, given this new information. There are many inference rules for doing this (Luger 2009, Chapter 9). An important result of using the BBN technology is that as one hypothesis achieves its greatest likelihood, other related hypotheses are “explained away”, i.e., their likelihood measures decrease within the BBN.
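The “explaining away” effect can be reproduced by brute-force enumeration over a toy network with two independent causes of one observed effect; the causes and all numbers below are invented placeholders, not the actual BBN of Figure 2:

<pre>
from itertools import product

# Two hypothetical, independent causes of an observed break ("open").
# All priors and conditional probabilities are invented placeholders.
p_c, p_m = 0.3, 0.3                      # p(crystallization), p(melting)
p_break = {(True, True): 0.95, (True, False): 0.80,
           (False, True): 0.80, (False, False): 0.05}

def joint(c, m):
    """Joint probability of cause values c, m and an observed break."""
    pc = p_c if c else 1 - p_c
    pm = p_m if m else 1 - p_m
    return pc * pm * p_break[(c, m)]

# p(crystallization | break): marginalize out the unobserved cause.
num = sum(joint(True, m) for m in (True, False))
den = sum(joint(c, m) for c, m in product((True, False), repeat=2))
print(round(num / den, 2))               # 0.57: break raises belief from 0.3

# Also observing melting "explains away" crystallization: belief drops.
print(round(joint(True, True) / (joint(True, True) + joint(False, True)), 2))  # 0.34
</pre>

Observing the break alone raises the belief in the first cause above its prior; additionally observing the competing cause pushes it back down, which is the decrease in likelihood measures described above.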
This current example demonstrates how the most likely current hypothesis can be used to determine the best explanation, given a particular time and an hypothesis space. We next demonstrate how considering sets of hypotheses and data across time, using the most likely hypothesis at time t, can produce a greatest likelihood hypothesis:

gl(hi|Et) = argmax(hi) p(Et|hi) p(hi)

In this model the most likely interpretation of new data, given evidence E at time t, is a function of which interpretation is most likely to produce that evidence at time t and the probability of that interpretation itself occurring. If we want to expand this to the next time period, t + 1, we need to describe how models can evolve across time.
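One simple way to read “models evolving across time”, in the constructivist spirit of the earlier sections, is to let each posterior serve as the next time step’s prior. A minimal sketch, with the same invented hypotheses and numbers as before; this is our illustration, not the mechanism of Chakrabarti et al.:

<pre>
# Model revision across time: the posterior at time t becomes the
# prior at time t + 1. Hypotheses and numbers are invented, as before.

priors = {"crystallization": 0.3, "melting": 0.7}
likelihood_stream = [
    {"crystallization": 0.8, "melting": 0.2},    # evidence at t = 0
    {"crystallization": 0.9, "melting": 0.05},   # evidence at t = 1
]

for t, lik in enumerate(likelihood_stream):
    unnorm = {h: lik[h] * priors[h] for h in priors}
    z = sum(unnorm.values())
    priors = {h: p / z for h, p in unnorm.items()}  # revised model
    print(t, max(priors, key=priors.get), round(max(priors.values()), 3))
</pre>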
As an example of argmax processing, Chakrabarti et al. (2005, 2007) analyze a continuous data stream from a set of distributed sensors. The running “health” of the transmission of a Navy helicopter rotor system is represented by a steady stream of sensor data. This data consists of temperature, vibration, pressure, and other measurements reflecting the state of the various components of the running transmission system. An example of this data can be seen in the top portion of Figure 3, where the continuous data stream is broken into discrete and partial time slices.

Figure 2. A Bayesian belief network representing the causal relationships and data points implicit in the discrete component semiconductor domain. As data is “discovered” the (a priori) probabilistic hypotheses change and suggest further search for data.

A Fourier transform is then used to translate these signals into the frequency domain, as shown on the left side of the second row of Figure 3. These frequency readings were compared across time periods to diagnose the running health of the rotor system. The model used to diagnose rotor health is the auto-regressive hidden Markov model (A-RHMM) of Figure 4. The observable states of the system are made up of the sequences of the segmented signals in the frequency domain while the hidden states are the imputed health states of the helicopter rotor system itself, as seen in the lower right of Figure 3.

Figure 3. Real-time data from the transmission system of a helicopter’s rotor. The top component of the figure presents the original data stream (left) and an enlarged time slice (right). The lower left figure is the result of the Fourier transform of the time slice data (transformed) into the frequency domain. The lower right figure represents the hidden states of the helicopter rotor system.

The hidden Markov model (HMM) technology is an important stochastic technique that can be seen as a variant of a dynamic BBN. In the HMM, we attribute values to states of the network that are themselves not directly observable. For example, the HMM technique is widely used in the computer analysis of human speech, trying to determine the most likely word uttered, given a stream of acoustic signals (Jurafsky and Martin 2009). In our helicopter example, training this system on streams of normal transmission data allowed the system to make the correct greatest likelihood measure of failure when these signals change to indicate a possible breakdown. The US Navy supplied data to train the normal running system as well as data sets for transmissions that contained seeded faults. Thus, the hidden state St of the A-RHMM reflects the greatest likelihood hypothesis of the state of the rotor system, given the observed evidence Ot at any time t.

Figure 4. The data of Figure 3 is processed using an auto-regressive hidden Markov model. States Ot represent the observable values at time t. The St states represent the hidden “health” states of the rotor system, {safe, unsafe, faulty} at time t.
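A standard HMM forward (filtering) pass makes the role of the hidden health states concrete. The sketch below is a generic first-order HMM filter, not the auto-regressive variant of Figure 4, and every prior, transition, and emission probability is an invented placeholder:

<pre>
# Generic first-order HMM forward filtering over hidden health states.
# All priors, transition, and emission probabilities are invented
# placeholders, not the trained A-RHMM of Figure 4.

states = ("safe", "unsafe", "faulty")
belief = {"safe": 0.98, "unsafe": 0.015, "faulty": 0.005}   # p(S_0)

trans = {   # p(S_t | S_t-1): health degrades but never self-repairs
    "safe":   {"safe": 0.97, "unsafe": 0.02, "faulty": 0.01},
    "unsafe": {"safe": 0.00, "unsafe": 0.90, "faulty": 0.10},
    "faulty": {"safe": 0.00, "unsafe": 0.00, "faulty": 1.00},
}
emit = {    # p(O_t | S_t) for a discretized vibration reading
    "safe":   {"low": 0.95, "high": 0.05},
    "unsafe": {"low": 0.40, "high": 0.60},
    "faulty": {"low": 0.10, "high": 0.90},
}

def forward_step(belief, obs):
    """Predict through the transition model, weight by the emission
    probability of the new observation, then renormalize."""
    predicted = {s: sum(belief[r] * trans[r][s] for r in states) for s in states}
    weighted = {s: predicted[s] * emit[s][obs] for s in states}
    z = sum(weighted.values())
    return {s: w / z for s, w in weighted.items()}

for obs in ["low", "low", "high", "high"]:       # a hypothetical signal stream
    belief = forward_step(belief, obs)
    print(obs, max(belief, key=belief.get))      # greatest likelihood health state
</pre>

Under these placeholder numbers the belief mass migrates from safe toward unsafe and faulty as high-vibration readings accumulate, mirroring the seeded-fault behavior described above.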
===Conclusion: An epistemological stance===

Turing’s test for intelligence was agnostic both as to what a computer was composed of – vacuum tubes, transistors, or tinker toys – as well as to the languages used to make it run. It simply required the responses of the machine to be roughly equivalent to the responses of humans in the same situations.

Modern AI research has proposed probabilistic representations and algorithms for the real-time integration of new (a posteriori) information into previously (a priori) learned patterns of information (Dempster 1968). Among these algorithms is loopy belief propagation (Pearl 1988, 2000) that captures a system of plausible beliefs constantly iterating towards equilibrium, or equilibration, as Piaget might describe it. A cognitive system can be in a priori equilibrium with its continuing states of learned diagnostic knowledge. When presented with novel information characterizing a new diagnostic situation, this a posteriori data perturbs the equilibrium. The cognitive system then infers, using prior and posterior components of the model, until it finds convergence or equilibrium, in the form of a particular greatest likelihood hypothesis.

The claim of this paper is that stochastic methods offer a sufficient account of human intelligence in areas such as diagnostic reasoning. This includes the computation of a greatest likelihood measure of hypotheses, given new information and an expert’s a priori cognitive equilibrium. Further, we contend that the greatest likelihood calculation is cognitively plausible and offers an epistemological framework for understanding the phenomena of human diagnostic and prognostic reasoning.
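As a rough illustration of this equilibration, the following sketch runs loopy belief propagation on a toy three-variable cycle until its messages reach a fixed point; the variables, priors, and couplings are invented placeholders rather than any model from the studies above:

<pre>
from math import prod

# Loopy belief propagation on a toy three-variable cycle, iterated to a
# fixed point. Variables, priors, and couplings are invented placeholders.

nodes = ("a", "b", "c")
directed = [("a", "b"), ("b", "c"), ("c", "a"),
            ("b", "a"), ("c", "b"), ("a", "c")]
unary = {"a": [0.9, 0.1], "b": [0.5, 0.5], "c": [0.5, 0.5]}
pair = [[0.8, 0.2], [0.2, 0.8]]          # neighboring variables tend to agree

msg = {e: [1.0, 1.0] for e in directed}  # msg[(i, j)][x_j]

for _ in range(100):                     # iterate toward equilibrium
    new = {}
    for (i, j) in directed:
        m = [sum(unary[i][xi] * pair[xi][xj] *
                 prod(msg[(k, t)][xi] for (k, t) in directed
                      if t == i and k != j)
                 for xi in (0, 1))
             for xj in (0, 1)]
        z = sum(m)
        new[(i, j)] = [v / z for v in m]
    converged = all(abs(new[e][0] - msg[e][0]) < 1e-9 for e in directed)
    msg = new
    if converged:
        break

# Equilibrium beliefs: prior times all incoming messages, renormalized.
for n in nodes:
    b = [unary[n][x] * prod(msg[(k, t)][x] for (k, t) in directed if t == n)
         for x in (0, 1)]
    z = sum(b)
    print(n, [round(v / z, 3) for v in b])
</pre>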
===References===

Bartlett, F. (1932), Remembering, London: Cambridge University Press.

Bayes, T. (1763), ‘Essay Towards Solving a Problem in the Doctrine of Chances’, Philosophical Transactions of the Royal Society of London, London: The Royal Society, pp 370-418.

Chakrabarti, C., Rammohan, R., and Luger, G. F. (2005), ‘A First-Order Stochastic Modeling Language for Diagnosis’, Proceedings of the 18th International Florida Artificial Intelligence Research Society Conference (FLAIRS-18), Palo Alto: AAAI Press.

Chakrabarti, C., Pless, D. J., Rammohan, R., and Luger, G. F. (2007), ‘Diagnosis Using a First-Order Stochastic Language That Learns’, Expert Systems with Applications, 32 (3), Amsterdam: Elsevier Press.

Dempster, A. P. (1968), ‘A Generalization of Bayesian Inference’, Journal of the Royal Statistical Society, 30 (Series B): 1-38.

Glymour, C. (2001), The Mind’s Arrows: Bayes Nets and Graphical Causal Models in Psychology, Cambridge MA: MIT Press.

Gopnik, A., Glymour, C., Sobel, D. M., Schulz, L. E., Kushnir, T., and Danks, D. (2004), ‘A theory of causal learning in children: Causal maps and Bayes nets’, Psychological Review, 111 (1): 3-32.

Gopnik, A. (2011a), ‘A unified account of abstract structure and conceptual change: Probabilistic models and early learning mechanisms’, commentary on Susan Carey, The Origin of Concepts, Behavioral and Brain Sciences, 34 (3): 126-129.

Gopnik, A. (2011b), ‘Probabilistic models as theories of children’s minds’, Behavioral and Brain Sciences, 34 (4): 200-201.

Hume, D. (1739/2000), A Treatise of Human Nature, edited by D. F. Norton and M. J. Norton, Oxford/New York: Oxford University Press.

Jurafsky, D. and Martin, J. H. (2009), Speech and Language Processing, Upper Saddle River NJ: Pearson Education.

Kant, I. (1781/1964), Immanuel Kant’s Critique of Pure Reason, Smith, N. K., translator, New York: St. Martin’s Press.

Kushnir, T., Gopnik, A., Lucas, C., and Schulz, L. (2010), ‘Inferring hidden causal structure’, Cognitive Science, 34: 148-160.

Luger, G. F. (2009), Artificial Intelligence: Structures and Strategies for Complex Problem Solving, 6th edition, Boston: Addison-Wesley Pearson Education.

Luger, G. F. (2012), ‘Epistemology, Access, and Computational Models’, in The Complex Mind, McFarland, Stenning, and McGonigle Chalmbers, editors, London: Palgrave Macmillan.

Luger, G. F. and Chakrabarti, C. (2014), ‘From Alan Turing to Modern AI: An Epistemological Stance’ (in submission), copies available from the first author.

Luger, G. F., Lewis, J. A., and Stern, C. (2002), ‘Problem Solving as Model-Refinement: Towards a Constructivist Epistemology’, Brain, Behavior, and Evolution, 59: 87-100, Basel: Karger.

Pearl, J. (1988), Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Los Altos CA: Morgan Kaufmann.

Pearl, J. (2000), Causality, Cambridge UK: Cambridge University Press.

Peirce, C. S. (1958), Collected Papers 1931-1958, Cambridge MA: Harvard University Press.

Piaget, J. (1954), The Construction of Reality in the Child, New York: Basic Books.

Piaget, J. (1970), Structuralism, New York: Basic Books.

Simon, H. A. (1981), The Sciences of the Artificial (2nd ed), Cambridge MA: MIT Press.

Stern, C. R. and Luger, G. F. (1997), ‘Abduction and Abstraction in Diagnosis: A Schema-Based Account’, in Android Epistemology, Ford et al., eds, Cambridge MA: MIT Press.

Turing, A. (1950), ‘Computing Machinery and Intelligence’, Mind, 59: 433-460.

von Glasersfeld, E. (1978), ‘An Introduction to Radical Constructivism’, in The Invented Reality, Watzlawick, ed., pp 17-40, New York: Norton.