=Paper= {{Paper |id=Vol-1144/paper13 |storemode=property |title=Modern AI: Stochastic Models and an Epistemological Stance |pdfUrl=https://ceur-ws.org/Vol-1144/paper13.pdf |volume=Vol-1144 |dblpUrl=https://dblp.org/rec/conf/maics/LugerC14 }} ==Modern AI: Stochastic Models and an Epistemological Stance== https://ceur-ws.org/Vol-1144/paper13.pdf
computers that understand human languages, and various forms of web agents.
   The failure, however, of computers to succeed at the task of creating a general-purpose thinking machine begins to shed some light on the “failures” of the imitation game itself. Specifically, the imitation game offers no hint of a definition of intelligent activity, nor does it offer specifications for building intelligent artifacts. Deeper issues remain that Turing did not address. What IS intelligence? What IS grounding (how may a human’s or computer’s statements be said to have “meaning”)? Finally, can humans understand their own intelligence in a manner sufficient to formalize or replicate it?
   This paper considers these issues, especially the responses to the challenge of building intelligent artifacts that the modern artificial intelligence community has taken. In earlier venues (Luger 2012, Luger and Chakrabarti 2014) we presented general issues of artificial intelligence artifacts and the epistemological biases they embody. In this paper we examine modern artificial intelligence, especially its commitment to stochastic models, and the epistemological stance that supports this approach. In the next section we present a constructivist rapprochement of the empiricist, rationalist, and pragmatist positions that supported early AI work, a rapprochement that addresses many of its dualist assumptions. We also offer some preliminary conjectures about how a Bayesian model might be epistemologically plausible.

Modern AI: Probabilistic models

We view a constructivist and model-revising epistemology as a rapprochement between the empiricist, rationalist, and pragmatist viewpoints. The constructivist hypothesizes that all human understanding is the result of an interaction between energy patterns in the world and mental categories imposed on the world by an intelligent agent (Piaget 1954, 1970; von Glasersfeld 1978). Using Piaget’s terms, we humans assimilate external phenomena according to our current understanding and accommodate our understanding to phenomena that do not meet our prior expectations. Constructivists use the term schemata to describe the a priori structure used to mediate the experience of the external world. The term schemata is taken from the writing of the British psychologist Bartlett (1932) and its philosophical roots go back to Kant (1781/1964). On this viewpoint observation is not passive and neutral but active and interpretative. There are many current psychologists and philosophers who support and expand this pragmatic and teleological account of human developmental activity (Glymour 2001, Gopnik et al. 2010, Gopnik 2011a, 2011b, Kushnir et al. 2010).
   Perceived information, Kant’s a posteriori knowledge, rarely fits precisely into our preconceived and a priori schemata. From this tension to comprehend and act as an agent, the schema-based biases a subject uses to organize experience are strengthened, modified, or replaced. This accommodation in the context of unsuccessful interactions with the environment drives a process of cognitive equilibration. The constructivist epistemology is one of cognitive evolution and continuous model refinement. An important consequence of constructivism is that the interpretation of any perception-based situation involves the imposition of the observer’s (biased) concepts and categories on what is perceived. This constitutes an inductive bias (Luger 2009, Ch 16).
   When Piaget (1970) proposed a constructivist approach to a child’s understanding of the external world, he called it a genetic epistemology. When encountering new phenomena, the lack of a comfortable fit of current schemata to the world “as it is” creates a cognitive tension. This tension drives a process of schema revision. Schema revision, Piaget’s accommodation, is the continued evolution of the agent’s understanding towards equilibration.
   There is a blending here of empiricist and rationalist traditions, mediated by the pragmatist requirement of agent survival. As embodied, agents can comprehend nothing except that which first passes through their senses. As accommodating, agents survive through learning the general patterns of an external world. What is perceived is mediated by what is expected; what is expected is influenced by what is perceived: these two functions can only be understood in terms of each other. A Bayesian model-refinement representation offers an appropriate model for critical components of this constructivist model-revising epistemological stance (Luger et al. 2002, Luger 2012). Interestingly enough, David Hume acknowledged the epistemic foundation of all human activity (including, of course, the construction of AI artifacts) in A Treatise of Human Nature (1739/2000) when he stated “All the sciences have a relation, greater or less, to human nature; and ... however wide any of them may seem to run from it, they still return back by one passage or another. Even Mathematics, Natural Philosophy, and Natural Religion, are in some measure dependent on the science of MAN; since they lie under the cognizance of men, and are judged of by their powers and faculties.”
   Thus, we can ask why a constructivist epistemology might be useful in addressing the problem of building programs that are “intelligent”. How can an agent within an environment understand its own understanding of that situation? We believe that constructivism also addresses this problem of epistemological access. For more than a century there has been a struggle in both philosophy and psychology between two factions: the positivist, who proposes to infer mental phenomena from observable
physical behavior, and a more phenomenological approach which allows the use of first person reporting to enable access to cognitive phenomena. This factionalism exists because both modes of access to cognitive phenomena require some form of model construction and inference.
   In comparison to physical objects like chairs and doors, which often, naively, seem to be directly accessible, the mental states and dispositions of an agent seem to be particularly difficult to characterize. We contend that this dichotomy between direct access to physical phenomena and indirect access to mental phenomena is illusory. The constructivist analysis suggests that no experience of the external (or internal) world is possible without the use of some model or schema for organizing that experience. In scientific enquiry, as well as in our normal human cognitive experiences, this implies that all access to phenomena is through exploration, approximation, and continued model refinement.
   Bayes’ theorem (1763) offers a plausible model of this constructivist rapprochement between the philosophical traditions we have just discussed. It is also an important modeling tool for much of modern AI, including AI programs for natural language understanding, robotics, and machine learning. With a high-level discussion of Bayes’ insights, we can describe the power of this approach.
   Consider the general form of Bayes’ relationship used to determine the probability of a particular hypothesis, hi, given a set of evidence E:

$$p(h_i \mid E) = \frac{p(E \mid h_i)\, p(h_i)}{\sum_{k=1}^{n} p(E \mid h_k)\, p(h_k)}$$

p(hi|E) is the probability that a particular hypothesis, hi, is true given evidence E.
p(hi) is the probability that hi is true overall.
p(E|hi) is the probability of observing evidence E when hi is true.
n is the number of possible hypotheses.

   With the general form of Bayes’ theorem we have a functional (and computational!) description (model) for a particular situation happening given a set of perceptual evidence clues. Epistemologically, we have created on the right-hand side of the equation a schema describing how prior accumulated knowledge of occurrences of phenomena can relate to the interpretation of a new situation, the left-hand side of the equation. This relationship can be seen as an example of Piaget’s assimilation where encountered information fits (is interpreted by) the patterns created from prior experiences.
   To describe further the pieces of Bayes’ formula: The probability of an hypothesis being true, given a set of evidence, is equal to the probability that the evidence is true given the hypothesis times the probability that the hypothesis occurs. This number is divided (normalized) by the probability of the evidence itself, p(E). This probability of evidence is represented as the sum, over all hypotheses, of the probability of the evidence given each hypothesis times the probability of that hypothesis itself.
   There are limitations to using Bayes’ theorem as just presented as an epistemological characterization of the phenomenon of interpreting new (a posteriori) data in the context of (prior) collected knowledge and experience. First, of course, is the fact that the epistemological subject is not a calculating machine. We simply don’t have all the prior (numerical) values for all the hypotheses and evidence that can fit a problem situation. In a complex situation such as medicine where there can be hundreds of hypothesized diseases and thousands of symptoms, this calculation is intractable (Luger 2009, Chapter 5).
   A second objection is that in most situations the sets of evidence are NOT independent, given the set of hypotheses. This makes the calculation of p(E) in the denominator of Bayes’ rule as just presented unjustified. When this independence assumption is simply ignored, as we see shortly, the result is called naïve Bayes. More often, however, the rationalization of the probability of the occurrence of evidence across all hypotheses is seen as simply a normalizing factor, supporting the calculation of a realistic measure for the probability of the hypothesis given the evidence (the left side of Bayes’ equation). The same normalizing factor is utilized in determining the actual probability of any of the hi, given the evidence, and thus, as in most natural language processing applications, is usually ignored.
   A final objection asserts that diagnostic reasoning is not about the calculation of probabilities; it is about determining the most likely explanation, given the accumulation of pieces of evidence. Humans are not doing real-time complex mathematical processing; rather we are looking for the most coherent explanation or possible hypothesis, given the amassed data. Thus, a much more intuitive form of Bayes’ rule ignores this p(E) denominator entirely (as well as the associated assumption of evidence independence). The resulting formula determines the likelihood of any hypothesis given the evidence, as the product of the probability of the evidence given the hypothesis times the probability of the hypothesis itself, p(E|hi) p(hi). In most diagnostic situations we are asked to determine which of a set of hypotheses hi is most likely to be supported. We refer to this as determining the argmax across the set of all hypotheses. Thus, if we wish to determine which of all the hi has the most support we look for the largest p(E|hi) p(hi):

$$\operatorname{argmax}_{h_i} \; p(E \mid h_i)\, p(h_i)$$
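
   The relation between the full form of Bayes’ rule and the argmax form can be made concrete with a small sketch. The hypothesis names and all numerical values below are hypothetical, chosen only for illustration; they are not taken from the studies cited in this paper. The sketch computes the normalized posterior for each hypothesis and, separately, the argmax of p(E|hi) p(hi) without ever computing p(E). Because p(E) is the same positive number for every hypothesis, both routes select the same hypothesis.

```python
def posterior(priors, likelihoods):
    """Full Bayes' rule: return p(h_i | E) for every hypothesis h_i."""
    # Numerator of Bayes' rule for each hypothesis: p(E | h_i) * p(h_i).
    scores = {h: likelihoods[h] * priors[h] for h in priors}
    # Denominator p(E): the sum of those products over all hypotheses.
    p_evidence = sum(scores.values())
    return {h: score / p_evidence for h, score in scores.items()}


def argmax_hypothesis(priors, likelihoods):
    """Return the h_i that maximizes p(E | h_i) * p(h_i); p(E) is never computed."""
    return max(priors, key=lambda h: likelihoods[h] * priors[h])


# Hypothetical priors p(h_i) and likelihoods p(E | h_i), for illustration only.
priors = {"crystallization": 0.2, "melting": 0.3, "mechanical_stress": 0.5}
likelihoods = {"crystallization": 0.7, "melting": 0.1, "mechanical_stress": 0.2}

post = posterior(priors, likelihoods)
best = argmax_hypothesis(priors, likelihoods)
print(best, post)
```

   Dropping the denominator changes the scale of the scores but not their ordering, which is why the argmax form suffices when only the best-supported hypothesis is needed.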
   In a dynamic interpretation, as sets of evidence themselves change across time, we will call this argmax of hypotheses given a set of evidence at a particular time the greatest likelihood of that hypothesis at that time. We show this relationship, an extension of the Bayesian maximum a posteriori (or MAP) estimate, as a dynamic measure over time t:

$$gl(h_i \mid E_t) = \operatorname{argmax}_{h_i} \; p(E_t \mid h_i)\, p(h_i)$$

   This model is both intuitive and simple: the most likely interpretation of new data, given evidence E at time t, is a function of which interpretation is most likely to produce that evidence at time t and the probability of that interpretation itself occurring.
   By the early 1990s, much of computation-based language understanding and generation was stochastic, including parsing, part-of-speech tagging, reference resolution, and discourse processing, usually using tools like greatest likelihood measures (Jurafsky and Martin 2009). Other areas of artificial intelligence, especially machine learning, became more Bayesian-based. In many ways these uses of stochastic technology for pattern recognition were another instantiation of the constructivist tradition, as collected sets of patterns were used to condition recognition of new patterns.
   Judea Pearl’s (1988) proposal for use of Bayesian belief nets (BBNs) and his assumption of their links reflecting “causal” relationships (Pearl 2000) brought the use of Bayesian technology to an entirely new importance. First, the assumption that these networks are directed graphs (reflecting causal relationships) that disallow cycles (no entity can cause itself) brought a radical improvement to the computational costs of reasoning with BBNs (Luger 2009, Ch 9). Second, these same two assumptions made the BBN representation much more transparent as a representational tool that could capture causal relations. Finally, almost all the traditional powerful stochastic representations used in language work and machine learning, for example, the hidden Markov model in the form of a dynamic Bayesian network (DBN), could be readily integrated into this new representational formalism.
   We next illustrate the Bayesian approach in two application domains. In the diagnosis of failures in discrete component semiconductors (Stern et al. 1997, Chakrabarti et al. 2005) we have an example of creating the greatest likelihood for hypotheses across expanding data sets. Consider the situation of Figure 1, presenting two failures of discrete component semiconductors. The failure type is called an “open”, or the break in a wire connecting components to others in the system. For the diagnostic expert, the presence of a break supports a number of alternative hypotheses. The search for the most likely explanation for a failure broadens the evidence search: How large is the break? Is there any discoloration related to the break? Were there any (perceptual) sounds or smells when it happened? What were the resulting conditions of the components of the system?

[Figure 1, panels (a) and (b)]
Figure 1. Two examples of discrete component semiconductors, each exhibiting the “open” or “connection broken” failure.

   Driven by the data search supporting multiple possible hypotheses that can explain the “open”, the expert notes the bambooing effect in the disconnected wire, Figure 1a. This suggests a revised greatest likelihood hypothesis that explains the open as a break created by metal crystallization that was likely caused by a sequence of low-frequency high-current pulses. The greatest likelihood hypothesis for the open of the example of Figure 1b, where the break is seen as balled, is melting due to excessive
current. Both of these diagnostic scenarios have been implemented by an expert system-like search through an hypothesis space (Stern et al. 1997) as well as reflected in a Bayesian belief net (Chakrabarti et al. 2005). Figure 2 presents a Bayesian belief net (BBN) capturing these and other related diagnostic situations.
   The BBN, without new data, represents the a priori state of an expert’s knowledge of an application domain. In fact, these networks of causal relationships are usually carefully crafted through many hours working with human experts’ analysis of known failures. Thus, the BBN can be said to capture a priori expert knowledge implicit in a domain of interest. When new (a posteriori) data are given to the BBN, e.g., the wire is “bambooed”, the color of the copper wire is normal, etc., the belief network “infers” the most likely explanation, within its (a priori) model, given this new information. There are many inference rules for doing this (Luger 2009, Chapter 9). An important result of using the BBN technology is that as one hypothesis achieves its greatest likelihood, other related hypotheses are “explained away”, i.e., their likelihood measures decrease within the BBN.

[Figure 2]
Figure 2. A Bayesian belief network representing the causal relationships and data points implicit in the discrete component semiconductor domain. As data is “discovered” the (a priori) probabilistic hypotheses change and suggest further search for data.

   This current example demonstrates how the most likely current hypothesis can be used to determine the best explanation, given a particular time and an hypothesis space. We next demonstrate how considering sets of hypotheses and data across time, using the most likely hypothesis at time t, can produce a greatest likelihood hypothesis.

$$gl(h_i \mid E_t) = \operatorname{argmax}_{h_i} \; p(E_t \mid h_i)\, p(h_i)$$

   In this model the most likely interpretation of new data, given evidence E at time t, is a function of which interpretation is most likely to produce that evidence at time t and the probability of that interpretation itself occurring. If we want to expand this to the next time period, t + 1, we need to describe how models can evolve across time.
   As an example of argmax processing, Chakrabarti et al. (2005, 2007) analyze a continuous data stream from a set of distributed sensors. The running “health” of the transmission of a Navy helicopter rotor system is represented by a steady stream of sensor data. This data consists of temperature, vibration, pressure, and other measurements reflecting the state of the various components of the running transmission system. An example of this data can be seen in the top portion of Figure 3, where the continuous data stream is broken into discrete and partial time slices.
   A Fourier transform is then used to translate these signals into the frequency domain, as shown on the left side of the second row of Figure 3. These frequency readings were compared across time periods to diagnose the running health of the rotor system. The model used to diagnose rotor health is the auto-regressive hidden Markov model (A-RHMM) of Figure 4. The observable states of the system are made up of the sequences of the segmented signals in the frequency domain while the hidden states are the imputed health states of the helicopter rotor system itself, as seen in the lower right of Figure 3.
   The hidden Markov model (HMM) technology is an important stochastic technique that can be seen as a variant of a dynamic BBN. In the HMM, we attribute values to states of the network that are themselves not directly observable. For example, the HMM technique is widely used in the computer analysis of human speech, trying to determine the most likely word uttered, given a stream of acoustic signals (Jurafsky and Martin 2009). In our helicopter example, training this system on streams of normal transmission data allowed the system to make the correct greatest likelihood measure of failure when these signals change to indicate a possible breakdown. The US Navy supplied data to train the normal running system as well as data sets for transmissions that contained seeded faults. Thus, the hidden state St of the A-RHMM reflects the greatest likelihood hypothesis of the state of the rotor system, given the observed evidence Ot at any time t.
                                                                 further      search for data.
                                                                 a&set&of&distributed&sensors.&The&running&“health”&of&the&transmission&of&a&Navy&helicopter&rotor&system&is&
                                                                 represented&by&a&steady&stream&of&sensor&data.&This&data&consists&of&temperature,&vibration,&pressure,&and&
                                                                 other&measurements&reflecting&the&state&of&the&various&components&of&the&running&transmission&system.&An&
                                                                 example&of&this&data&can&be&seen&in&the&top&portion&of&Figure&3,&where&the&continuous&data&stream&is&broken&
                                                                 into&discrete&and&partial&time&slices.&

                                                                 A&Fourier&transform&is&then&used&to&translate&these&signals&into&the&frequency&domain,&as&shown&on&the&
                                                                                                                                                   figure&is&the&result&of&the&Fourier&transform&of&the&time&slice&data&(transformed)&into&the
                                                                                                                                                   domain.&The&lower&right&figure&represents&the&hidden&states&of&the&helicopter&rotor&syst




                                                                                                                                                                                                                      &
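The greatest likelihood computation gl(hi|Et) = argmax(hi) p(Et|hi) p(hi), carried from one time slice to the next, can be sketched with a plain hidden Markov model filter. This is a minimal illustration and not the A-RHMM of Chakrabarti et al.: it omits the auto-regressive dependence on earlier observations, and the three-symbol observation alphabet and every probability value below are assumptions invented for the example.

```python
# Health states as named in the Figure 4 caption.
STATES = ("safe", "unsafe", "faulty")

# Assumed prior p(S_0); illustrative values only.
PRIOR = {"safe": 0.90, "unsafe": 0.08, "faulty": 0.02}

# Assumed transition model p(S_t | S_{t-1}).
TRANSITION = {
    "safe":   {"safe": 0.95, "unsafe": 0.04, "faulty": 0.01},
    "unsafe": {"safe": 0.10, "unsafe": 0.70, "faulty": 0.20},
    "faulty": {"safe": 0.00, "unsafe": 0.05, "faulty": 0.95},
}

# Assumed observation model p(O_t | S_t) over a made-up, coarse
# discretization of the frequency-domain readings.
EMISSION = {
    "safe":   {"normal": 0.85, "elevated": 0.13, "extreme": 0.02},
    "unsafe": {"normal": 0.30, "elevated": 0.55, "extreme": 0.15},
    "faulty": {"normal": 0.05, "elevated": 0.25, "extreme": 0.70},
}


def normalize(weights):
    """Scale non-negative weights so that they sum to one."""
    total = sum(weights.values())
    return {state: w / total for state, w in weights.items()}


def filter_step(belief, observation):
    """Advance one time slice: predict with the transition model,
    then weight each state by p(O_t | S_t)."""
    predicted = {
        s: sum(belief[prev] * TRANSITION[prev][s] for prev in STATES)
        for s in STATES
    }
    return normalize(
        {s: EMISSION[s][observation] * predicted[s] for s in STATES}
    )


def greatest_likelihood(belief):
    """argmax over hypotheses of the current (normalized) belief."""
    return max(STATES, key=lambda s: belief[s])


def diagnose(observations):
    """Return the greatest likelihood hypothesis after each
    observation, plus the final belief over the hidden states."""
    belief = normalize(
        {s: EMISSION[s][observations[0]] * PRIOR[s] for s in STATES}
    )
    hypotheses = [greatest_likelihood(belief)]
    for obs in observations[1:]:
        belief = filter_step(belief, obs)
        hypotheses.append(greatest_likelihood(belief))
    return hypotheses, belief
```

Calling `diagnose(["normal", "normal", "extreme", "extreme", "extreme"])` returns one hypothesis per time slice. With these made-up numbers the hypothesis starts at safe and moves to faulty once extreme readings persist, which is the kind of shift the seeded-fault data was meant to expose.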
Figure 3. Real-time data from the transmission system of a
helicopter’s rotor. The top component of the figure
presents the original data stream (left) and an enlarged time
slice (right). The lower left figure is the result of the
Fourier transform of the time slice data (transformed) into
the frequency domain. The lower right figure represents
the hidden states of the helicopter rotor system.

Figure 4. The data of Figure 3 is processed using an
auto-regressive hidden Markov model. States Ot represent the
observable values at time t. The St states represent the
hidden “health” states of the rotor system, {safe, unsafe,
faulty} at time t.

          Conclusion: An epistemological stance

   Turing’s test for intelligence was agnostic both as to what a
computer was composed of – vacuum tubes, transistors, or
tinker toys – as well as to the languages used to make it
run. It simply required the responses of the machine to be
roughly equivalent to the responses of humans in the same
situations.
   Modern AI research has proposed probabilistic
representations and algorithms for the real-time integration
of new (a posteriori) information into previously (a priori)
learned patterns of information (Dempster 1968). Among
these algorithms is loopy belief propagation (Pearl 1988,
2000) that captures a system of plausible beliefs constantly
iterating towards equilibrium, or equilibration, as Piaget
might describe it. A cognitive system can be in a priori
equilibrium with its continuing states of learned diagnostic
knowledge. When presented with novel information
characterizing a new diagnostic situation, this a posteriori
data perturbs the equilibrium. The cognitive system then
infers, using prior and posterior components of the model,
until it finds convergence or equilibrium, in the form of a
particular greatest likelihood hypothesis.
   The claim of this paper is that stochastic methods offer a
sufficient account of human intelligence in areas such as
diagnostic reasoning. This includes the computation of a
greatest likelihood measure of hypotheses, given new
information and an expert’s a priori cognitive equilibrium.
Further, we contend that the greatest likelihood calculation
is cognitively plausible and offers an epistemological
framework for understanding the phenomena of human
diagnostic and prognostic reasoning.
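The equilibration described in this section, where new (a posteriori) evidence perturbs a settled set of beliefs and message passing iterates to a new equilibrium, can be sketched with loopy belief propagation on a small cyclic network. This is only an illustration under stated assumptions: the three binary nodes, the agreement-favoring pairwise potential, and all numeric values are hypothetical and are not taken from the paper or from Pearl's texts.

```python
from itertools import permutations

# Hypothetical cyclic network: three binary hypotheses, each
# connected to the other two (a single loop A - B - C - A).
NODES = ("A", "B", "C")
NEIGHBORS = {n: [m for m in NODES if m != n] for n in NODES}

# Assumed pairwise compatibility: neighboring nodes tend to agree.
PAIR = [[2.0, 1.0], [1.0, 2.0]]


def normalize(vec):
    """Scale a list of non-negative weights so that it sums to one."""
    total = sum(vec)
    return [v / total for v in vec]


def loopy_bp(unary, tol=1e-9, max_iters=1000):
    """Sum-product message passing, repeated until no message
    changes by more than tol (or max_iters is reached).

    unary maps each node to its local evidence [w(x=0), w(x=1)].
    Returns the node beliefs and the number of sweeps used.
    """
    messages = {edge: [0.5, 0.5] for edge in permutations(NODES, 2)}
    for iteration in range(1, max_iters + 1):
        updated = {}
        for (i, j) in messages:
            out = []
            for xj in (0, 1):
                total = 0.0
                for xi in (0, 1):
                    # Product of messages into i, excluding the one from j.
                    incoming = 1.0
                    for k in NEIGHBORS[i]:
                        if k != j:
                            incoming *= messages[(k, i)][xi]
                    total += unary[i][xi] * PAIR[xi][xj] * incoming
                out.append(total)
            updated[(i, j)] = normalize(out)
        change = max(
            abs(updated[e][x] - messages[e][x])
            for e in messages for x in (0, 1)
        )
        messages = updated
        if change < tol:
            break  # equilibrium: the messages have stopped moving
    beliefs = {}
    for i in NODES:
        b = list(unary[i])
        for k in NEIGHBORS[i]:
            b = [b[x] * messages[(k, i)][x] for x in (0, 1)]
        beliefs[i] = normalize(b)
    return beliefs, iteration
```

Running `loopy_bp` with flat evidence `[0.5, 0.5]` on every node gives the undisturbed equilibrium. Replacing the evidence on one node, for example `A` with `[0.1, 0.9]`, perturbs it, and the iteration settles on new beliefs in which the neighbors of `A` have also shifted toward state 1. On graphs with cycles this procedure is approximate and is not guaranteed to converge in general; the weak coupling chosen here is an assumption that keeps the example well behaved.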
                         References
Bartlett, F. (1932), Remembering, London: Cambridge University
Press.
Bayes, T. (1763), Essay Towards Solving a Problem in the
Doctrine of Chances, Philosophical Transactions of the Royal
Society of London, London: The Royal Society, pp 370-418.
Chakrabarti, C., Rammohan, R., and Luger, G. F. (2005), ‘A
First-Order Stochastic Modeling Language for Diagnosis’,
Proceedings of the 18th International Florida Artificial
Intelligence Research Society Conference, (FLAIRS-18). Palo
Alto: AAAI Press.
Chakrabarti, C., Pless, D. J., Rammohan, R., and Luger, G. F.
(2007), ‘Diagnosis Using a First-Order Stochastic Language That
Learns’, Expert Systems with Applications. Amsterdam: Elsevier
Press. 32 (3).
Dempster, A.P. (1968), ‘A Generalization of Bayesian Inference’,
Journal of the Royal Statistical Society, 30 (Series B): 1-38.
Glymour, C. (2001). The Mind's Arrows: Bayes Nets and
Graphical Causal Models in Psychology, MIT Press.
Gopnik, A., Glymour, C., Sobel, D.M., Schulz, L.E., Kushnir, T.
and Danks, D. (2004). A theory of causal learning in children:
Causal maps and Bayes nets. Psychological Review, 111(1): 3-32.
Gopnik, A. (2011a). A unified account of abstract structure and
conceptual change: Probabilistic models and early learning
mechanisms. Commentary on Susan Carey "The Origin of
Concepts." Behavioral and Brain Sciences 34,3:126-129.
Gopnik, A. (2011b). Probabilistic models as theories of children's
minds. Behavioral and Brain Sciences 34,4:200-201.
Hume, D. (1739/2000). A Treatise of Human Nature, edited by D.
F. Norton and M. J. Norton, Oxford/New York: Oxford
University Press.
Jurafsky, D. and Martin, J. H. (2009), Speech and Language
Processing, Upper Saddle River NJ: Pearson Education.
Kant, I. (1781/1964), Immanuel Kant’s Critique of Pure Reason,
Smith, N.K., translator, New York: St. Martin’s Press.
Kushnir, T., Gopnik, A., Lucas, C. and Schulz, L. (2010). Inferring
hidden causal structure. Cognitive Science 34:148-160.
Luger, G. F. (2009), Artificial Intelligence: Structures and
Strategies for Complex Problem Solving, 6th edition, Boston:
Addison-Wesley Pearson Education.
Luger, G. F. (2012), Epistemology, Access, and Computational
Models. In The Complex Mind, McFarland, Stenning, and
McGonigle-Chalmers, editors. London: Palgrave Macmillan.
Luger, G. F. and Chakrabarti, C. (2014), From Alan Turing to
Modern AI: An Epistemological Stance (in submission), copies
available from first author.
Luger, G.F., Lewis, J.A., and Stern, C. (2002), ‘Problem Solving
as Model-Refinement: Towards a Constructivist Epistemology’,
Brain, Behavior, and Evolution, Basel: Karger, 59: 87-100.
Pearl, J. (1988), Probabilistic Reasoning in Intelligent Systems:
Networks of Plausible Inference, Los Altos CA: Morgan
Kaufmann.
Pearl, J. (2000), Causality, Cambridge UK: Cambridge
University Press.
Peirce, C.S. (1958), Collected Papers 1931 – 1958, Cambridge
MA: Harvard University Press.
Piaget, J. (1954), The Construction of Reality in the Child, New
York: Basic Books.
Piaget, J. (1970), Structuralism, New York: Basic Books.
Simon, H. A. (1981), The Sciences of the Artificial (2nd ed),
Cambridge MA: MIT Press.
Stern, C.R. and Luger, G. F. (1997), Abduction and Abstraction
in Diagnosis: A Schema Based Account. In Android
Epistemology, Ford et al. eds, Cambridge MA: MIT Press.
Turing, A. (1950), ‘Computing Machinery and Intelligence’,
Mind, 59, 433-460.
von Glasersfeld, E. (1978), ‘An Introduction to Radical
Constructivism’, The Invented Reality, Watzlawick, ed., pp 17-40,
New York: Norton.