=Paper= {{Paper |id=Vol-1419/paper0120 |storemode=property |title=Learning of Time Varying Functions is Based on Association Between Successive Stimuli |pdfUrl=https://ceur-ws.org/Vol-1419/paper0120.pdf |volume=Vol-1419 |dblpUrl=https://dblp.org/rec/conf/eapcogsci/YangL15 }} ==Learning of Time Varying Functions is Based on Association Between Successive Stimuli== https://ceur-ws.org/Vol-1419/paper0120.pdf
 Learning of Time Varying Functions is Based on Association Between Successive
                                   Stimuli
                                            Lee-Xieng Yang (lxyang@nccu.edu.tw)
                           Department of Psychology, Research Center for Mind, Brain and Learning
                              National Chengchi University, No.64, Sec.2, ZhiNan Rd., Taipei City
                                                   11605, Taiwan (R.O.C)

                                            Tzu-Hsi Lee (103752010@nccu.edu.tw)
                                     Department of Psychology, National Chengchi University
                                             No.64, Sec.2, ZhiNan Rd., Taipei City
                                                     11605, Taiwan (R.O.C)


Abstract

In function learning, the to-be-learned function is normally designed as time invariant. However, when the magnitudes of a variable are defined by time points, the function varies over time. Because of this essential difference, the learning of time-varying functions should differ from the learning of other functions. Specifically, the correlation between successive stimuli should play an important role in learning such functions. In this study, three experiments were conducted with the correlations set as positive high, negative high, and positive low. The results show that people perform well when the correlation between successive stimuli is positive high or negative high, and that they have difficulty learning a time-varying function with a low correlation between successive stimuli. A simple two-layered neural network model provides good accounts of the data from all experiments. These results suggest that learning a time-varying function is based on association between successive stimuli.

Keywords: Function Learning; Time Varying Function

Function Learning

We are living in an orderly world, in which variables are mostly correlated with each other. For instance, the probability of rain might be a function of the extent to which the sky is overcast with dark clouds, or the distance to the car in front needed to avoid a car crash is a function of the current car speed. The study of how people learn a function and what people form to represent a learned function is referred to as function learning.

There are two contrasting theoretical accounts in function learning. The rule-based account posits that people construct abstract rules to summarize the ensemble of experienced pairs of stimuli and responses used to teach the function. Most frequently, polynomial rules have been proposed as the representations of the mappings between stimulus magnitudes and response magnitudes (see Carroll, 1963; Koh & Meyer, 1991). In contrast, the associative-based model assumes that people form direct associations between each stimulus and the corresponding response without abstracting any summary information (Busemeyer, Byun, DeLosh, & McDaniel, 1997; DeLosh, Busemeyer, & McDaniel, 1997). However, the rule-based account overestimates the participants’ performance in the extrapolation test, whereas the associative-based model underestimates it. To get a better theoretical account, a hybrid model combining these two approaches has been proposed (McDaniel & Busemeyer, 2005).

Although these models differ in their assumptions about the type of representation formed in function learning, it is basically agreed that the representation is formed for the whole function. However, contrary to this idea, it was found that people might form different representations for different parts of the function, such that a quadratic function was learned as the composition of two simpler monotonic functions, which were chosen for use in different contexts (Lewandowsky, Kalish, & Ngang, 2002). The POLE model (Kalish, Lewandowsky, & Kruschke, 2004) accounts for this finding well, by virtue of its architecture consisting of many modules, each of which represents a linear function corresponding only to a small region of the function, and a gating mechanism which always chooses one of the modules for use according to the stimulus value. Strictly speaking, the real function is not learned but approximated by the composition of many smaller linear functions.

Past studies have tested different functions and shown a number of characteristics of function learning. First, linear functions are easier to learn than nonlinear ones (see Busemeyer et al., 1997; Koh & Meyer, 1991). Second, predicting the response is more accurate for a stimulus whose value falls inside the training range (i.e., interpolation) than for one outside the range (i.e., extrapolation) (see Busemeyer et al., 1997; McDaniel & Busemeyer, 2005). Third, although functions of simpler forms (e.g., linear or power functions) can be learned with variables of non-numeric forms (e.g., line length), Kalish (2013) reported that periodic functions (e.g., the sine function) cannot be learned without numeric stimuli. These characteristics reveal the limitations of human cognition in learning the functional relation between variables.

Time-Varying Function

Although many forms of functions have been tested, a particular form of function, which maps the timing of observation to the event at that timing, seems not to have been tested yet. We call this function a time-varying function in this article, $y = f(t)$. An example of this function would be the height of water accumulated in a bucket from a constant supply source.


If the bucket is cylindrical, the height will be a linear function of time and if the bucket is conical, the height will be a parabolic function of time. To our knowledge, how people learn this kind of function has never been reported in the literature. However, a relevant case in category learning has been reported recently.

Navarro and his colleagues tested how people could learn categories when the category structure varies along training trials. In one of their experiments, the members of two categories moved up on the stimulus dimension constantly as the trial number increased, and the categorization rule was set up as "Respond A, if $x_t > t$ and B otherwise" for any item $x_t$ on trial $t$. Their results showed that participants could not only learn this category structure, but also predict the item value on the next trial (Navarro & Perfors, 2009, 2012; Navarro, Perfors, & Vong, 2013). This implies that people are able to capture some functional relationships between the time point (or trial number) and the stimulus value. However, the learning of time-varying functions might be different from the learning of normal functions.

Comparison Between Time-Varying Function and Normal Function

There are some features of time-varying functions worth noting. First, because time never runs backward, every prediction of the response magnitude made while learning a time-varying function is an extrapolation from what has been learned so far. In the case of learning the function $y = f(x)$, by contrast, both interpolation and extrapolation tests can be conducted.

Second, a time-varying function can be viewed as a function defining the relationship between successive stimuli, $x_t = f(x_{t-1})$. A good example is the game of throwing a Frisbee with friends. In this case, the only observable information is the spatial position of the Frisbee at any time point. Therefore, the best cue for us to estimate the position of the Frisbee at time $t$ is its position at time $t - 1$.

Third, the learnability or complexity of a function would be defined differently for the time-varying function. For the case of $y = f(x)$, the linear function has fewer parameters to estimate than the quadratic function and is hence easier to learn. For the case of $y = f(t)$, learning the functional relationship between time point and response magnitude is equivalent to learning to predict the next response magnitude from the currently observed response magnitude. Thus, it is hypothesized that the time-varying function would be easy to learn if the correlation between successive stimuli is high. If the correlation between successive stimuli is low, it would be hard to learn. To verify this hypothesis, three experiments were conducted.

Experiment 1

In this experiment, we first examined whether people can learn a linear time-varying function. The function was written as $x_t = t + \varepsilon_t$, where $t$ was the trial number from 1 to 100 and $\varepsilon_t$ was randomly sampled from the uniform distribution between −0.5 and 0.5. All stimulus values were normalized between −15 and 15 for the convenience of computer programming. It was reasonable to expect that this function could be learned well, for (1) it was linear and (2) the correlation between successive stimuli was high.

Method

Participants and Apparatus. There were in total 22 participants recruited from National Chengchi University in Taiwan for this experiment. Each participant was reimbursed NTD$ 60 (≈ US$ 2) for their time and traffic expense. The whole experiment was conducted on an IBM-compatible PC in a quiet booth. Stimulus display and response recording were controlled by a computer script written with PsychoPy (Peirce, 2007).

[Figure 1 plot: two panels, Session 1 and Session 2; x-axis: trial; y-axis: position]

Figure 1: The stimulus structure in Experiment 1 (i.e., crosses) and the participants’ predictions (i.e., circles) in Session 1 averaged across all participants.

Procedure. The participants were instructed that they were playing a shooting game. In this game, they had to guess the position of a target on a horizontal line on the computer screen. On each trial, they moved the mouse cursor to where they thought the target would appear. After they pressed the space key to complete the guess, the target would appear as an arrow at the correct position, together with a feedback text of "Hit" or "Miss" on the screen. The participants were told that "Hit" meant that the guess was close enough to the true answer and that otherwise they would get "Miss". The whole experiment was conducted in two sessions, each of which consisted of 100 trials. The same 100 stimuli were presented in the two sessions. The distance between the target’s correct position and the participant’s guess was defined as the error. The amount of squared error and the proportion of received "Hit" (i.e., accuracy) were the dependent variables in this experiment.
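As a hedged illustration of the Experiment 1 stimulus sequence, $x_t = t + \varepsilon_t$ with $\varepsilon_t$ drawn uniformly from [−0.5, 0.5] and the values rescaled to [−15, 15], the Python sketch below generates one such sequence. The min-max rescaling, the function name `experiment1_stimuli`, and the fixed seed are assumptions made for illustration; the paper does not state how the normalization was carried out.

```python
import random


def experiment1_stimuli(n_trials=100, lo=-15.0, hi=15.0, seed=0):
    """Generate x_t = t + eps_t and rescale the sequence to [lo, hi].

    Assumption: min-max rescaling is used for the normalization step.
    """
    rng = random.Random(seed)
    # Raw linear trend plus uniform noise in [-0.5, 0.5] on every trial.
    raw = [t + rng.uniform(-0.5, 0.5) for t in range(1, n_trials + 1)]
    raw_min, raw_max = min(raw), max(raw)
    # Map the smallest raw value to lo and the largest to hi.
    return [lo + (hi - lo) * (x - raw_min) / (raw_max - raw_min) for x in raw]


stimuli = experiment1_stimuli()
print(stimuli[:3], stimuli[-3:])
```

Because the noise is small relative to the step of 1 between trials, the generated positions rise almost monotonically, which is why successive stimuli are highly correlated in this design.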


Results

Visual inspection of Figure 1 shows that participants performed quite well except on the very early trials¹. To simplify the data analysis, we divided the 100 stimuli into 10 blocks. The squared prediction error decreases from 40.29 to 0.03 (mean = 4.06) through the 10 blocks across two sessions. A Block (10) × Session (2) within-subjects ANOVA reveals a significant main effect of Block on the squared error [F(9, 189) = 72.83, MSe = 98, p < .01], no significant main effect of Session [F(1, 21) = 2.367, MSe = 166.30, p = .139], and a significant interaction effect between Block and Session [F(9, 189) = 2.346, MSe = 166.3, p < .05].

The participant’s accuracy is another dependent variable, which is computed as the number of "Hit" divided by the number of trials. Because the "Hit" range was very small in our experiments, the highest accuracy in a block was .63 and the lowest was .36 across all sessions. A Block (10) × Session (2) within-subjects ANOVA shows a significant main effect of Block on the accuracy [F(9, 189) = 8.281, MSe = 0.028, p < .01], no significant main effect of Session [F(1, 21) < 1], and a significant interaction effect between Block and Session [F(9, 189) = 5.052, MSe = 0.027, p < .01].

We also checked the correlation between each participant’s predictions and the true answers. The averaged Pearson’s r across all participants is quite high [r = .97]. Together with the visual inspection of Figure 1, this confirms that people can learn the linear time-varying function very well.

Experiment 2

In this experiment, the function was set up as $x_t = 50 + (-1)^t \sqrt{100 - t}$, which made the target jump left and right, gradually moving toward the central point. Obviously, this function was far more complex than the one used in Experiment 1 and it was nonlinear. If the learning of $y = f(t)$ shared the same characteristics as the learning of $y = f(x)$, it should be expected that this function could not be learned well. However, if our discussion about the characteristics of the time-varying function was right, it should be expected that this function could be learned well, due to the high correlation between successive stimuli [r = −.99].

Method

Participants and Apparatus. There were in total 21 participants recruited from National Chengchi University in Taiwan for this experiment. Each participant was reimbursed NTD$ 60 (≈ US$ 2) for their time and traffic expense. The testing materials and procedure were all the same as those in Experiment 1.

[Figure 2 plot: two panels, Session 1 and Session 2; x-axis: trial; y-axis: position]

Figure 2: The stimulus structure in Experiment 2 (i.e., crosses) and the participants’ predictions (i.e., circles) in Session 1 averaged across all participants.

Results

See the circles and crosses in Figure 2. Apparently, the participants could capture the moving pattern of the target, although they made some larger errors on the early trials. Similar to what we found in Experiment 1, the squared prediction error drops along blocks from 73.79 to 1.57 (mean = 15.35) across two sessions. A Block (10) × Session (2) within-subjects ANOVA reveals a significant main effect of Block [F(9, 180) = 14.24, MSe = 1303, p < .01], a significant main effect of Session [F(1, 20) = 17.22, MSe = 196, p < .01], and a significant interaction effect between Block and Session [F(9, 180) = 16.12, MSe = 177.8, p < .01]. Although the error curve goes down toward 0, the mean squared prediction error is 15.53, far larger than that in Experiment 1, which is 4.06. This suggests that the linear function is easier to learn than the quadratic function.

The accuracy data also suggest that this function is harder to learn than the linear function, with the highest mean accuracy in a block across all participants and sessions being .34 and the lowest being .14. A Block (10) × Session (2) within-subjects ANOVA reveals a significant main effect of Block [F(9, 180) = 9.747, MSe = 0.018, p < .01], no significant main effect of Session [F(1, 20) < 1], and no significant interaction effect between Block and Session [F(9, 180) < 1].

Although the accuracy is quite low, this does not mean that people cannot learn this function. As shown in Figure 2, the participants’ predictions are close to the true answers. Also, the correlation between each participant’s predictions and the true answers is considerably high [mean r = .92]. As expected, the participants can learn this complex time-varying function.
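The strongly negative correlation between successive stimuli in Experiment 2 can be checked numerically. The sketch below generates $x_t = 50 + (-1)^t \sqrt{100 - t}$ for t = 1 to 100 and computes the Pearson correlation between $x_{t-1}$ and $x_t$. The helper names are hypothetical, and the rescaling to the screen range is omitted on the assumption that it was linear, in which case it does not change the correlation.

```python
import math


def experiment2_stimuli():
    """x_t = 50 + (-1)^t * sqrt(100 - t) for trials t = 1..100."""
    return [50 + (-1) ** t * math.sqrt(100 - t) for t in range(1, 101)]


def lag1_correlation(xs):
    """Pearson correlation between each stimulus and the one before it."""
    prev, curr = xs[:-1], xs[1:]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    cov = sum((p - mean_prev) * (c - mean_curr) for p, c in zip(prev, curr))
    var_prev = sum((p - mean_prev) ** 2 for p in prev)
    var_curr = sum((c - mean_curr) ** 2 for c in curr)
    return cov / math.sqrt(var_prev * var_curr)


stimuli = experiment2_stimuli()
r = lag1_correlation(stimuli)
print(f"lag-1 correlation: {r:.3f}")
```

The sign alternation makes each position nearly the mirror image of the previous one around 50, so the lag-1 correlation comes out close to −1 even though the function of trial number is nonlinear.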
¹ To make the figure easier to read, we plot the human predictions by circles and the correct answers by crosses on only the even-numbered trials in the first session. The result pattern is the same in the second session.

Experiment 3

In this experiment, we examined whether people could predict the stimulus magnitudes when the correlation between successive stimuli was lower. See Figure 3 for an example, which was the real case for testing one participant². The dashed line shows the true moving pattern of the stimulus, which was generated by $y = g[a] + z[b + 1]$, where $a = \lfloor (t + 4)/5 \rfloor$, $b = t \bmod 5$, g was a random permutation of the vector [1, 6, 11, ..., 96], and for each g, z was a new random permutation of the vector [1, 2, 3, 4, 5]. The correlation between successive stimuli, averaged across all participants and all sessions, was r = .80, which was lower than the correlations in the previous experiments. From either point of view (i.e., the number of parameters to estimate or the correlation between successive stimuli), it was expected that this function could not be learned well.

² Different participants received different moving patterns to learn.

[Figure 3 plot: single panel, Session 1; x-axis: trial; y-axis: position]

Figure 3: The stimulus structure in Experiment 3 (i.e., crosses) and predictions of participant #14 (i.e., circles).

Method

Participants and Apparatus. There were in total 18 participants recruited for this experiment from National Chengchi University in Taiwan. Each participant was reimbursed NTD$ 60 (≈ US$ 2) for their time and traffic expense. The testing materials and procedure were all the same as those in Experiment 1.

Results

As shown in Figure 3, the participant apparently could not predict the target position. Otherwise, we would see the dashed line (for answers) and the solid line (for the participant’s predictions) superimposed on each other. However, the response pattern is not random either. In fact, the participant’s predictions seem always to be one step behind the true answers. Although we do not show the predictions of the other 17 participants, their predictions are also one step behind the true answers. Thus, strictly speaking, we do not think that the participants learned this function.

The squared prediction error drops from 69.69 to 42.47 along blocks in Session 1 and shows no clear change, from 23.12 to 24.30, in Session 2. Although the performance gets better in Session 2, the prediction error never comes close to 0. The mean squared error for all participants across blocks and sessions is 30.844, which is larger than 15.53 (mean error in Experiment 2) and 4.06 (mean error in Experiment 1). Thus, the learning performance in this experiment is the worst among the three experiments in this study.

As done for the previous experiments, a Block (10) × Session (2) within-subjects ANOVA was conducted for the prediction error. The results show no significant main effect of Block [F(9, 153) = 1.53, MSe = 998.4, p = .142], a significant main effect of Session [F(1, 17) = 14.94, MSe = 424, p < .01], and a significant interaction effect between Block and Session [F(9, 153) = 3.206, MSe = 701.6, p < .01].

The mean accuracy in a block across all sessions is even lower than that in the other two experiments. The highest mean accuracy is about .11 and the lowest is .06. It is clear that the participants cannot capture the moving pattern of the stimulus. A Block (10) × Session (2) within-subjects ANOVA shows no main effect of Block on accuracy [F(9, 153) = 1.179, MSe = 0.006, p = .312], no main effect of Session [F(1, 17) = 3.367, MSe = 0.006, p = .08], and no interaction effect between Block and Session [F(9, 153) < 1].

We also computed Pearson’s r between each participant’s predictions and the true answers. Although the mean correlation is not low (r = .76), this finding might result from the fact that the participants’ prediction is always one step behind the true answer. To sum up, the linear function is the easiest to learn and the quadratic function is the second. Basically, participants cannot learn the complex function in Experiment 3. In order to get a better understanding of the underlying mechanism for learning time-varying functions, we developed a neural network model for the learning of time-varying functions.

Model for Learning Time Varying Function

A time-varying function can be rewritten as $x_t = f(x_{t-1})$ and the simplest form of it would be $x_t = \beta_0 + \beta_1 x_{t-1}$. Thus, learning a time-varying function is equivalent to estimating the optimal parameter values, with which the model makes the smallest error. To this end, a simple two-layered neural network is proposed. There are two input nodes, which respectively correspond to the position of the stimulus on the preceding trial, $x_{t-1}$, and the standard moving distance, which is set as 1. There is only one output node, corresponding to the predicted position on the current trial, $\hat{x}_t = w_1 \times 1 + w_2 x_{t-1}$. The associative weight $w_1$ represents the size of the moving distance. The weight $w_2$ represents how strongly the last position is correlated with the current position. When the true answer $x_t$ is provided, the error is then computed as $x_t - \hat{x}_t$.

³ This algorithm is a special case of the backpropagation algorithm, which is specifically used for two-layered neural network models.

The associative weights are updated with the WH algorithm³


(Abdi, Valentin, & Edelman, 1999) to decrease the error                  by moving it a certain distance (i.e., 0.30 times of the stan-
made by the model. Also, we make the updating amount for                 dard moving size) from the place a bit behind (i.e., 70%) the
weights decay all the way through training trials. Thus, the             position just seen in the same direction of the last move.
updated amount for w1 on trial t is $\Delta w_{1,t} = \eta\, e^{-\xi(t-1)} (x_t - \hat{x}_t)$,               For Experiment 2, the mean learning rate is high and so is
where η ≥ 0 is the learning rate and ξ ≥ 0 determines             the mean decay rate. This suggests that the model adjusts the
how quickly the updated amount of weight drops. Likewise,                associative weights largely on the early learning trials, but
$\Delta w_{2,t} = \eta\, e^{-\xi(t-1)} (x_t - \hat{x}_t)\, x_{t-1}$.                                    quickly halts doing so. The learned associative weights are
     There are some features of this model worth noting. First,          w1 = 1.00 and w2 = −0.94. The negative weighting for the
the associative weight w2 actually reflects the correlation be-          preceding position enables the model to make symmetrical
tween successive stimuli. Second, this model only learns the             predictions between successive trials and |w2 | ≤ 1 enables the
correlation between successive stimuli and contains no sum-              model to gradually converge the predicted position toward the
mary information of the whole function. In fact, it can be               midpoint.
applied to account for the learning of different time-varying               For Experiment 3, the mean estimated learning rate is low
functions, as no matter which form (complex or simple) the               and the decay rate is high, suggesting that the model has
function has, the learning of a time-varying function can al-            not updated the associative weights too much since early tri-
ways be viewed as the learning of the association between                als. In fact, the learned associative weights, w1 = 0.01 and
successive stimuli. Thus, our model should be regarded as an             w2 = 0.98, together suggest that the model merely repeats the
associative-based model, not a rule-based model.                         preceding target position as the current prediction. As the
                                                                         model captures the participants’ response patterns very well,
                           Modeling                                      it is implied that the participants did not actually learn the
The model was fit to each participant’s data in each experi-             function but just repeated what they saw as the prediction for
ment with the stimulus positions being normalized between 0              the next trial.
and 1. Each participant’s first response in each session was by             It is revealed in Experiment 2 that the larger η or ξ is, the
default the first input for the model. The initial weights of w1         smaller the error is (r = −.51, p < .05 for η and r = −.57, p <
and w2 were set as 0 for all experiments except Experiment 3.            .01 for ξ) but no significant correlations between parameters
The model provided the best fit for Experiment 3 data when               and human performance in other experiments. This might be
w2 was initially set as 1, suggesting that participants in Ex-           because that Experiment 1 and Experiment 3 are either too
periment 3 were more likely to repeat the observed position              easy or too hard for the participants to learn.
of stimulus on the preceding trail as the response for current
trail. The statistics of optimally estimated parameter values                                                      Exp 1
and the goodness of fit (RMSD) for all experiments are listed
in Table 1.                                                                             10
                                                                            Position




Table 1: Mean goodness of fit and mean estimated parame-                                 0                                                        Human

ter values for a best fit with the standard deviation listed in                                                                                   Model

parenthesis.
                                                                                       −10
                   RMSD               η              ξ
      Exp 1      0.04 (0.02)     1.06 (0.71)    0.02 (0.09)
                                                                                             2 6 1014182226303438424650545862667074788286909498
      Exp 2      0.08 (0.03)     1.73 (1.14)    0.30 (0.55)                                                         Trial

      Exp 3      0.09 (0.03)     0.43 (0.55)    1.81 (4.14)
                                                                         Figure 4: The model prediction and averaged human response
   The smaller the RMSD, the better the fit is. Apparently, the          in Session 1 in Experiment 1.
model fit all the data very well. See the crosses in Figure 4,
Figure 5, and Figure 6 for the model prediction in Session 14 ,
which are quite close to the circles denoting the participants’
                                                                                                          General Discussion
responses.                                                               The main purpose of this study is to examine the characteris-
   The estimated learning rate for Experiment 1 is about 1 and           tics of function learning with time-varying functions. Three
the decay rate is quite small, suggesting that decay of learning         experiments were conducted with different time-varying
is not fast and leaning continues through training trials. The           functions: linear, quadratic, and irregular. The differences
learned associative weights for the moving size w1 = 0.30 and            between these functions are not only the complexity of the
the correlation with the preceding stimulus w2 = 0.70 suggest            function form, but also the strength of correlation between
that the participants predict the current position of the target         successive stimuli. In the first two experiments, the correla-
                                                                         tion is very high regardless of the direction, whereas in the
   4 The pattern is almost the same for Session 2.                       third experiment, the correlation is lower.


                                                                   726
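As a concrete reading of the associative model fit in the Modeling section, the following minimal Python sketch implements the prediction rule and the decaying weight updates as they are written there. It is an illustration under stated assumptions, not the authors' code: the names `predict` and `run_model` are hypothetical, the model is assumed to be fed the true (normalized) positions, and the second position in the sequence is assumed to be trial t = 2 for the purpose of the decay term.

```python
import math


def predict(w1, w2, x_prev):
    """Predicted position on the current trial: x_hat_t = w1 * 1 + w2 * x_{t-1}."""
    return w1 * 1.0 + w2 * x_prev


def run_model(positions, eta, xi, w1=0.0, w2=0.0):
    """Run the two-weight associative model over a sequence of positions.

    positions: true target positions, assumed already normalized to [0, 1].
    eta:       learning rate (>= 0).
    xi:        decay rate of the update size (>= 0).
    w1, w2:    initial associative weights.

    Returns (predictions, w1, w2), where predictions[i] is the model's
    prediction for positions[i + 1].
    """
    predictions = []
    for i in range(1, len(positions)):
        x_prev, x_true = positions[i - 1], positions[i]
        x_hat = predict(w1, w2, x_prev)
        predictions.append(x_hat)

        error = x_true - x_hat
        # Assumption: positions[i] is trial t = i + 1 (1-based),
        # so exp(-xi * (t - 1)) equals exp(-xi * i).
        step = eta * math.exp(-xi * i) * error
        w1 += step            # delta w1 = eta * exp(-xi (t-1)) * error
        w2 += step * x_prev   # delta w2 = eta * exp(-xi (t-1)) * error * x_{t-1}
    return predictions, w1, w2


if __name__ == "__main__":
    # Mean learned weights reported for Experiment 2: successive predictions
    # alternate around the midpoint and move closer to it.
    x = 0.2
    for _ in range(4):
        x = predict(1.00, -0.94, x)
        print(round(x, 3))

    # Mean learned weights reported for Experiment 3: the prediction is
    # close to a repeat of the preceding position.
    print(round(predict(0.01, 0.98, 0.5), 3))
```

With the Experiment 3 weights, the prediction stays near the previous position, which matches the interpretation that the model merely repeats what was just seen. With the Experiment 2 weights, feeding each prediction back in makes the output flip from one side of the midpoint to the other while shrinking toward it, because the weight on the preceding position is negative and smaller than 1 in absolute value.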
   The behavioral data show that the linear and quadratic functions are easier to learn than the irregular function, suggesting that the correlation between successive stimuli, not the number of parameters (or the complexity) of the function, is critical to function learning with time-varying functions. The success of our model supports the associative-based account and implies that a time-varying function can be learned as a composition of many partial representations, not as a holistic representation.

   One may regard the learning of time-varying functions as operant conditioning. That may or may not be true, depending on what we think is actually conditioned. If the response is the target of conditioning, then the learning of time-varying functions is not operant conditioning, as every single response is new and it is impossible to reinforce the likelihood that the same response will be made in the future. However, if the moving size is the target of conditioning, then for the case in which the target moves at a constant speed (e.g., the linear function in Experiment 1), we may regard the learning of the time-varying function as a kind of operant conditioning. For the case in which the target moves at a decreasing (or increasing) speed (e.g., the quadratic function in Experiment 2), however, it might not be suitable to equate the learning of time-varying functions with operant conditioning. Future studies including transfer trials are needed to examine whether people form any concept of the time-varying function.

[Plot "Exp 2": Position (y-axis) by Trial (x-axis); series: Human, Model]
Figure 5: The model prediction and averaged human response in Session 1 in Experiment 2.

[Plot "Exp 3": Position (y-axis) by Trial (x-axis); series: Human, Model]
Figure 6: The model prediction and human response of participant #14 in Session 1 in Experiment 3.

                           References

Abdi, H., Valentin, D., & Edelman, B. (1999). Neural networks. SAGE Publications, Inc.
Busemeyer, J. R., Byun, E., DeLosh, E., & McDaniel, M. A. (1997). Learning functional relations based on experience with input-output pairs by humans and artificial neural networks (K. Lamberts & D. R. Shanks, Eds.). Cambridge, MA, US: The MIT Press.
Carroll, J. D. (1963). Function learning: The learning of continuous functional maps relating stimulus and response continua. Princeton, NJ: Educational Testing Service.
DeLosh, E. L., Busemeyer, J. R., & McDaniel, M. A. (1997). Extrapolation: The sine qua non for abstraction in function learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 23, 968-986.
Kalish, M. (2013). Learning and extrapolating a periodic function. Memory & Cognition, 41, 886-896.
Kalish, M., Lewandowsky, S., & Kruschke, J. K. (2004). Population of linear experts: Knowledge partitioning and function learning. Psychological Review, 111, 1072-1099.
Koh, K., & Meyer, D. E. (1991). Function learning: Induction of continuous stimulus-response relations. Journal of Experimental Psychology: Learning, Memory, and Cognition, 17, 811-836.
Lewandowsky, S., Kalish, M., & Ngang, S. K. (2002). Simplified learning in complex situations: Knowledge partitioning in function learning. Journal of Experimental Psychology: General, 131, 163-193.
McDaniel, M. A., & Busemeyer, J. R. (2005). The conceptual basis of function learning and extrapolation: Comparison of rule-based and associative-based models. Psychonomic Bulletin & Review, 12, 24-42.
Navarro, D. J., & Perfors, A. (2009). Learning time-varying categories. In Proceedings of the 31st Annual Conference of the Cognitive Science Society (pp. 414-424). Austin, TX: Cognitive Science Society.
Navarro, D. J., & Perfors, A. (2012). Anticipating changes: Adaptation and extrapolation in category learning. In N. Miyake, D. Peebles, & R. P. Cooper (Eds.), Building bridges across cognitive sciences around the world: Proceedings of the 34th Annual Conference of the Cognitive Science Society (pp. 809-814). Austin, TX: Cognitive Science Society.
Navarro, D. J., Perfors, A., & Vong, W. K. (2013). Learning time-varying categories. Memory & Cognition, 41, 917-927.
Peirce, J. W. (2007). PsychoPy - Psychophysics software in Python. Journal of Neuroscience Methods, 162, 8-13.