=Paper=
{{Paper
|id=Vol-3051/PA_4
|storemode=property
|title=Understanding Students’ Problem-Solving Processes via Action Sequence Analyses (Short Paper)
|pdfUrl=https://ceur-ws.org/Vol-3051/PA_4.pdf
|volume=Vol-3051
|authors=Ruhan Circi,Manqian Liao,Chad Scott,Juanita Hicks
|dblpUrl=https://dblp.org/rec/conf/edm/CirciLSH21
}}
==Understanding Students’ Problem-Solving Processes via Action Sequence Analyses (Short Paper)==
<pdf width="1500px">https://ceur-ws.org/Vol-3051/PA_4.pdf</pdf>
<pre>
  Understanding Students’ Problem-Solving Processes via
               Action Sequence Analyses
              Ruhan Circi                      Manqian Liao            Chad Scott                         Juanita Hicks
 American Institutes for Research                  Duolingo               Deloitte              American Institutes for Research

Research in this paper was developed and conducted during the 2019 NAEP doctoral internship program administered by AIR and funded
by NCES under Contract No. ED-IES-12-D-0002/0004. The views, thoughts, and opinions expressed in the paper belong solely to the
authors and do not reflect NCES position or endorsement.


Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License
Attribution 4.0 International (CC BY 4.0)

ABSTRACT
The transition of the National Assessment of Educational Progress (NAEP) to digitally based
assessments (DBAs) allowed for the collection of data that can provide insights into students’
problem-solving processes. When students interact with a NAEP DBA item, their recorded
timestamped events in the process data form sequences. We refer to action sequences as the series
of clicks or other actions a student makes within an item. Using data from one released block of
the NAEP 2017 mathematics assessment for grade 4, this study aims to provide an understanding of
the relationships among action sequence characteristics, item characteristics and student
performance.

First, we extract individual action sequences across items. Second, we categorize each individual
action into one of four activities: Browsing, Passive investigation, Active investigation, or
Decision. This categorization enables us to investigate sequence patterns within and across
different items. Sequence characteristics are summarized from two perspectives: a) the time spent
on each activity is calculated for each student across items and b) the within-sequence entropy
(Shannon, 1948) and turbulence (Elzinga, 2006) of the sequences are calculated to quantify
students’ action mobility.

We found that the time students spend on “Decision” and “Passive investigation” activities can be
used to predict student performance.

Keywords
NAEP, Process data, Digitally based assessments, sequence mining, action sequences.

1. BACKGROUND
In 2017 the National Assessment of Educational Progress (NAEP) transitioned from paper-based
assessments (PBAs) to digitally based assessments (DBAs). DBAs allow us to capture student
interactions with the test screen that are recorded as timestamped events. These records form data
known as process data.

It has become commonplace to include response time (RT) in addition to responses in the
psychometric models to account for speed and accuracy (e.g., Goldhammer, 2015), and to examine the
relationship between response time and item- and person-level factors (e.g., Masters, Schnipke, &
Connor, 2005). Response time is used to examine the psychometric quality of items and students’
test-taking behaviors, and it is considered promising for various assessment elements. Yet the
process data contain richer information, such as the actions that students use during their
problem-solving processes, and the allocation of the time students spend on particular activities
within a single response time has remained unexplored.

When students interact with a NAEP DBA item, their recorded timestamped events in the process data
form sequences. These sequences contain information about the order, mobility, and duration of the
tasks students take throughout the problem-solving process and may shed light on the processes
underlying the students’ test-taking behaviors. In this study, we divide response times into
subcategories using the action definitions to provide a more meaningful understanding of student
test-taking behavior and examine the differences across item types and student performance.

1.1 Literature
Process data is most commonly used to calculate response time (RT), defined as the time an
examinee takes to complete an item or assessment. Due to the association of RT with psychological
and cognitive processes (e.g., Huff & Sireci, 2001), RT is often used to make decisions such as
setting assessment time limits (e.g., van der Linden, 2011) and capturing aberrant test-taking
behaviors (e.g., Marianti, Fox, Avetisyan, Veldkamp, & Tijmstra, 2014).

However, RT alone may not provide sufficient information to draw inferences about the processes
underlying students’ test-taking behaviors (Lee & Haberman, 2016). In fact, RT could consist of
the time for various components in the problem-solving process such as preparation (e.g., forming
a response plan) and writing down/typing the response. The decomposition of RT can differ
depending on item types (e.g., Li, Banerjee, & Zumbo, 2017). Thus, to ensure the validity of
inferences drawn from RTs, it is necessary to understand what students actually do throughout the
RT.

2. CURRENT STUDY
In the assessment setting, RT could consist of the times for various components in the
problem-solving process such as preparation (e.g., forming a response plan) and writing
down/typing the response. The decomposition of RT can be different for different item types
(e.g., Li, Banerjee, & Zumbo,


2017). Since the NAEP mathematics assessment consists of items with a mix of item types (e.g.,
multiple choice, constructed response), using the decomposition of RT for different tasks (e.g.,
investigation, decision) rather than total RT could be helpful when different decisions (e.g.,
setting assessment time limits, capturing aberrant test-taking behaviors) are to be made based on
the time.

A more fine-grained understanding of the relationships among RTs and students’ problem-solving
behaviors can be gained by analyzing students’ action sequences, which can further improve the
usefulness of RT in psychometric research (e.g., determining non-response categories such as omit
and not reach).

The goals of this study are: a) identify and describe the action sequences of students in a
meaningful way, b) examine mobility across actions, c) differentiate profiles of action sequences,
and d) explore students’ performance in connection to sequence clusters. The steps taken for the
current project are as follows: First, individual actions are extracted. Second, students’
response processes are represented as sequences consisting of four tasks, i.e., Browsing, Passive
investigation, Active investigation, and Decision (See definitions in Table 1). Since the
variation across time for individual actions can be very large, we decided to use a set cut off
point (2 seconds) for defining each action. In the end, we recoded the sequence of student actions
in these groups for further analyses (See Figure 1 for an example). Then, the characteristics of
the sequences are summarized from two perspectives: a) the time spent on each task is calculated
for each student, which allows the decomposition of the RT, and b) the within-sequence entropy
(Shannon, 1948) and turbulence (Elzinga, 2006) of the sequences are calculated to quantify
students’ action mobility.

Table 1. Definition and Example Action of Each Behavior Category

Behavior category       Definition                                     Example Action
Browsing                Examinees browse the content of an item by     Horizontal scrolling,
                        executing scroll on the screen                 vertical scrolling
Passive investigation   Examinees get support from assistive tool      Change theme (change the
                        for their problem-solving process without      color of background)
                        interacting with the item
Active investigation    Examinees interact with item as a part of      Draw with scratchwork,
                        their problem-solving process                  highlight
Decision                Examinees make responses to an item            Click choice, text enter

In addition to summarizing sequence characteristics in a descriptive manner, this study examines
the relationships among the sequence characteristics, item characteristics and students’ item
responses. Specifically, to examine the relationship between sequence characteristics and item
characteristics, the RT decomposition and students’ action mobility are compared across different
items. Furthermore, representative sequence(s) are identified for each item with the use of a
sequence dissimilarity measure and a clustering algorithm. The representative sequence(s) can
inform the typical response process of an item. Finally, to examine the relationship between
sequence characteristics and student performance, sequence characteristics, such as the time
duration of each task, within-sequence entropy and turbulence, are used as features to predict
students’ item scores. The results could inform which feature(s) of the sequences best contribute
to correct/incorrect item responses or the presence/absence of the responses. Moreover, the score
distributions are compared across sequence clusters.

In sum, this study, by decomposing RT and examining the relationships among the sequence
characteristics, item characteristics and student performance, aims to inform more meaningful ways
of calculating RT (e.g., different ways of RT calculation for different items) and the validity of
score categories such as “omit” and “not reach”. For instance, if the sequences of students who
were scored as “not reach” were found to contain some actions that are related to making responses
(i.e., the “Decision” actions), the scores of these students may be considered as “omit” as
opposed to “not reach”.

2.1 Research Questions
Specifically, the following research questions are examined in the current study:

RQ1. What actions do students take and what are the characteristics of the action sequences
(mobility, time distribution) throughout the RTs of the NAEP math items?

RQ2. How do students’ action sequences differ across different item types (e.g., multiple-choice
item, constructed-response item)?

RQ3. Which action sequence characteristic(s) best predict the item scores?

3. DATA
We used data from one of the released blocks from the NAEP 2017 Grade 4 Mathematics assessment.
This block includes 29,100 4th graders in both public and private schools and consists of 14
cognitive items. The sample was collected using the conventional NAEP sampling procedures, i.e., a
two-stage stratified random sampling design with schools selected in the first stage and students
in the second stage. In the data cleaning procedure, students with accommodation or interruptions
were excluded. Comparisons of the demographic composition of the two samples, full sample and
analytical sample, are presented in Table 2.

Table 2. Summary Statistics for Full and Analytical Sample: Student Demographic Characteristics

                                        Weighted                 Unweighted
                                  Analytical      Full      Analytical      Full
Observations                         649,500   780,500          24,100    29,100
Gender (percentages)
  Female                                  50        49              50        49
Race/Ethnicity (percentages)
  White                                   51        49              52        50
  Black                                   14        15              17        18
  Hispanic                                24        26              20        22
  Asian                                    6         5               4         4
  American Indian                          1         1               2         2
  Other                                    4         4               5         5
National School Lunch Program* (percentages)
  Eligible                                48        50              51        54
  Not Eligible                            46        44              45        43

* No Information categories are not presented.
Note: Because all extended time accommodation students (that are excluded from analyses) are
either with limited English

proficiency or in individualized education program, the results for these variables are not
included. Detail may not sum to totals because of rounding.

Small, non-significant differences in the proportions of White (50.5% in analytical and 48.9% in
full sample) and Hispanic students (24.4% in analytical and 25.9% in full sample) are observed. A
significant difference in terms of the NSLP non-eligible category is found (46% vs. 43.8%).

4. ANALYSIS
To construct sequences and decompose RT from the process data, we followed two steps (See Figure 1
for a demonstration of the procedure): a) Recoding the actions into four task categories (i.e.,
Browsing, Passive investigation, Active investigation, Decision; See definitions in Table 1); and
b) Calculating the time duration of each task. Thus, students’ item response processes were
represented as sequences whose lengths are proportional to the time durations. Since the variation
of time students spend on an item can be large (i.e., range from 0.01 second to 30 minutes), using
a small time unit (e.g., 0.01 second) could result in extremely long sequences that exceed the
computer memory capacity. Therefore, we decided to use 2 seconds as the time unit while
constructing the sequences. Only actions in students’ initial item visit (i.e., actions between
the first pair of “Enter Item” and “Exit Item” actions) were included in the sequence. Students
whose initial item visit lasts longer than 8 minutes (480 seconds) were excluded from the analyses
to avoid extremely long sequences. For all the items in the MA block, the percentages of students
with initial item visit longer than 8 minutes are lower than 1%.

Figure 1. The procedure of turning raw process data into an action sequence.

The mean time spent on each action as well as the action mobility were summarized as the sequence
characteristics. The number of task transitions, Shannon entropy (Shannon, 1948) and turbulence
(Elzinga, 2006) measures were used to quantify the action mobility.

To examine how students’ action sequences differ across different item types (e.g.,
multiple-choice item, constructed-response item), the characteristics of sequences were summarized
and compared across different items. To identify the typical response process for an item, the
hierarchical agglomerative clustering algorithm was applied to all the students’ sequences based
on the optimal matching edit distance (Levenshtein, 1966) matrix. The medoids of the clusters
(i.e., the sequence that is the nearest to the virtual center of the cluster) were treated as the
representative sequences that represent the typical response processes for an item. Ward’s
algorithm was used to form clusters by maximizing within-cluster homogeneity. Because, to our
knowledge, no study has determined the optimal number of clusters when the clustering is based on
the edit distance matrix, we chose the number of clusters by visually inspecting the dendrogram
and assessing the interpretability of the clusters. Specifically, for each item, we examined the
cluster medoids when the number of clusters ranged from 2 to 4 and chose the number of clusters
that resulted in interpretable clusters from practical perspectives. All sequence analyses were
performed using the TraMineR R package.

To examine the relationship between the sequence characteristics and student performance, the
sequence characteristics were used as features to predict the item scores using the regression
tree (Breiman, 2017). In addition, the score distributions were compared across the sequence
clusters identified based on the hierarchical clustering algorithm and edit distance.
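
The sequence construction and two of the mobility measures described in this section can be
illustrated with a minimal Python sketch. It is not the authors' TraMineR code. How a state is
assigned to each 2-second unit is not specified in the text, so the rule used here (each unit
takes the state active at its start) is an assumption, as are the function names, the made-up
event data, and the scaling of the entropy by the logarithm of the number of states.

```python
import math
from collections import Counter

STATES = ("Browsing", "Passive investigation", "Active investigation", "Decision")
UNIT_SECONDS = 2.0  # time unit reported in the text


def build_sequence(events, exit_time, unit=UNIT_SECONDS):
    """Turn (timestamp_seconds, state) events into one state per time unit.

    Assumption: a state lasts until the next event, and each unit takes the
    state that is active at the start of that unit.
    """
    if not events:
        return []
    start = events[0][0]
    n_units = max(1, math.ceil((exit_time - start) / unit))
    sequence, idx = [], 0
    for k in range(n_units):
        t = start + k * unit
        # Advance to the latest event that has started by time t.
        while idx + 1 < len(events) and events[idx + 1][0] <= t:
            idx += 1
        sequence.append(events[idx][1])
    return sequence


def time_per_state(sequence, unit=UNIT_SECONDS):
    """Decompose the response time into seconds spent in each state."""
    counts = Counter(sequence)
    return {state: counts[state] * unit for state in STATES}


def n_transitions(sequence):
    """Count switches between different consecutive states."""
    return sum(1 for a, b in zip(sequence, sequence[1:]) if a != b)


def within_sequence_entropy(sequence):
    """Shannon entropy of the state proportions, scaled to [0, 1]."""
    n = len(sequence)
    if n == 0:
        return 0.0
    h = sum((c / n) * math.log(n / c) for c in Counter(sequence).values())
    return h / math.log(len(STATES))


# Made-up events for one student on one item (not NAEP data).
events = [(0.0, "Passive investigation"), (4.0, "Browsing"), (10.0, "Decision")]
seq = build_sequence(events, exit_time=14.0)
print(seq)
print(time_per_state(seq), n_transitions(seq), round(within_sequence_entropy(seq), 2))
```

Turbulence is omitted from the sketch because it depends on counting distinct subsequences, which
is more involved than the other two measures.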

5. RESULTS
For the purposes of this paper, we present the results for two selected items¹ listed in Table 3.
The items are of different types (Item A is a multiple-choice item while Item B is a
constructed-response item) and are close in the presentation order. Thus, the two items were
chosen to demonstrate the difference in sequence characteristics between items of different types
(with minimal confounding of the presentation order).

¹ https://nces.ed.gov/NationsReportCard/nqt/Search

Table 3. Characteristics of the Two Example Items

Item Characteristics           Item A                       Item B
Item type                      Multiple-Choice              Fill In Blank
Presentation order             2                            4
Item difficulty parameter      -0.17                        0.29
Item content description       Compare heights of           Divide 3-digit whole number
                               objects in a figure          by 1-digit whole number

5.1 Response Time Decomposition
The average time students spent on each recoded behavior action, i.e., browsing, passive
investigation, active investigation, and


decision are shown in Figure 2. For Item A, the “decision” task had the highest average time among
the four tasks; however, for Item B, the “passive investigation” task had the highest time. On
average, students spent 10 seconds browsing Item A by executing scroll on the screen, while
students hardly spent any time browsing Item B. Such difference in the browsing time could be
associated with the content of the items: Item A needs to be solved by inspecting and comparing
the heights of the trees which may result in browsing actions, while Item B is a straightforward
computational item which may not require much browsing.

Figure 2. Average time students spent on recoded actions when interacting with two selected items.

5.2 Sequence Characteristics
Figure 3 presents the state distributions at each time unit for the two selected items. Each unit
of the x-axis represents 2 seconds. For instance, in the first 2 seconds, students who conducted
“passive investigation” make up the largest proportion in both items. When responding to Item A,
more than 10% of the students were browsing the item in the first 2 seconds; when interacting with
Item B, nearly no students browsed the item in this time unit.

Figure 3. State distribution plot of the two selected items.

Table 4 lists the summary statistics of three mobility measures, i.e., the number of task
transitions, within-sequence entropy, and turbulence. Task transition refers to switching among
the four tasks (i.e., browsing, passive investigation, active investigation, and decision) in the
sequence. The average task transitions for item A and B are 2.28 and 2.13, respectively. As for
the within-sequence entropy and turbulence measures, higher values indicate larger mobility. On
average, item A is found to have higher within-sequence entropy and turbulence.

Table 4. Mobility Measures of the Two Selected Items

Mobility Measure                Statistic     Item A     Item B
Number of task transitions      Min                1          1
                                Median             2          2
                                Mean            2.28       2.13
                                Max                4          4
Within-Sequence Entropy         Min                0          0
                                Median          0.38       0.29
                                Mean            0.37       0.30
                                Max                1       0.92
Turbulence                      Min                1          1
                                Median          3.18       2.76
                                Mean            3.36       3.28
                                Max            11.24      14.43

5.3 Typical Response Process
Figure 4 shows the representative sequences for Item A and Item B. A representative sequence
refers to the sequence with the smallest sum of edit distance to the rest of sequences; the
representative sequence is considered to be representative of the typical response process of an
item. As the sequence length is proportional to the time duration, the overall time duration of
the typical response process is shorter for Item A than Item B. We observe that, in the typical
response process:
     • for Item A, the student conducts passive investigation, browses the item, and makes
       response decisions, sequentially.
     • for Item B, the student conducts passive and active investigations and makes response
       decisions.

Figure 4. Representative sequences of the two selected items.
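
The definition of a representative sequence given above (the sequence with the smallest sum of
edit distances to the other sequences) can be sketched in a few lines of Python. This is a minimal
illustration and not the TraMineR implementation used in the study. The insertion, deletion, and
substitution costs are all set to 1 here as an assumption, since the costs used for the optimal
matching distance are not reported in the text. The single-letter state codes (P, B, A, D) and the
toy sequences are made up.

```python
def edit_distance(a, b):
    """Levenshtein distance with unit insertion, deletion, and substitution costs."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete x
                            curr[j - 1] + 1,           # insert y
                            prev[j - 1] + (x != y)))   # substitute x by y
        prev = curr
    return prev[-1]


def representative_sequence(sequences):
    """Return (index, totals): the index of the sequence with the smallest
    summed edit distance to all sequences, and the summed distances."""
    totals = [sum(edit_distance(s, t) for t in sequences) for s in sequences]
    best = min(range(len(sequences)), key=totals.__getitem__)
    return best, totals


# Made-up sequences, one character per 2-second unit:
# P = passive investigation, B = browsing, A = active investigation, D = decision.
toy = ["PPBD", "PPBBD", "PBD", "PPAAD"]
best, totals = representative_sequence(toy)
print(toy[best], totals)
```

With real data the pairwise distance matrix would be computed once and reused for both the
clustering and the medoid search, rather than recomputed as in this sketch.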

While identifying a single typical response process for an item is     5.4 Relationship Between the Sequence
desirable for the purpose of interpretation, a single sequence may
not be enough to represent all the sequences. It is possible that      Characteristics and Student Performance
there are multiple response process archetypes for an item. Thus,      Figure 6 shows the regression tree learned from the process data
we conducted hierarchical agglomerative clustering based on the        of Item B. Time durations of browsing, passive investigation,
edit distance matrix. In the clustering process, each unit is a        active investigation and decision, number of task transitions,
student. After examining the dendrogram and the interpretability       within-sequence entropy and turbulence are used to predict item
of the clusters, we chose to retain three clusters (labeled as Type    scores. Item B has five score categories, i.e., incorrect, correct, off
1, Type 2, and Type 3). The weighted cluster sizes and students’       task, omitted, and not reached. Each box in Figure 6 is called a
demographic characteristics by sequence clusters found in Item B       “node” and the five decimals in each box are the predicted
are presented in Table 5.                                              proportions of students having the five score categories in that
                                                                       node. The name (and color) of the node is determined by the score
Table 5. Student demographic characteristics by Sequence               category that has the highest proportion among the five categories.
Clusters in Item B                                                     For example, as the first split was performed with the decision
                               Weighted Percentages                    time, for students with decision time longer than 14 seconds (25%
                         Type 1      Type 2       Type 3               of the students in the sample have decision time longer than 14
Observations             307,100    172,500      157,400               seconds), the predicted proportions of getting “incorrect” and
Gender                             Percentages                         “correct” scores are 0.70 and 0.28, respectively. In addition, all
                                                                       the splits in this regression tree are performed with either decision
       Female              45          52           57
                                                                       or passive investigation time durations.
Race/Ethnicity                     Percentages
        White              49          53           51
        Black              15          13           13
      Hispanic             24          23           26
        Asian               6           6           4
 American Indian            1           1           1
        Other               4           5           4
National School Lunch Program*     Percentages
       Eligible            48          45           49
    Not Eligible           45          49           45
Detail may not sum to totals because of rounding.
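The sequence clusters summarized in Table 5 were obtained by hierarchical clustering on an edit distance matrix. The following is a minimal sketch of that idea, not the authors' implementation. The toy sequences, the one-letter task codes (P = passive investigation, A = active investigation, D = decision), the one-label-per-time-unit encoding, and the average-linkage rule are all assumptions made for illustration; the paper's linkage criterion and sequence encoding may differ.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two sequences of task labels."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def average_linkage_clusters(seqs, n_clusters):
    """Agglomerative clustering on the edit distance matrix.

    Average linkage is an assumption; it stands in for whatever
    linkage rule the original analysis used.
    """
    n = len(seqs)
    dist = [[edit_distance(seqs[i], seqs[j]) for j in range(n)]
            for i in range(n)]

    def avg(c1, c2):
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        # Merge the pair of clusters with the smallest average distance.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: avg(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical action sequences (not NAEP data).
toy_sequences = [
    "PPD", "PPDD",            # short passive investigation, then decision
    "PPAAD", "PPAADD",        # includes active investigation
    "PPPPPPPD", "PPPPPPDD",   # long passive investigation, then decision
]
clusters = average_linkage_clusters(toy_sequences, n_clusters=3)
for label, members in enumerate(clusters, start=1):
    print(f"Cluster {label}:", [toy_sequences[i] for i in members])
```

Cutting the merge process at three clusters mirrors the choice made after inspecting the dendrogram; on real data the cut point would still need to be justified by interpretability or a quantitative criterion.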
Figure 5 displays the representative sequences of the three clusters
in Item B, which represent three archetypes of response processes
in this item. The representative sequences of Type 1 and Type 3
consist only of “passive investigation” and “decision”. The time
duration of “passive investigation” is longer in the representative
sequence of Type 3 than in that of Type 1. The representative
sequence of Type 2 includes “active investigation” in addition to
“passive investigation” and “decision”.

Figure 6. Regression tree learned from the process data of Item B.
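To make the node proportions reported for Figure 6 more concrete, the sketch below finds a single CART-style split on one feature and prints the proportion of each score category in the two resulting nodes. It is an illustration under stated assumptions, not the tree fitted in the paper: the decision times and scores are invented, only two score categories are used instead of five, only one predictor is considered, and the Gini criterion is assumed because the text does not name the splitting criterion.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best single split.

    Candidate thresholds are midpoints between consecutive distinct values.
    """
    pairs = list(zip(values, labels))
    distinct = sorted(set(values))
    best = None
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in pairs if x <= threshold]
        right = [y for x, y in pairs if x > threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if best is None or impurity < best[1]:
            best = (threshold, impurity)
    return best


def node_proportions(labels):
    """Proportion of each score category within a node."""
    n = len(labels)
    return {category: count / n for category, count in Counter(labels).items()}


# Hypothetical decision times in seconds and item scores (not NAEP data).
decision_time = [3, 5, 6, 8, 10, 12, 15, 17, 20, 24]
score = ["correct", "correct", "correct", "incorrect", "correct",
         "correct", "incorrect", "incorrect", "correct", "incorrect"]

threshold, impurity = best_split(decision_time, score)
long_node = [s for t, s in zip(decision_time, score) if t > threshold]
short_node = [s for t, s in zip(decision_time, score) if t <= threshold]

print("split at decision time >", threshold)
print("longer decision time:", node_proportions(long_node))
print("shorter decision time:", node_proportions(short_node))
```

A full tree repeats this search within each node and over all predictors, which is how features such as passive investigation time can enter at lower splits.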

Figure 5. Representative sequences of the three sequence clusters found in Item B.

5.5 Relationship between the Sequence
Cluster and Student Performance
Table 6 lists the distribution of scores within each sequence
cluster found in Item B. Item B is a fill-in-blank item, the
fourth item in the block, with a difficulty level of 0.29. In all
three clusters, “incorrect” had the highest proportion among the
five score categories. The similarity of the score distributions
across sequence clusters implies that no clear performance
difference was found among students with different response
process archetypes.

Table 6. Score Distribution of Each Sequence Cluster in Item B.
                                        Percentages (%)
Cluster   Cluster size   Correct   Incorrect   Omitted   Not reached   Off task
Type 1    11,500         42.2      54.0        3.4       0.2           0.2
Type 2    6,100          42.9      53.5        3.3       0.2           0.1
Type 3    5,900          42.5      53.7        3.5       0.2           0.2
Note. Percentages in each row add up to 100%.
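The Discussion section reports how the students who answered Item B correctly are distributed across the three clusters. That breakdown can be approximately recovered from Table 6 by converting each row's percent correct into a count and normalizing. The short sketch below does this arithmetic; it uses only the rounded values printed in Table 6, so the results can differ from the reported shares in the last digit.

```python
# Cluster sizes and percent correct as printed in Table 6 (rounded values).
table6 = {
    "Type 1": {"size": 11_500, "pct_correct": 42.2},
    "Type 2": {"size": 6_100, "pct_correct": 42.9},
    "Type 3": {"size": 5_900, "pct_correct": 42.5},
}

# Approximate number of correct responders in each cluster.
correct_counts = {
    cluster: row["size"] * row["pct_correct"] / 100
    for cluster, row in table6.items()
}
total_correct = sum(correct_counts.values())

# Share of all correct responders that falls into each cluster.
shares = {
    cluster: 100 * count / total_correct
    for cluster, count in correct_counts.items()
}

print(f"approximate number of correct responders: {total_correct:.0f}")
for cluster, share in shares.items():
    print(f"{cluster}: {share:.1f}% of correct responders")
```

The total comes out close to 10,000, and the shares are close to the 48.6%, 26.3%, and 25.2% quoted in the summary. Because the within-cluster percent correct is nearly identical across clusters, these shares largely reflect cluster size.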

6. DISCUSSION

6.1 Summary
In summary, this study provided insights into the decomposition of RT by
constructing action sequences from students’ process data. In particular, the
action sequences contained information on the time duration, order, and
mobility of the tasks students executed to solve the NAEP mathematics items. By
presenting the sequences of two selected NAEP released items as examples, this
paper demonstrated the differences in RT decomposition and typical response
processes between items of different types (i.e., a multiple-choice item vs. a
fill-in-blank item). This methodology and set of results suggest that examining
action sequences and RT decomposition can be a useful way to mine process data
and uncover educational processes. Action sequence mining can also be useful
for analyzing high-variance data such as process data.

Response process archetypes were identified by applying a hierarchical
clustering algorithm to the edit distance matrix of students’ action sequences.
As for the relationship between student performance and sequence
characteristics, the time students spent on “decision” and “passive
investigation” was incorporated in the learned regression tree of the example
fill-in-blank item, meaning that these components of RT can be used to predict
the scores of this item. Further, among the 10,000 students who correctly
responded to Item B, 48.6% had their action sequences clustered into Type 1,
26.3% into Type 2, and 25.2% into Type 3, which implies that students who
responded to the item correctly may have different response processes.

6.2 Limitations and Future Research
As an initial exploration of action sequences in the NAEP mathematics items,
this study has limitations and opens up opportunities for future research.
First, the actions were categorized into four tasks (browsing, passive
investigation, active investigation, and decision) in this study. However, this
may not be the only way to categorize the actions. For instance, in a
multiple-choice item, the actions could be recoded based on students’ selected
options, so that sequences reflecting students’ trajectories of answer changes
can be constructed.

Second, the number of clusters was determined based only on the dendrogram and
the interpretability of the clusters in this study. To better justify the
choice of the number of clusters, future studies could develop quantitative
measures to determine the optimal number of clusters based on the edit distance
matrix.

Finally, this study only included a limited number of sequence characteristics
as features to learn the regression tree. Other features such as the
frequencies of subsequences (e.g., the frequency of a student switching from
passive investigation to active investigation and then to decision), together
with feature selection algorithms, could be incorporated in future studies.

7. REFERENCES

[1] Breiman, L. (2017). Classification and regression trees. Routledge.
[2] Elzinga, C. H. (2006). Turbulence in categorical time series. Mathematical
    Population Studies.
[3] Goldhammer, F. (2015). Measuring ability, speed, or both? Challenges,
    psychometric solutions, and what can be gained from experimental control.
    Measurement: Interdisciplinary Research and Perspectives, 13(3–4), 133–164.
[4] Huff, K. L., & Sireci, S. G. (2001). Validity issues in computer-based
    testing. Educational Measurement: Issues and Practice, 20(3), 16–25.
[5] Lee, Y.-H., & Haberman, S. J. (2016). Investigating test-taking behaviors
    using timing and process data. International Journal of Testing, 16(3),
    240–267.
[6] Levenshtein, V. I. (1966). Binary codes capable of correcting deletions,
    insertions, and reversals. Soviet Physics Doklady, 10(8), 707–710.
[7] Li, Z., Banerjee, J., & Zumbo, B. D. (2017). Response time data as validity
    evidence: Has it lived up to its promise and, if not, what would it take to
    do so. In B. D. Zumbo & A. M. Hubley (Eds.), Understanding and
    Investigating Response Processes in Validation Research (pp. 159–177).
    Springer.
[8] Marianti, S., Fox, J.-P., Avetisyan, M., Veldkamp, B. P., & Tijmstra, J.
    (2014). Testing for aberrant behavior in response time modeling. Journal of
    Educational and Behavioral Statistics, 39(6), 426–451.
[9] Masters, J., Schnipke, D. L., & Connor, C. (2005). Comparing item response
    times and difficulty for calculation items. In annual meeting of the
    American Educational Research Association, Montreal, Canada.
[10] Shannon, C. E. (1948). A mathematical theory of communication. Bell System
    Technical Journal, 27(3), 379–423.
[11] van der Linden, W. J. (2011). Setting time limits on tests. Applied
    Psychological Measurement, 35(3), 183–199.

</pre>