<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Performance Prediction of Elementary School Students in Search Tasks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Roberto González-Ibañez</string-name>
          <email>roberto.gonzalez.i@usach.cl</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luz Chourio-Acevedo</string-name>
          <email>luz.chourio@usach.cl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Centro Nacional de Desarrollo e Investigación en Tecnologías Libres, Avenida Humberto Carnevalli</institution>
          ,
          <addr-line>Edificio CENDITEL, Mérida</addr-line>
          ,
          <country country="VE">Venezuela</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Universidad de Santiago de Chile, Avenida Libertador Bernardo O'Higgins no 3363. Estación Central</institution>
          ,
          <addr-line>Santiago</addr-line>
          ,
          <country country="CL">Chile</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In the last two decades, the use of online resources in educational settings has seen unprecedented growth. Regrettably, students' online inquiry competences (OIC) are not necessarily developed well enough to face problems involving information-intensive domains. While different OIC development approaches have been proposed to address this situation, they fail to identify in a timely manner their effects on students' OIC as applied to practical search scenarios. To address this drawback, in this article we study models to predict students' search performance in the context of an OIC evaluation test. Our approach exploits demographic, behavioral, cognitive, and affective features to predict, at four points of the overall search process, whether students succeed or fail in finding relevant documents to accomplish a research task. Our preliminary results show that it is possible to anticipate the overall search performance of students with moderate accuracy at 25%, 50%, 75%, and 90% of search session progress. These findings illustrate potential benefits and limitations of using non-obtrusive aggregated signals to predict search performance in a timely manner in learning contexts.</p>
      </abstract>
      <kwd-group>
        <kwd>Search performance</kwd>
        <kwd>prediction</kwd>
        <kwd>classification</kwd>
        <kwd>elementary school</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>María Escobar-Macaya</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        The Internet, and particularly the World Wide Web (WWW), has become the main resource for students who look for information to complete their school assignments. Although abundant, not all the content on the Web is curated [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. This poses a major problem for students who may not be well equipped in terms of OIC. Indeed, knowing what information is needed and how to search for it (i.e., some component skills of OIC) is crucial to succeed in online research [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. To tackle this problem, different approaches to help students develop OIC have been proposed [
        <xref ref-type="bibr" rid="ref1 ref3">1, 3</xref>
        ]. A fundamental limitation of these approaches is their inability to determine in a timely manner whether students will succeed or fail when engaging in actual search tasks.
      </p>
      <p>
        In the context of OIC development, knowing in advance how a student will perform in a search task could be particularly useful to both educators and students. First, educators could offer timely feedback and support to their students, thus avoiding late evaluations typically available only after tests are completed. Second, students themselves could be more aware of their own performance, which could help them to correct themselves or look for support. In educational contexts, prediction focuses on forecasting performance by estimating unknown values of variables that characterize students. Such values typically relate to performance, knowledge, and scores. Prediction can also be used to identify learning styles, determine whether a student will answer a question correctly, model knowledge changes, and determine non-observable learning variables [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>In this article, we explore the possibility of anticipating students' search performance by exploiting a set of demographic, behavioral, cognitive, and affective features through machine learning. The remaining sections of this article are organized as follows. First, we describe the methodological approach adopted for this work. Second, we present preliminary results. Finally, we conclude with a discussion of the results, their implications, and future work.</p>
    </sec>
    <sec id="sec-method">
      <title>2. Method</title>
      <p>2.1. Dataset</p>
      <p>
        To conduct this study, we relied on a subset of the
data collected as part of the iFuCo project [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Our
sample contains search sessions from 350 Finnish
students, each performing two independent research tasks
in the context of an OIC evaluation. A summary of
demographic data of the students whose records are
included in our study is presented in Table 1.
      </p>
      <p>
        Records in this dataset were captured through
NEURONE (oNlinE inquiry expeRimentatiON systEm) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
This system offered a realistic simulation of a search
engine operating on a controlled collection of web
documents for each research task. The document
collection was developed by the research team and
comprised 20 web pages per task, three of them defined as
relevant. The relevant documents were created by the
researchers, and all three had to be found in order to
accomplish each research task.
      </p>
      <p>The dataset contains various types of data, including behavioral, cognitive, affective, and demographic variables. Table 2 lists all the variables included in this dataset.</p>
      <table-wrap id="tab2">
        <label>Table 2</label>
        <caption>
          <p>Variables included in the dataset.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Attribute</th><th>Description</th></tr>
          </thead>
          <tbody>
            <tr><td colspan="2">Behavior (during the session)</td></tr>
            <tr><td>Total.Time (TT)</td><td>Segment total time</td></tr>
            <tr><td>Stay.Pag.Relv (SR)</td><td>Dwell time in relevant pages</td></tr>
            <tr><td>Stay.Pag.NonRelv (SnR)</td><td>Dwell time in non-relevant pages</td></tr>
            <tr><td>Query.Time (QT)</td><td>Query writing time</td></tr>
            <tr><td>Count.Queries (CQ)</td><td>Number of queries</td></tr>
            <tr><td>Q.Mod (QM)</td><td>Number of query modifications</td></tr>
            <tr><td>Q.Entropy (QE)</td><td>Average query entropy</td></tr>
            <tr><td>Total.Cover (TC)</td><td>Total coverage</td></tr>
            <tr><td>Usf.Cover (UC)</td><td>Useful coverage (dwell time ≥ 30 seconds)</td></tr>
            <tr><td>Relv.Coverage (RC)</td><td>Number of relevant pages visited</td></tr>
            <tr><td>Clicks.Relv (CR)</td><td>Number of clicks within relevant pages</td></tr>
            <tr><td>Clicks.NonRelv (CnR)</td><td>Number of clicks within non-relevant pages</td></tr>
            <tr><td>Mouse.Mov.Relv (MR)</td><td>Number of mouse movements within relevant pages</td></tr>
            <tr><td>Mouse.Mov.NonRelv (MnR)</td><td>Number of mouse movements within non-relevant pages</td></tr>
            <tr><td>Scroll.Mov.Relv (SMR)</td><td>Number of scrolls within relevant pages</td></tr>
            <tr><td>Scroll.Mov.NonRelv (SMnR)</td><td>Number of scrolls within non-relevant pages</td></tr>
            <tr><td colspan="2">Demographic</td></tr>
            <tr><td>Sex</td><td>Girl, Boy</td></tr>
            <tr><td colspan="2">Affective (SAM-based scale [<xref ref-type="bibr" rid="ref8">8</xref>])</td></tr>
            <tr><td>Pos</td><td>Valence (Positive - Negative scale)</td></tr>
            <tr><td>Cal</td><td>Activation (Calm - Excited scale)</td></tr>
            <tr><td colspan="2">Cognitive (Survey)</td></tr>
            <tr><td>Prior.Knowledge (PK)</td><td>Prior knowledge on task topic (1 to 5 scale)</td></tr>
            <tr><td>Perceived.Difficulty (PD)</td><td>Perceived task difficulty level (1 to 5 scale)</td></tr>
            <tr><td>class</td><td>Pass (A), Fail (R)</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>2.2. Analysis procedure</p>
      <p>Our general approach to evaluate the feasibility of predicting search performance focuses on four moments within students’ search sessions: early (25%), middle (50%), late (75%), and close-to-end (90%). Based on this nominal division, we aim to compare different models in the classification task of whether students will fail or succeed in the overall search task (i.e., binary classification).</p>
      <p>
        To determine whether a student failed or succeeded in the search tasks, we relied on the search score, a process-based measure defined in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. This measure accounts for both success in finding relevant documents and mistakes made during the search process. Search scores range from 0 to 5, and we set a threshold of 3.3 to keep the pass/fail classes roughly balanced. Thus, students with a score of 3.3 or higher were labeled as Pass (46%), whereas those below this threshold were labeled as Fail (54%).
      </p>
      <p>Figure 1: Subset generation based on normalized search sessions.</p>
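      <p>The pass/fail labeling rule described in this section can be illustrated with a minimal Python sketch. It assumes only what the text states (scores from 0 to 5, Pass at 3.3 or higher). The names PASS_THRESHOLD and label_session and the example scores are hypothetical and are not taken from the study's dataset or tooling.</p>
      <preformat><![CDATA[
```python
PASS_THRESHOLD = 3.3  # threshold reported in the text; search scores range from 0 to 5


def label_session(search_score, threshold=PASS_THRESHOLD):
    """Return 'Pass' when the score reaches the threshold, otherwise 'Fail'."""
    if not 0 <= search_score <= 5:
        raise ValueError("search score must lie in [0, 5]")
    return "Pass" if search_score >= threshold else "Fail"


# Hypothetical scores for illustration only (not values from the dataset).
scores = [4.1, 3.3, 3.29, 0.0]
labels = [label_session(s) for s in scores]
print(labels)  # ['Pass', 'Pass', 'Fail', 'Fail']
```
]]></preformat>
      <p>Note that a score exactly equal to the threshold is labeled Pass, which matches the "3.3 or higher" wording.</p>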
      <p>Next, we normalized the search sessions, which lasted a maximum of 8 minutes. Normalization was necessary to express all sessions on a common duration scale, from 0% to 100%. We then generated four additional subsets of sessions based on the four moments stated above. As a result, the first set contains the session data of each student from 0% to 25%, the second set comprises data from 0% to 50%, and so forth. Each subset contains the Pass or Fail label computed at 100% of each search session (see Figure 1).</p>
      <p>We followed the Knowledge Discovery in Databases (KDD) process with each dataset; thus, we performed data selection, preprocessing, transformation, data mining, and evaluation/interpretation to derive knowledge. To implement these stages, we used both Weka and R. After preprocessing the data, we ended up with a total of 660 full search sessions. For the purpose of this study, we discarded incomplete sessions and those with corrupted data; these problems were mainly caused by connection failures or incompatibility of browsers with NEURONE.</p>
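      <p>The session normalization and cumulative subset generation described in this section can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually implemented in Weka and R: it assumes each session is scaled by its own duration, and the event format, function names, and example session are hypothetical.</p>
      <preformat><![CDATA[
```python
CUT_POINTS = (0.25, 0.50, 0.75, 0.90)  # progress points named in the text


def normalize(events, duration):
    """Map (timestamp_seconds, action) pairs to (progress, action), progress in [0, 1].

    Assumption: each session is scaled by its own duration.
    """
    return [(t / duration, action) for t, action in events]


def cumulative_subsets(events, duration, cut_points=CUT_POINTS):
    """For each cut point, keep every action from 0% up to that share of the session."""
    normalized = normalize(events, duration)
    return {c: [a for p, a in normalized if p <= c] for c in cut_points}


# Hypothetical 200-second session; action names are illustrative only.
session = [(10, "query"), (40, "click"), (95, "query"), (140, "click"), (190, "bookmark")]
subsets = cumulative_subsets(session, duration=200)
for cut, actions in subsets.items():
    print(f"0% to {cut:.0%}: {actions}")
```
]]></preformat>
      <p>Each subset is cumulative, so the 50% subset contains everything in the 25% subset. The Pass/Fail label attached to every subset would still be the one computed at 100% of the session.</p>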
      <p>Once features were selected, preprocessed, and transformed, we created feature vectors containing aggregated session data (mostly behavioral) up to the corresponding interval (i.e., 25%, 50%, 75%, 90%). In addition, these vectors contained prior-session features from demographic, cognitive, and affective variables. Finally, Pass/Fail labels (i.e., class) were added. Overall, our vectors contained 21 features plus the class.</p>
      <p>With these vectors, we proceeded to identify prominent features and build binary classifiers through different algorithms and approaches. Results achieved by these classifiers in the task of determining the pass/fail labels are presented in the following section.</p>
    </sec>
    <sec id="sec-results">
      <title>3. Results</title>
      <p>
        After building vectors in each subset, we ran automatic attribute evaluation to determine which features could contribute the most to the classification task. This procedure was conducted using two Weka algorithms, namely CFSSubsetEval and InfoGainAttributeEval. As a result, eight groups of features were identified, two per subset, as shown in Table 3. Additionally, we performed attribute scanning, which led us to discard or include other features in all four subsets. On the one hand, we discarded variables related to clicks in relevant and non-relevant pages since they neither improved nor worsened classification performance. In other words, their presence unnecessarily increased the dimensionality of the problem. On the other hand, we included cognitive measures (i.e., prior knowledge and perceived task difficulty) and an affective measure (Pos) as input variables to the search process [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>Next, by combining the selected features (those in Table 3 and the positivity score (Pos)) following a brute-force approach, we built classifiers through linear regression, logistic regression, Naïve Bayes, JRIP, J48, random forest, multilayer perceptron, SMO with an RBF kernel, and SMO with a polynomial kernel. All models were trained and tested through 10-fold cross-validation. The classes in all cases were linked to the Pass/Fail labels computed at 100%; hence, our classifiers were actually prediction models attempting to determine the overall search performance of students. Results were compared in terms of precision, F-Measure, number of attributes, and area under the ROC curve (AUC). A summary of the best results achieved at each time point (in terms of AUC) is presented in Table 4.</p>
    </sec>
    <sec id="sec-3">
      <title>4. Discussion</title>
      <p>As illustrated in Table 4, different models with different sets of features achieved the highest AUC at different time points. At an early stage of students’ search processes (i.e., 25%), our best model is based on linear regression over 11 features, with an AUC of 0.736 and an error of 30%. Then, at 50% of search sessions, the best model is also based on linear regression; however, the set of features is slightly different and performance increases by 4.6% in terms of AUC. Later on, at 75% of search progress, the best model is based on random forest over six features. In this case, performance in terms of AUC shows an increment of 12.36% with respect to our early-stage best model. In addition, error is reduced by almost 7%. Finally, very late in students’ search sessions (i.e., 90%), the best model is based on logistic regression over 10 features. In this case, AUC is 0.866, whereas error was reduced to 19.55%.</p>
      <p>In this group there are features involving time spent in relevant and non-relevant pages, query-related features, document coverage, and mouse movements, to name a few. In addition, we highlight that sex (i.e., a demographic feature) appears as a prominent feature used by our best performing models at 25%, 50%, and 75%. Additionally, an affective feature (Pos, which expresses valence on a negative-positive scale) was present in the best performing model at 25%. Likewise, prior knowledge on the topic (PK) and perceived task difficulty (PD) are used in the best performing model at 50%. We note that these particular input features, which are captured before search sessions start, seem to play some role in the way search processes are carried out.</p>
      <p>
        On the one hand, the fact that sex appears in three out
of four models (Table 4) indicates that girls and boys
may exhibit particular search patterns that could be
linked to search performance. On the other hand, the
presence of an affective feature (i.e., Pos) also supports
the idea that searchers’ initial affective states may shape
their search behaviors and their relevance assessments
(e.g., participants in negative states being more
systematic than those in positive states) [
        <xref ref-type="bibr" rid="ref9">10, 9</xref>
        ].
      </p>
      <p>As expected, the earlier in the search process, the
higher the level of uncertainty in predicting the
overall search performance. Conversely, the later
in the search process, the higher the level of certainty
in determining whether students will succeed or fail once
search sessions are completed. Despite the low
performance of classification models at 25%, this result suggests
that, to some extent, it is possible to predict
students’ search performance in a timely manner. More interestingly, our
best model is rather simple and relies on variables
that can be captured easily in controlled and open
environments (e.g., mouse actions, query formulation
features, some demographic data).</p>
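      <p>For readers who wish to reproduce this kind of comparison, the evaluation protocol reported in this article (10-fold cross-validation, with models compared by AUC) can be sketched in plain Python. This is only an illustrative sketch: the study used Weka and R, whereas here a nearest-centroid scorer stands in for the actual classifiers, the folds are not stratified, all function names are hypothetical, and the data is a synthetic, clearly separable toy set rather than the study's feature vectors.</p>
      <preformat><![CDATA[
```python
import random


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cross_validated_auc(X, y, k=10, seed=0):
    """Pool out-of-fold scores from k-fold cross-validation and return their AUC.

    Stand-in scorer (an assumption, not a model from the article):
    distance to the Fail centroid minus distance to the Pass centroid.
    """
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    scores = [0.0] * len(X)
    for fold in range(k):
        test = set(idx[fold::k])
        train = [i for i in idx if i not in test]
        c_pass = centroid([X[i] for i in train if y[i] == 1])
        c_fail = centroid([X[i] for i in train if y[i] == 0])
        for i in test:
            scores[i] = sq_dist(X[i], c_fail) - sq_dist(X[i], c_pass)
    return auc(scores, y)


# Synthetic toy data (NOT the study's data): 20 Fail rows and 20 Pass rows.
X = [[0.1 * i, 0.0] for i in range(20)] + [[5.0 + 0.1 * i, 1.0] for i in range(20)]
y = [0] * 20 + [1] * 20
print(cross_validated_auc(X, y))  # 1.0 on this separable toy set
```
]]></preformat>
      <p>Running the same function on feature vectors built at each progress point would yield one AUC per point, which is the kind of comparison discussed here.</p>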
      <p>As for limitations of our prediction approach, the
fact that it is based on aggregated data at different moments
of students’ search leads to data loss. Indeed, the
history of students’ actions while searching for
information (e.g., query formulation, page visits, scrolling
actions, query reformulation, bookmarking, etc.) is
compressed into single measures (e.g., means, sums, counts).</p>
      <p>Such chains of actions could be crucial to anticipate
how students will perform in the short and long term.</p>
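      <p>The data loss caused by aggregation can be made concrete with a small Python sketch. The two action sequences below are hypothetical and the function names are illustrative. The sketch shows that sessions with different orderings of actions collapse to identical counts, while their action-to-action transitions still differ.</p>
      <preformat><![CDATA[
```python
from collections import Counter

# Two hypothetical sessions with the same actions in a different order.
session_a = ["query", "visit", "scroll", "bookmark", "query", "visit"]
session_b = ["query", "query", "visit", "visit", "scroll", "bookmark"]


def aggregate(actions):
    """Aggregated view: one count per action type, order discarded."""
    return Counter(actions)


def transitions(actions):
    """Sequential view: counts of consecutive action pairs."""
    return Counter(zip(actions, actions[1:]))


print(aggregate(session_a) == aggregate(session_b))      # True: counts are identical
print(transitions(session_a) == transitions(session_b))  # False: the order differs
```
]]></preformat>
      <p>Counts of consecutive pairs are only one simple way to retain ordering information; they are shown here as an example and are not a method used in this study.</p>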
      <p>In this sense, our future work will concentrate on
studying prediction approaches that take into account the
dynamics of search behaviors. Among these approaches
we consider Markovian models and SVM with
string-based kernels.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>F.</given-names>
            <surname>Baji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Bigdeli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Parsa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Haeusler</surname>
          </string-name>
          ,
          <article-title>Developing information literacy skills of the 6th grade students using the big 6 model</article-title>
          ,
          <source>Malaysian Journal of Library &amp; Information Science</source>
          <volume>23</volume>
          (
          <year>2018</year>
          )
          <fpage>1</fpage>
          -
          <lpage>15</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Majid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Foo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <article-title>Appraising information literacy skills of students in Singapore</article-title>
          ,
          <source>Aslib Journal of Information Management</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <article-title>A study of digital media literacy of the 5th and 6th grade primary students in Beijing</article-title>
          ,
          <source>The Asia-Pacific Education Researcher</source>
          <volume>25</volume>
          (
          <year>2016</year>
          )
          <fpage>579</fpage>
          -
          <lpage>592</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>C.</given-names>
            <surname>Romero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ventura</surname>
          </string-name>
          ,
          <article-title>Educational data mining: a review of the state of the art</article-title>
          ,
          <source>IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews)</source>
          <volume>40</volume>
          (
          <year>2010</year>
          )
          <fpage>601</fpage>
          -
          <lpage>618</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Mikkila-Erdmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Sormunen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikkonen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Erdmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Kiili</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Quintanilla</surname>
          </string-name>
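          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>González-Ibáñez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Leppanen</surname>
          </string-name>
          ,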
          <string-name>
            <given-names>M.</given-names>
            <surname>Vauras</surname>
          </string-name>
          ,
          <article-title>A comparative study on learning and teaching online inquiry skills in Finland and Chile</article-title>
          ,
          <source>in: European Conference on Information Literacy (ECIL)</source>
          , volume
          <volume>18</volume>
          ,
          <year>2017</year>
          , p.
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R.</given-names>
            <surname>González-Ibáñez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Gacitúa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Sormunen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Kiili</surname>
          </string-name>
          ,
          <article-title>NEURONE: online inquiry experimentation system</article-title>
          ,
          <source>Proceedings of the Association for Information Science and Technology</source>
          <volume>54</volume>
          (
          <year>2017</year>
          )
          <fpage>687</fpage>
          -
          <lpage>689</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>E.</given-names>
            <surname>Sormunen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>González-Ibáñez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Kiili</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. H.</given-names>
            <surname>Leppänen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mikkilä-Erdmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Erdmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Escobar-Macaya</surname>
          </string-name>
          ,
          <article-title>A performance-based test for assessing students' online inquiry competences in schools</article-title>
          ,
          <source>in: European Conference on Information Literacy</source>
          , Springer,
          <year>2017</year>
          , pp.
          <fpage>673</fpage>
          -
          <lpage>682</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Bradley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lang</surname>
          </string-name>
          ,
          <article-title>Measuring emotion: the self-assessment manikin and the semantic differential</article-title>
          ,
          <source>Journal of Behavior Therapy and Experimental Psychiatry</source>
          <volume>25</volume>
          (
          <year>1994</year>
          )
          <fpage>49</fpage>
          -
          <lpage>59</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>R.</given-names>
            <surname>González-Ibáñez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Shah</surname>
          </string-name>
          ,
          <article-title>Performance effects of positive and negative affective states in a collaborative information seeking task</article-title>
          ,
          <source>in: CYTED-RITOS International Workshop on Groupware</source>
          , Springer,
          <year>2014</year>
          , pp.
          <fpage>153</fpage>
          -
          <lpage>168</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>R.</given-names>
            <surname>Sinclair</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mark</surname>
          </string-name>
          ,
          <article-title>The effects of mood state on judgemental accuracy: Processing strategy as a mechanism</article-title>
          ,
          <source>Cognition &amp; Emotion</source>
          <volume>9</volume>
          (
          <year>1995</year>
          )
          <fpage>417</fpage>
          -
          <lpage>438</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
    <ack>
      <title>Acknowledgment</title>
      <p>The work described in this article was partially supported by the TUTELAGE project funded by the National Agency for Research and Development (ANID) (FONDECYT Regular, grant no. 1201610); the Vicerrectoría de Postgrado of the Universidad de Santiago de Chile; and the iFuCo project funded by the Academy of Finland (grant no. 294186) and ANID (grant no. …).</p>
    </ack>
  </back>
</article>