Towards Observable Indicators of Learning on Search

Jacek Gwizdka
School of Information
University of Texas at Austin
sal2016@gwizdka.com

Xueshu Chen
School of Information
University of Texas at Austin
xueshu_chen@utexas.edu

Search as Learning (SAL), July 21, 2016, Pisa, Italy. The copyright for this paper remains with its authors. Copying permitted for private and academic purposes.

ABSTRACT
On the example of a recently conducted user study, we discuss the assessment of learning on search as well as its correlates in search behaviors and associated eye-tracking measures. Since we are reporting on a work in progress, the study is meant to illustrate our approach, and our choices of measures are meant to inspire a discussion.

Keywords
Information Search, Search as Learning, Eye-tracking, Measurement, Learning Assessment.

1. INTRODUCTION & BACKGROUND
From its very origins, information science has been concerned with ways and means of storing knowledge, organizing it, as well as helping people to find, use, and learn from it. Theorists of information science conceptually linked information interaction processes with the states of human knowledge (e.g., Belkin's ASK [2] and Dervin's sense-making [3]), and information seeking was described as "a process, in which humans purposefully engage in order to change their state of knowledge." [4]. Yet in spite of this long-established relationship with learning, only a few empirical studies exist that focus on search as learning (e.g., [5–7]). However, with the recent special journal issue [8], and with this and two earlier workshops, we observe an increased interest in this topic. One research challenge identified at a previous workshop [9] was how to assess learning in the context of purposeful information seeking. This is where we aim to contribute through this project. This short paper presents our approach, method, and initial data analysis.

Our working definition of learning is any change in a person's knowledge structures. We consider that learning can take place at many levels [10], and we are particularly influenced by the cognitive, skill-based, and affective theory of learning outcomes (CSALO) model [1] (Figure 1). This framework contains elements related to searching to learn (e.g., declarative knowledge) as well as learning to search (strategies, tactics, procedural knowledge). According to this model, learning outcomes are partially reflected in changes in verbal knowledge, knowledge organization, and cognitive strategies. We are particularly interested in assessing changes in verbal knowledge.

Our prior work [11–13] has demonstrated the feasibility of using eye-tracking to detect relationships between eye movements and knowledge levels. The method takes advantage of a direct relationship between eye movement patterns and cognitive processes. One goal of the project presented in this short paper is to connect eye-tracking measures and traditional IR measures (e.g., the number and kind of query reformulations) with measures of learning. Since we are reporting on work in progress, the initial data analysis will serve as only a simple illustration of our approach, while our choices of measures will inspire a discussion.

Figure 1. Cognitive, Skill-based, and Affective Theory of Learning Outcomes (CSALO) Model. Source: [1]

2. METHOD
A lab-based experiment was conducted in the Information eXperience lab at the University of Texas at Austin (N=30). Data are reported here for 26 of these subjects (16 females; mean age of all participants 24.5). Participants, who volunteered after seeing the recruitment notice posted on the university bulletin board, were pre-screened for native-level English, eyesight, and topic familiarity. All participants reported daily Internet use longer than an hour and everyday Google usage. Most had been searching online for 7 years or more. The majority also considered themselves proficient in online information search. To understand how people seek health information using the Internet and acquire new domain knowledge, we asked each participant to perform three information search tasks (two assigned multi-faceted tasks and one self-generated) on health-related topics in counterbalanced order (six rotations), plus one training task. The assigned search tasks followed a simulated work task approach, which triggers a realistic information need: participants were asked to find useful information for answering the task questions [14] (Table 1).

Table 1. Search tasks.

Assigned tasks
Task 1–Vitamin A: Your teenage cousin has asked your advice in regard to taking vitamin A for health improvement purposes. You have heard conflicting reports about the effects of vitamin A, and you want to explore this topic in order to help your cousin. Specifically, you want to know: 1) What is the recommended dosage of vitamin A for underweight teenagers? 2) What are the health benefits of taking vitamin A? Please find at least 3 benefits and 3 disadvantages of vitamin A. 3) What are the consequences of vitamin A deficiency or excess? Please find 3 consequences of vitamin A deficiency and 3 consequences of its excess. 4) Please find at least 3 food items that are considered as good sources of vitamin A.
Task 2–Hypotension: Your friend has hypotension. You are curious about this issue and want to investigate more. Specifically, you want to know: 1) What are the causes of hypotension? 2) What are the consequences of hypotension? 3) What are the differences between hypotension and hypertension in terms of symptoms? Please find at least 3 differences in symptoms between them. 4) What are some medical treatments for hypotension? Which solution would you recommend to your friend if he/she also has a heart condition? Why?

Example self-generated tasks
Ex. 1. Crohn's disease: I know someone who was recently diagnosed and am curious about the disease.
Ex. 2. My friend has lupus. What are the symptoms of lupus? What are the long-term consequences of lupus, including the life expectancy? Are there any cures? What treatments are available?
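The six counterbalanced rotations mentioned above follow directly from the fact that three tasks have exactly 3! = 6 possible orderings. A minimal sketch (the task labels are illustrative, not the study's internal identifiers):

```python
from itertools import permutations

# The three task slots; each participant receives one of the six rotations.
tasks = ["Task 1 (Vitamin A)", "Task 2 (Hypotension)", "Self-generated"]
rotations = list(permutations(tasks))  # 3! = 6 counterbalanced orders
```

Cycling through these rotations across participants balances task-order effects.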
Participants searched publicly available web pages using Google and were asked to save relevant web pages with their typewritten notes and/or information copied and pasted from the source. While there was no time limit, each user session typically lasted from 1.5 to 2 hours. Each participant completed an eHEALS questionnaire, a Pre- and a Post-task Questionnaire, a Post-Search Interview on how they arrived at their solutions for one of the saved web pages per task, and an Exit Questionnaire. During search in the experiment, all of the participants' interactions with the computer system, including eye gaze, brain activity recordings (frontal area), and facial expressions (web cam), were recorded. Eye-tracking data was collected using a Tobii TX-300 eye-tracker. Participant brain wave levels were recorded using a wireless, consumer-level headset (MyndWave). At the completion of a session, each participant received $25.

Both the Pre- and Post-Task Questionnaires contained two parts: knowledge assessment and interest in the search topic. In the knowledge assessment, participants were asked to list as many words or phrases as they could on the topic of a search task, with no time limit. As we have just recently finished the study, we focus here on participants' responses to this free recall test to identify knowledge gains through information seeking and relate them to basic behavioral measures of Web search, adding eye fixation durations and counts.

2.1 Measures
Our goals include measuring verbal and concept learning during the search process. We want to measure the difference in a participant's knowledge of a search topic before and after each task, hence we need two measurement points. We considered a number of different possibilities for assessing participants' knowledge of the task topics; we briefly present our deliberations. Fact-checking questions before a task were considered inappropriate, because we wanted to avoid exposing participants to the topic's content before they started the search. Since the tasks were conducted on the open web, we could not use a technique such as the Sentence Verification Technique (SVT) [15], which requires the creation of questions for each document. Our participants were not experts on the topics, hence concept maps and mind-mapping were deemed inappropriate, as they are particularly difficult to score for non-experts.

We decided on asking participants to list words and phrases related to each task topic before and after each task. Participants were also asked to annotate relevant web pages and to create from these annotations final notes for each task. Participants entered the annotations while they were on content web pages, whereas the words and phrases listed in the pre- and post-task knowledge assessments were from their memory. In addition, we collected a list of keywords and phrases on the assigned task topics from crowd workers on Amazon Mechanical Turk. We plan to use it in assessing participant knowledge by applying automated scoring and calculating semantic similarity (e.g., using LSA).

Table 2. Dependent measures.

Construct       | Operationalization
Knowledge gain  | difference in the number of items entered after and before each task (absolute and ratio)
Expertise gain  | mean frequency of nouns after a task, normalized by the number of nouns used
                | ratio of the mean frequency of nouns after to before a task, normalized by the number of nouns used
                | mean frequency of new nouns used after a task, normalized by the number of nouns used
                | mean rank of nouns listed after a task
                | mean rank of new nouns listed after a task

The methods we used in assessing knowledge included, for example, statement counting [16] and word analysis (e.g., word frequency, in particular for nouns), while we plan to use more sophisticated methods in the future (e.g., topic analysis [17] and semantic analysis). The methods aim at assessing knowledge gain and expertise gain.
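As a concrete illustration, the count- and frequency-based operationalizations in Table 2 can be sketched as follows. This is a minimal sketch, not the study's analysis code: the frequency table and the participant word lists are hypothetical stand-ins for the corpus counts and the actual free-recall responses.

```python
def knowledge_gain(pre_items, post_items):
    """Knowledge gain as in Table 2: difference in the number of items
    listed after vs. before a task, absolute and as a ratio."""
    absolute = len(post_items) - len(pre_items)
    ratio = len(post_items) / len(pre_items) if pre_items else float("inf")
    return absolute, ratio

def mean_frequency(nouns, corpus_freq):
    """Mean corpus frequency of the listed nouns, normalized by the
    number of nouns used; words missing from the table count as 0."""
    if not nouns:
        return 0.0
    return sum(corpus_freq.get(n.lower(), 0) for n in nouns) / len(nouns)

# Hypothetical corpus frequencies and one participant's pre/post lists.
corpus_freq = {"vitamin": 1200, "dosage": 150, "retinol": 12, "carotene": 8}
pre = ["vitamin", "dosage"]
post = ["vitamin", "dosage", "retinol", "carotene"]

new_nouns = [n for n in post if n not in pre]
abs_gain, ratio_gain = knowledge_gain(pre, post)  # 2 new items, ratio 2.0
```

A drop in `mean_frequency(new_nouns, corpus_freq)` relative to `mean_frequency(pre, corpus_freq)` would then indicate a shift toward rarer, more specialized vocabulary, i.e., expertise gain; the rank-based measures in Table 2 are computed analogously from word ranks instead of frequencies.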
With increasing expertise, people use more sophisticated vocabulary. This sophistication is expressed in the use of less frequent and more specialized words, hence our use of word usage frequency (and word usage rank) as one of the dependent measures. We used the word frequencies and ranks of the 1/3 million most frequent words taken from the Google Web Trillion Word Corpus [18], as described by Norvig in chapter 14 of [19].

3. RESULTS
The mean frequencies and ranks of nouns entered before and after a task differed significantly (Mann-Whitney non-parametric test statistic = 229728.5, p = 0.0026; Figure 2).

Figure 2. Mean ranks of nouns in Pre-task, Post-task, and new nouns Post-task.

We performed linear regression with the independent variables presented in Table 3 and one dependent variable at a time (Table 2); thus, we ran four regressions. Three of the obtained models (all except the one for the ratio of frequencies after and before a task) were significant. However, the values of R² were modest and ranged from 0.24 to 0.28.

Table 3. Independent measures.

Measure category  | Measure
Task level        | Time on task
Query             | Query count
                  | Query length
Content Web pages | Number of pages visited
                  | Time on a page
                  | Total fixation duration
                  | Count of fixations
                  | Proportion of reading fixations
                  | Proportion of durations of reading fixations
SERPs             | Number of SERPs visited

The significant predictors included: 1) the number of queries entered and the number of SERPs visited, in the model with the ratio of the number of items entered after and before each task as the dependent variable, and 2) the average query length, in the models with the mean frequency of use of nouns (or new nouns) after a task as the dependent variable.

A plausible interpretation could be that the more queries are issued, the more items are entered in the post-task knowledge list, and that there is a trade-off with the number of SERPs, namely, that with more SERPs visited the number of items entered decreases. For the second and third models, the interpretation is less exciting, as it seems to indicate that the longer the average query, the higher the normalized frequency of nouns or new nouns entered in the post-task knowledge assessment.
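For concreteness, the computation behind such a regression can be sketched for the single-predictor case (e.g., average query length predicting the normalized post-task noun frequency). The data below are invented for illustration; the study's actual models used the Table 3 predictors together, for which a multiple-regression routine (e.g., ordinary least squares in statsmodels) is the practical choice.

```python
def simple_ols(x, y):
    """Ordinary least squares with one predictor: returns the intercept,
    the slope, and the coefficient of determination R^2."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return intercept, slope, 1 - ss_res / ss_tot

# Hypothetical per-task observations: average query length (words) vs.
# normalized frequency of nouns listed after the task.
query_length = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
post_noun_freq = [0.10, 0.14, 0.13, 0.18, 0.17, 0.21]
intercept, slope, r_squared = simple_ols(query_length, post_noun_freq)
```

A positive `slope` with a modest `r_squared` would correspond to the pattern reported above.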
The eye-tracking variables were not found to be significant contributors to the dependent variables of interest. This, perhaps, should not be surprising, as they were obtained for all visits to content pages without further differentiation of page content or search task phase. We plan to use more specific eye-tracking measures in our future work.

4. DISCUSSION AND CONCLUSIONS
We reported on our work in progress, in which we seek to make a methodological contribution. The results generally indicate the feasibility of the proposed approach, which we may take as an early indication of some success. However, the relative simplicity of the employed measures leaves room for improvement and, as indicated throughout the paper, we plan on using more sophisticated assessment techniques.

The broader impact of implicit detection of gains in a person's knowledge, and thus of learning, lies in its applicability not only to the design of search systems and to improving the understanding of human-information interaction, but also to a wide variety of information systems, including online learning and intelligent tutoring systems.

5. ACKNOWLEDGMENTS
This project has been funded in part by IMLS Career award #RE-04-11-0062-11 and in part by a fellowship from the School of Information to Jacek Gwizdka.

6. REFERENCES
[1] Kraiger, K., Ford, J.K. and Salas, E. 1993. Application of cognitive, skill-based, and affective theories of learning outcomes to new methods of training evaluation. Journal of Applied Psychology. 78, (1993), 311–328.
[2] Belkin, N.J. 1980. Anomalous states of knowledge as a basis for information retrieval. Canadian Journal of Information Science. 5, (1980), 133–143.
[3] Dervin, B. 1992. From the mind's eye of the user: The sense-making qualitative-quantitative methodology. Qualitative Research in Information Management. (1992), 61–84.
[4] Marchionini, G. 1997. Information Seeking in Electronic Environments. Cambridge University Press.
[5] Jansen, B.J. et al. 2009. Using the taxonomy of cognitive learning to model online searching. Information Processing & Management. 45, 6 (Nov. 2009), 643–663.
[6] Wilson, M.J. and Wilson, M.L. 2013. A comparison of techniques for measuring sensemaking and learning within participant-generated summaries. Journal of the American Society for Information Science and Technology. 64, 2 (2013), 291–306.
[7] Collins-Thompson, K. et al. 2016. Assessing learning outcomes in Web search: A comparison of tasks and query strategies. Proceedings of the 2016 ACM Conference on Human Information Interaction and Retrieval (New York, NY, USA, 2016), 163–172.
[8] Hansen, P. and Rieh, S.Y. 2016. Editorial: Recent advances on searching as learning: An introduction to the special issue. Journal of Information Science. 42, 1 (Feb. 2016), 3–6.
[9] Freund, L. et al. 2013. From searching to learning. Evaluation Methodologies in Information Retrieval. M. Agosti et al., eds. 102–105.
[10] Anderson, L.W. et al. 2001. A Taxonomy for Learning, Teaching, and Assessing: A Revision of Bloom's Taxonomy of Educational Objectives. Longman.
[11] Cole, M.J. et al. 2011. Dynamic assessment of information acquisition effort during interactive search. Proceedings of the American Society for Information Science and Technology. 48, 1 (2011), 1–10.
[12] Cole, M.J. et al. 2013. Inferring user knowledge level from eye movement patterns. Information Processing & Management. 49, 5 (Sep. 2013), 1075–1091.
[13] Cole, M.J. et al. 2011. Task and user effects on reading patterns in information search. Interacting with Computers. 23, 4 (Jul. 2011), 346–362.
[14] Borlund, P. 2003. The IIR evaluation model: A framework for evaluation of interactive information retrieval systems. Information Research. 8, 3 (2003), paper no. 152.
[15] Freund, L. et al. 2016. The effects of textual environment on reading comprehension: Implications for searching as learning. Journal of Information Science. 42, 1 (Feb. 2016), 79–93.
[16] Wilson, M.L. and schraefel, m.c. 2008. A validated framework for measuring interface support for interactive information seeking. (2008).
[17] Kammerer, Y. et al. 2009. Signpost from the masses: Learning effects in an exploratory social tag search browser. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (New York, NY, USA, 2009), 625–634.
[18] Brants, T. and Franz, A. All Our N-gram are Belong to You. Research Blog.
[19] Segaran, T. and Hammerbacher, J. 2009. Beautiful Data: The Stories Behind Elegant Data Solutions. O'Reilly Media, Inc.