=Paper=
{{Paper
|id=Vol-1647/SAL2016_paper_19
|storemode=property
|title=Towards Observable Indicators of Learning on Search
|pdfUrl=https://ceur-ws.org/Vol-1647/SAL2016_paper_19.pdf
|volume=Vol-1647
|authors=Jacek Gwizdka,Xueshu Chen
|dblpUrl=https://dblp.org/rec/conf/sigir/GwizdkaC16
}}
==
Towards Observable Indicators of Learning on Search
==
Jacek Gwizdka, School of Information, University of Texas at Austin, sal2016@gwizdka.com
Xueshu Chen, School of Information, University of Texas at Austin, xueshu_chen@utexas.edu
ABSTRACT

On an example of a recently conducted user study, we discuss the assessment of learning on search as well as its correlates in search behaviors and associated eye-tracking measures. Since we are reporting on work in progress, the study is meant to illustrate our approach, and our choices of measures are meant to inspire a discussion.

Keywords

Information Search, Search as Learning, Eye-tracking, Measurement, Learning Assessment.

Search as Learning (SAL), July 21, 2016, Pisa, Italy. The copyright for this paper remains with its authors. Copying permitted for private and academic purposes.

1. INTRODUCTION & BACKGROUND

From its very origins, information science has been concerned with ways and means of storing knowledge, organizing it, as well as helping people to find, use, and learn from it. Theorists of information science conceptually linked information interaction processes with the states of human knowledge (e.g., Belkin's ASK [2] and Dervin's sense-making [3]), and information seeking was described as "a process, in which humans purposefully engage in order to change their state of knowledge" [4]. Yet in spite of this long-established relationship with learning, only a few empirical studies exist that focus on search as learning (e.g., [5–7]). However, with the recent special journal issue [8], and with this and two earlier workshops, we observe an increased interest in this topic. One research challenge identified at a previous workshop [9] was how to assess learning in the context of purposeful information seeking. This is where we aim to contribute through this project. This short paper presents our approach, method, and initial data analysis.

Our working definition of learning is any change in a person's knowledge structures. We consider that learning can take place at many levels [10], and we are particularly influenced by the cognitive, skill-based, and affective theory of learning outcomes (CSALO) model [1]. This framework contains elements related to searching to learn (e.g., declarative knowledge) as well as learning to search (strategies, tactics, procedural knowledge). According to this model, learning outcomes are partially reflected in changes in verbal knowledge, knowledge organization, and cognitive strategies. We are particularly interested in assessing changes in verbal knowledge.

Our prior work [11–13] has demonstrated the feasibility of using eye-tracking to detect relationships between eye movements and knowledge levels. The method takes advantage of a direct relationship between eye movement patterns and cognitive processes. One goal of the project presented in this short paper is to connect eye-tracking measures and traditional IR measures (e.g., number and kind of query reformulations) with measures of learning. Since we are reporting on work in progress, the initial data analysis will serve as only a simple illustration of our approach, while our choices of measures will inspire a discussion.

Figure 1. Cognitive, Skill-based, and Affective Theory of Learning Outcomes (CSALO) Model. Source: [1]

2. METHOD

A lab-based experiment was conducted in the Information eXperience lab at the University of Texas at Austin (N=30). Data is reported here for 26 of these subjects (16 females; mean age of all participants 24.5). Participants, who volunteered after seeing the recruitment notice posted on the university bulletin, were pre-screened for native-level English, eye-sight, and topic familiarity. All participants reported daily Internet use longer than an hour and everyday Google usage. Most had been searching online for 7 years or more. The majority also considered themselves proficient in online information search. To understand how people seek health information using the Internet and acquire new domain knowledge, we asked each participant to perform three information search tasks (two assigned multi-faceted tasks and one self-generated) on health-related topics in counterbalanced order (six rotations), plus one training task. The assigned search tasks followed a simulated work task approach that triggers a realistic information need for participants, as they were asked to find useful information for answering the task questions [14] (Table 1).

Table 1. Search tasks.

Assigned tasks

Task 1 – Vitamin A: Your teenage cousin has asked your advice in regard to taking vitamin A for health improvement purposes. You have heard conflicting reports about the effects of vitamin A, and you want to explore this topic in order to help your cousin. Specifically, you want to know:
1) What is the recommended dosage of vitamin A for underweight teenagers?
2) What are the health benefits of taking vitamin A? Please find at least 3 benefits and 3 disadvantages of vitamin A.
3) What are the consequences of vitamin A deficiency or excess? Please find 3 consequences of vitamin A deficiency and 3 consequences of its excess.
4) Please find at least 3 food items that are considered as good sources of vitamin A.
Task 2 – Hypotension: Your friend has hypotension. You are curious about this issue and want to investigate more. Specifically, you want to know:
1) What are the causes of hypotension?
2) What are the consequences of hypotension?
3) What are the differences between hypotension and hypertension in terms of symptoms? Please find at least 3 differences in symptoms between them.
4) What are some medical treatments for hypotension? Which solution would you recommend to your friend if he/she also has a heart condition? Why?

Example self-generated tasks

Ex.1. Crohn's disease – I know someone who was recently diagnosed and am curious about the disease.
Ex.2. My friend has lupus. What are the symptoms for lupus? What are the long-term consequences of lupus, including the life expectancy? Are there any cures? What treatments are available?

Participants searched publicly available web pages using Google and were asked to save relevant web pages with their typewritten notes and/or information copied/pasted from the source. While there was no time limit, each user session typically lasted from 1.5 to 2 hours. Each participant completed an eHEALS questionnaire, a Pre- and a Post-task Questionnaire, a Post-Search Interview on how they arrived at their solutions for one of the saved web pages per task, and an Exit Questionnaire. During search in the experiment, all of the participants' interactions with the computer system, including eye gaze, brain activity recordings (frontal area), and facial expressions (web cam), were recorded. Eye-tracking data was collected using a Tobii TX-300 eye-tracker. Participant brain wave levels were recorded using a wireless, consumer-level headset (MyndWave). At the completion of a session, each participant received $25.

Both the Pre- and Post-Task Questionnaires contained two parts: knowledge assessments and interest in the search topic. In the knowledge assessments, participants were asked to list as many words or phrases as they could on the topic of a search task, with no time limit. As we have just recently finished the study, we focus on participants' responses to the free recall test to identify knowledge gains through information seeking and relate them to basic behavioral measures on Web search, adding eye fixation durations and counts.

2.1 Measures

Our goals include measuring verbal and concept learning in the search process. We want to measure the difference in participants' knowledge of a search topic before and after each task; hence we need two measurement points. We considered a number of different possibilities for assessing participants' knowledge level on the task topics. We briefly present our deliberations. Fact-checking questions before a task were considered inappropriate, because we wanted to avoid exposing participants to the topic's content before they started the search. Since the tasks were conducted on the open web, we could not use a technique such as the Sentence Verification Technique (SVT) [15], which requires creation of questions for each document. Our participants were not experts on the topics, hence concept maps and mind-mapping were deemed inappropriate, as they are particularly difficult to score for non-experts.

We decided on asking participants to list words and phrases related to each task topic before and after each task. Participants were also asked to annotate relevant web pages and to create from these annotations final notes for each task. Participants entered the annotations while they were on content web pages, whereas the words and phrases listed in the pre- and post-task knowledge assessments were from their memory. In addition, we collected a list of keywords and phrases on the assigned task topics from crowd workers on Amazon Mechanical Turk. We plan to use it in assessing participant knowledge by applying automated scoring and calculating semantic similarity (e.g., using LSA).

Table 2. Dependent measures

Knowledge gain:
- difference in the number of items entered after and before each task (absolute and ratio)

Expertise gain:
- mean frequency of nouns after a task, normalized by the number of nouns used
- ratio of the mean frequency of nouns after to before a task, normalized by the number of nouns used
- mean frequency of new nouns used after a task, normalized by the number of nouns used
- mean rank of nouns listed after a task
- mean rank of new nouns listed after a task

The methods we used in assessing knowledge included, for example, statement counting [16] and word analysis (e.g., word frequency, in particular for nouns), while we plan to use more sophisticated methods in the future (e.g., topic analysis [17] and semantic analysis). The methods aim at assessing knowledge gain and expertise gain. With increasing expertise, people use more sophisticated vocabulary. This sophistication is expressed in the use of less frequent and more specialized vocabulary, hence our use of word usage frequency (and word usage rank) as one of the dependent measures. We used word frequencies and ranks of the 1/3 million most frequent words taken from the Google Web Trillion Word Corpus [18], as described by Norvig in chapter 14 of [19].

3. RESULTS

The mean frequencies and ranks of nouns entered before and after a task differed significantly (Mann-Whitney non-parametric test statistic = 229728.5, p = 0.0026; Figure 2).

Figure 2. Mean ranks of nouns in Pre-, Post-task, and new nouns Post-task.

We performed linear regression with the independent variables presented in Table 3 and one dependent variable at a time (Table 2); thus, we ran four regressions. Three of the obtained models (all except the one for the ratio of frequencies after and before a task) were significant. However, the values of R² were modest and ranged from 0.24 to 0.28.
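The knowledge-gain and word-rank measures of Table 2 can be illustrated with a short sketch. This is our own illustrative reconstruction, not the authors' code: the rank table, the noun lists, and the helper names (WORD_RANK, knowledge_gain, mean_rank) are hypothetical, with the default rank of 333333 standing in for the size of the 1/3-million-word frequency list described by Norvig.

```python
from statistics import mean

# Hypothetical rank table: word -> rank in a frequency-ordered corpus list
# (rank 1 = most frequent). Values are made up for illustration.
WORD_RANK = {"vitamin": 3200, "health": 700, "dosage": 9800,
             "retinol": 45000, "carotene": 60000}

def knowledge_gain(pre_items, post_items):
    """Difference in the number of items listed after vs. before a task
    (absolute and ratio), as in Table 2."""
    absolute = len(post_items) - len(pre_items)
    ratio = len(post_items) / len(pre_items) if pre_items else float("inf")
    return absolute, ratio

def mean_rank(nouns, default_rank=333333):
    """Mean corpus rank of the listed nouns; words outside the frequency
    list are assigned the lowest (least frequent) rank."""
    return mean(WORD_RANK.get(n, default_rank) for n in nouns)

pre = ["vitamin", "health"]
post = ["vitamin", "dosage", "retinol", "carotene"]
new_nouns = [n for n in post if n not in pre]  # nouns listed only post-task

abs_gain, ratio_gain = knowledge_gain(pre, post)
print(abs_gain, ratio_gain)   # 2 2.0
print(mean_rank(post))        # mean rank of all post-task nouns
print(mean_rank(new_nouns))   # mean rank of new nouns only
```

In this scheme, a higher mean rank after a task than before it would indicate use of less frequent, more specialized vocabulary, i.e., expertise gain.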
Table 3. Independent measures

Task level: Time on task
Query: Query count; Query length
Content Web pages: Number of pages visited; Time on a page; Total fixation duration; Count of fixations; Proportion of reading fixations; Proportion of durations of reading fixations
SERPs: Number of SERPs visited

The significant predictors included: 1) the number of queries entered and the number of SERPs visited, in a model with the ratio of the number of items entered after and before each task as the dependent variable; and 2) the average query length, in models with the mean frequency of use of nouns (or new nouns) after a task as the dependent variable.

A plausible interpretation could be that the more queries are issued, the more items are entered in the post-task knowledge list, and that there is a trade-off with the number of SERPs; namely, with more SERPs visited, the number of items entered decreases.

For the second and third models, the interpretation is less exciting, as it seems to indicate that the longer the average query is, the higher the normalized frequency of nouns or new nouns entered in the post-task knowledge assessment.

The eye-tracking variables were not found to be significant contributors to the dependent variables of interest. This, perhaps, should not be surprising, as they were obtained for all visits to content pages without further differentiation of page content or search task phase. We plan to use more specific eye-tracking measures in our future work.

4. DISCUSSION AND CONCLUSIONS

We reported on our work-in-progress, in which we seek to make a methodological contribution. The results generally indicate the feasibility of the proposed approach, which we may take as an early indication of some success. However, the relative simplicity of the employed measures leaves room for improvement and, as indicated throughout the paper, we plan on using more sophisticated assessment techniques.

The broader impact of implicit detection of gains in a person's knowledge, and thus of learning, lies in its applicability not only to the design of search systems and to improving understanding of human-information interaction, but also to a wide variety of information systems, including online learning and intelligent tutoring systems.

5. ACKNOWLEDGMENTS

This project has been funded in part by IMLS Career award #RE-04-11-0062-11 and in part by a fellowship from the School of Information to Jacek Gwizdka.

6. REFERENCES

[1] Kraiger, K., Ford, J.K. and Salas, E. 1993. Application of cognitive, skill-based, and affective theories of learning outcomes to new methods of training evaluation. Journal of Applied Psychology. 78 (1993), 311–328.
[2] Belkin, N.J. 1980. Anomalous states of knowledge as a basis for information retrieval. Canadian Journal of Information Science. 5 (1980), 133–143.
[3] Dervin, B. 1992. From the mind's eye of the user: The sense-making qualitative-quantitative methodology. Qualitative Research in Information Management. (1992), 61–84.
[4] Marchionini, G. 1997. Information Seeking in Electronic Environments. Cambridge University Press.
[5] Jansen, B.J. et al. 2009. Using the taxonomy of cognitive learning to model online searching. Information Processing & Management. 45, 6 (Nov. 2009), 643–663.
[6] Wilson, M.J. and Wilson, M.L. 2013. A comparison of techniques for measuring sensemaking and learning within participant-generated summaries. Journal of the American Society for Information Science and Technology. 64, 2 (2013), 291–306.
[7] Collins-Thompson, K. et al. 2016. Assessing Learning Outcomes in Web Search: A Comparison of Tasks and Query Strategies. Proceedings of the 2016 ACM Conference on Human Information Interaction and Retrieval (New York, NY, USA, 2016), 163–172.
[8] Hansen, P. and Rieh, S.Y. 2016. Editorial: Recent advances on searching as learning: An introduction to the special issue. Journal of Information Science. 42, 1 (Feb. 2016), 3–6.
[9] Freund, L. et al. 2013. From Searching to Learning. Evaluation Methodologies in Information Retrieval. M. Agosti et al., eds. 102–105.
[10] Anderson, L.W. et al. 2001. A taxonomy for learning, teaching, and assessing: A revision of Bloom's taxonomy of educational objectives. Longman.
[11] Cole, M.J. et al. 2011. Dynamic assessment of information acquisition effort during interactive search. Proceedings of the American Society for Information Science and Technology. 48, 1 (2011), 1–10.
[12] Cole, M.J. et al. 2013. Inferring user knowledge level from eye movement patterns. Information Processing & Management. 49, 5 (Sep. 2013), 1075–1091.
[13] Cole, M.J. et al. 2011. Task and user effects on reading patterns in information search. Interacting with Computers. 23, 4 (Jul. 2011), 346–362.
[14] Borlund, P. 2003. The IIR evaluation model: A framework for evaluation of interactive information retrieval systems. Information Research. 8, 3 (2003), paper no. 152.
[15] Freund, L. et al. 2016. The effects of textual environment on reading comprehension: Implications for searching as learning. Journal of Information Science. 42, 1 (Feb. 2016), 79–93.
[16] Wilson, M.L. and schraefel, m c 2008. A Validated Framework for Measuring Interface Support for Interactive Information Seeking. (2008).
[17] Kammerer, Y. et al. 2009. Signpost from the Masses: Learning Effects in an Exploratory Social Tag Search Browser. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (New York, NY, USA, 2009), 625–634.
[18] Brants, T. and Franz, A. All Our N-gram are Belong to You. Research Blog.
[19] Segaran, T. and Hammerbacher, J. 2009. Beautiful Data: The Stories Behind Elegant Data Solutions. O'Reilly Media, Inc.