Towards Observable Indicators of Learning on Search

Jacek Gwizdka
School of Information
University of Texas at Austin
sal2016@gwizdka.com

Xueshu Chen
School of Information
University of Texas at Austin
xueshu_chen@utexas.edu

Search as Learning (SAL), July 21, 2016, Pisa, Italy. The copyright for this paper remains with its authors. Copying permitted for private and academic purposes.

ABSTRACT
On the example of a recently conducted user study, we discuss the assessment of learning on search as well as its correlates in search behaviors and associated eye-tracking measures. Since we are reporting on a work in progress, the study is meant to illustrate our approach, and our choices of measures are meant to inspire a discussion.

Keywords
Information Search, Search as Learning, Eye-tracking, Measurement, Learning Assessment.

1. INTRODUCTION & BACKGROUND
From its very origins, information science has been concerned with ways and means of storing knowledge, organizing it, as well as helping people to find, use, and learn from it. Theorists of information science conceptually linked information interaction processes with the states of human knowledge (e.g., Belkin's ASK [2] and Dervin's sense-making [3]), and information seeking was described as "a process, in which humans purposefully engage in order to change their state of knowledge." [4]. Yet in spite of this long-established relationship with learning, only a few empirical studies exist that focus on search as learning (e.g., [5–7]). However, with the recent special journal issue [8], and with this and two earlier workshops, we observe an increased interest in this topic. One research challenge identified at a previous workshop [9] was how to assess learning in the context of purposeful information seeking. This is where we aim to contribute through this project. This short paper presents our approach, method, and initial data analysis.

Our working definition of learning is any change in a person's knowledge structures. We consider that learning can take place at many levels [10], and we are particularly influenced by the cognitive, skill-based, and affective theory of learning outcomes (CSALO) model [1] (Figure 1). This framework contains elements related to searching to learn (e.g., declarative knowledge) as well as learning to search (strategies, tactics, procedural knowledge). According to this model, learning outcomes are partially reflected in changes in verbal knowledge, knowledge organization, and cognitive strategies. We are particularly interested in assessing changes in verbal knowledge.

Our prior work [11–13] has demonstrated the feasibility of using eye-tracking to detect relationships between eye movements and knowledge levels. The method takes advantage of a direct relationship between eye movement patterns and cognitive processes. One goal of the project presented in this short paper is to connect eye-tracking measures and traditional IR measures (e.g., the number and kind of query reformulations) with measures of learning. Since we are reporting on work in progress, the initial data analysis will serve as only a simple illustration of our approach, while our choices of measures will inspire a discussion.

Figure 1. Cognitive, Skill-based, and Affective Theory of Learning Outcomes (CSALO) Model. Source: [1]

2. METHOD
A lab-based experiment was conducted in the Information eXperience lab at the University of Texas at Austin (N=30). Data are reported here for 26 of these subjects (16 females; mean age of all participants 24.5). Participants, who volunteered after seeing the recruitment notice posted on the university bulletin board, were pre-screened for native-level English, eyesight, and topic familiarity. All participants reported daily Internet use longer than an hour and everyday Google usage. Most had been searching online for 7 years or more. The majority also considered themselves proficient in online information search. To understand how people seek health information using the Internet and acquire new domain knowledge, we asked each participant to perform three information search tasks (two assigned multi-faceted tasks and one self-generated) on health-related topics in counterbalanced order (six rotations), plus one training task. The assigned search tasks followed a simulated work task approach, which triggers a realistic information need: participants were asked to find useful information for answering the task questions [14] (Table 1).

Table 1. Search tasks.

Assigned tasks
Task 1–Vitamin A: Your teenage cousin has asked your advice in regard to taking vitamin A for health improvement purposes. You have heard conflicting reports about the effects of vitamin A, and you want to explore this topic in order to help your cousin. Specifically, you want to know: 1) What is the recommended dosage of vitamin A for underweight teenagers? 2) What are the health benefits of taking vitamin A? Please find at least 3 benefits and 3 disadvantages of vitamin A. 3) What are the consequences of vitamin A deficiency or excess? Please find 3 consequences of vitamin A deficiency and 3 consequences of its excess. 4) Please find at least 3 food items that are considered as good sources of vitamin A.
Task 2–Hypotension: Your friend has hypotension. You are curious about this issue and want to investigate more. Specifically, you want to know: 1) What are the causes of hypotension? 2) What are the consequences of hypotension? 3) What are the differences between hypotension and hypertension in terms of symptoms? Please find at least 3 differences in symptoms between them. 4) What are some medical treatments for hypotension? Which solution would you recommend to your friend if he/she also has a heart condition? Why?

Example self-generated tasks
Ex. 1. Crohn's disease: I know someone who was recently diagnosed and am curious about the disease.
Ex. 2. My friend has lupus. What are the symptoms of lupus? What are the long-term consequences of lupus, including the life expectancy? Are there any cures? What treatments are available?
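The six counterbalanced rotations mentioned above follow directly from the fact that three tasks have exactly 3! = 6 possible orderings. A minimal sketch (the task labels are illustrative, not the study's internal identifiers):

```python
from itertools import permutations

# The three task slots; each participant receives one of the six rotations.
tasks = ["Task 1 (Vitamin A)", "Task 2 (Hypotension)", "Self-generated"]
rotations = list(permutations(tasks))  # 3! = 6 counterbalanced orders
```

Cycling through these rotations across participants balances task-order effects.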
Participants searched publicly available web pages using Google and were asked to save relevant web pages with their typewritten notes and/or information copied and pasted from the source. While there was no time limit, each user session typically lasted from 1.5 to 2 hours. Each participant completed an eHEALS questionnaire, a Pre- and a Post-task Questionnaire, a Post-Search Interview on how they arrived at their solutions for one of the saved web pages per task, and an Exit Questionnaire. During search in the experiment, all of the participants' interactions with the computer system, including eye gaze, brain activity recordings (frontal area), and facial expressions (web cam), were recorded. Eye-tracking data was collected using a Tobii TX-300 eye-tracker. Participant brain wave levels were recorded using a wireless, consumer-level headset (MyndWave). At the completion of a session, each participant received $25.

Both the Pre- and Post-Task Questionnaires contained two parts: knowledge assessment and interest in the search topic. In the knowledge assessment, participants were asked to list as many words or phrases as they could on the topic of a search task, with no time limit. As we have just recently finished the study, we focus here on participants' responses to this free recall test to identify knowledge gains through information seeking and relate them to basic behavioral measures of Web search, adding eye fixation durations and counts.

2.1 Measures
Our goals include measuring verbal and concept learning during the search process. We want to measure the difference in a participant's knowledge of a search topic before and after each task, hence we need two measurement points. We considered a number of different possibilities for assessing participants' knowledge of the task topics; we briefly present our deliberations. Fact-checking questions before a task were considered inappropriate, because we wanted to avoid exposing participants to the topic's content before they started the search. Since the tasks were conducted on the open web, we could not use a technique such as the Sentence Verification Technique (SVT) [15], which requires the creation of questions for each document. Our participants were not experts on the topics, hence concept maps and mind-mapping were deemed inappropriate, as they are particularly difficult to score for non-experts.

We decided on asking participants to list words and phrases related to each task topic before and after each task. Participants were also asked to annotate relevant web pages and to create from these annotations final notes for each task. Participants entered the annotations while they were on content web pages, whereas the words and phrases listed in the pre- and post-task knowledge assessments were from their memory. In addition, we collected a list of keywords and phrases on the assigned task topics from crowd workers on Amazon Mechanical Turk. We plan to use it in assessing participant knowledge by applying automated scoring and calculating semantic similarity (e.g., using LSA).

Table 2. Dependent measures.

Construct       | Operationalization
Knowledge gain  | difference in the number of items entered after and before each task (absolute and ratio)
Expertise gain  | mean frequency of nouns after a task, normalized by the number of nouns used
                | ratio of the mean frequency of nouns after to before a task, normalized by the number of nouns used
                | mean frequency of new nouns used after a task, normalized by the number of nouns used
                | mean rank of nouns listed after a task
                | mean rank of new nouns listed after a task

The methods we used in assessing knowledge included, for example, statement counting [16] and word analysis (e.g., word frequency, in particular for nouns), while we plan to use more sophisticated methods in the future (e.g., topic analysis [17] and semantic analysis). The methods aim at assessing knowledge gain and expertise gain.
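As a concrete illustration, the count- and frequency-based operationalizations in Table 2 can be sketched as follows. This is a minimal sketch, not the study's analysis code: the frequency table and the participant word lists are hypothetical stand-ins for the corpus counts and the actual free-recall responses.

```python
def knowledge_gain(pre_items, post_items):
    """Knowledge gain as in Table 2: difference in the number of items
    listed after vs. before a task, absolute and as a ratio."""
    absolute = len(post_items) - len(pre_items)
    ratio = len(post_items) / len(pre_items) if pre_items else float("inf")
    return absolute, ratio

def mean_frequency(nouns, corpus_freq):
    """Mean corpus frequency of the listed nouns, normalized by the
    number of nouns used; words missing from the table count as 0."""
    if not nouns:
        return 0.0
    return sum(corpus_freq.get(n.lower(), 0) for n in nouns) / len(nouns)

# Hypothetical corpus frequencies and one participant's pre/post lists.
corpus_freq = {"vitamin": 1200, "dosage": 150, "retinol": 12, "carotene": 8}
pre = ["vitamin", "dosage"]
post = ["vitamin", "dosage", "retinol", "carotene"]

new_nouns = [n for n in post if n not in pre]
abs_gain, ratio_gain = knowledge_gain(pre, post)  # 2 new items, ratio 2.0
```

A drop in `mean_frequency(new_nouns, corpus_freq)` relative to `mean_frequency(pre, corpus_freq)` would then indicate a shift toward rarer, more specialized vocabulary, i.e., expertise gain; the rank-based measures in Table 2 are computed analogously from word ranks instead of frequencies.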
With increasing expertise, people use more sophisticated vocabulary. This sophistication is expressed in the use of less frequent and more specialized words, hence our use of word usage frequency (and word usage rank) as one of the dependent measures. We used the word frequencies and ranks of the 1/3 million most frequent words taken from the Google Web Trillion Word Corpus [18], as described by Norvig in chapter 14 of [19].

3. RESULTS
The mean frequencies and ranks of nouns entered before and after a task differed significantly (Mann-Whitney non-parametric test statistic = 229728.5, p = 0.0026; Figure 2).

Figure 2. Mean ranks of nouns in Pre-task, Post-task, and new nouns Post-task.

We performed linear regression with the independent variables presented in Table 3 and one dependent variable at a time (Table 2); thus, we ran four regressions. Three of the obtained models (all except the one for the ratio of frequencies after and before a task) were significant. However, the values of R² were modest and ranged from 0.24 to 0.28.

Table 3. Independent measures.

Measure category  | Measure
Task level        | Time on task
Query             | Query count
                  | Query length
Content Web pages | Number of pages visited
                  | Time on a page
                  | Total fixation duration
                  | Count of fixations
                  | Proportion of reading fixations
                  | Proportion of durations of reading fixations
SERPs             | Number of SERPs visited

The significant predictors included: 1) the number of queries entered and the number of SERPs visited, in the model with the ratio of the number of items entered after and before each task as the dependent variable, and 2) the average query length, in the models with the mean frequency of use of nouns (or new nouns) after a task as the dependent variable.

A plausible interpretation could be that the more queries are issued, the more items are entered in the post-task knowledge list, and that there is a trade-off with the number of SERPs, namely, that with more SERPs visited the number of items entered decreases. For the second and third models, the interpretation is less exciting, as it seems to indicate that the longer the average query, the higher the normalized frequency of nouns or new nouns entered in the post-task knowledge assessment.
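For concreteness, the computation behind such a regression can be sketched for the single-predictor case (e.g., average query length predicting the normalized post-task noun frequency). The data below are invented for illustration; the study's actual models used the Table 3 predictors together, for which a multiple-regression routine (e.g., ordinary least squares in statsmodels) is the practical choice.

```python
def simple_ols(x, y):
    """Ordinary least squares with one predictor: returns the intercept,
    the slope, and the coefficient of determination R^2."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return intercept, slope, 1 - ss_res / ss_tot

# Hypothetical per-task observations: average query length (words) vs.
# normalized frequency of nouns listed after the task.
query_length = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
post_noun_freq = [0.10, 0.14, 0.13, 0.18, 0.17, 0.21]
intercept, slope, r_squared = simple_ols(query_length, post_noun_freq)
```

A positive `slope` with a modest `r_squared` would correspond to the pattern reported above.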
The eye-tracking variables were not found to be significant contributors to the dependent variables of interest. This, perhaps, should not be surprising, as they were obtained for all visits to content pages without further differentiation of page content or search task phase. We plan to use more specific eye-tracking measures in our future work.

4. DISCUSSION AND CONCLUSIONS
We reported on our work in progress, in which we seek to make a methodological contribution. The results generally indicate the feasibility of the proposed approach, which we may take as an early indication of some success. However, the relative simplicity of the employed measures leaves room for improvement and, as indicated throughout the paper, we plan on using more sophisticated assessment techniques.

The broader impact of implicit detection of gains in a person's knowledge, and thus of learning, lies in its applicability not only to the design of search systems and to improving the understanding of human-information interaction, but also to a wide variety of information systems, including online learning and intelligent tutoring systems.

5. ACKNOWLEDGMENTS
This project has been funded in part by IMLS Career award #RE-04-11-0062-11 and in part by a fellowship from the School of Information to Jacek Gwizdka.

6. REFERENCES
[1] Kraiger, K., Ford, J.K. and Salas, E. 1993. Application of cognitive, skill-based, and affective theories of learning outcomes to new methods of training evaluation. Journal of Applied Psychology. 78, (1993), 311–328.
[2] Belkin, N.J. 1980. Anomalous states of knowledge as a basis for information retrieval. Canadian Journal of Information Science. 5, (1980), 133–143.
[3] Dervin, B. 1992. From the mind's eye of the user: The sense-making qualitative-quantitative methodology. Qualitative Research in Information Management. (1992), 61–84.
[4] Marchionini, G. 1997. Information Seeking in Electronic Environments. Cambridge University Press.
[5] Jansen, B.J. et al. 2009. Using the taxonomy of cognitive learning to model online searching. Information Processing & Management. 45, 6 (Nov. 2009), 643–663.
[6] Wilson, M.J. and Wilson, M.L. 2013. A comparison of techniques for measuring sensemaking and learning within participant-generated summaries. Journal of the American Society for Information Science and Technology. 64, 2 (2013), 291–306.
[7] Collins-Thompson, K. et al. 2016. Assessing learning outcomes in Web search: A comparison of tasks and query strategies. Proceedings of the 2016 ACM Conference on Human Information Interaction and Retrieval (New York, NY, USA, 2016), 163–172.
[8] Hansen, P. and Rieh, S.Y. 2016. Editorial: Recent advances on searching as learning: An introduction to the special issue. Journal of Information Science. 42, 1 (Feb. 2016), 3–6.
[9] Freund, L. et al. 2013. From searching to learning. Evaluation Methodologies in Information Retrieval. M. Agosti et al., eds. 102–105.
[10] Anderson, L.W. et al. 2001. A Taxonomy for Learning, Teaching, and Assessing: A Revision of Bloom's Taxonomy of Educational Objectives. Longman.
[11] Cole, M.J. et al. 2011. Dynamic assessment of information acquisition effort during interactive search. Proceedings of the American Society for Information Science and Technology. 48, 1 (2011), 1–10.
[12] Cole, M.J. et al. 2013. Inferring user knowledge level from eye movement patterns. Information Processing & Management. 49, 5 (Sep. 2013), 1075–1091.
[13] Cole, M.J. et al. 2011. Task and user effects on reading patterns in information search. Interacting with Computers. 23, 4 (Jul. 2011), 346–362.
[14] Borlund, P. 2003. The IIR evaluation model: A framework for evaluation of interactive information retrieval systems. Information Research. 8, 3 (2003), paper no. 152.
[15] Freund, L. et al. 2016. The effects of textual environment on reading comprehension: Implications for searching as learning. Journal of Information Science. 42, 1 (Feb. 2016), 79–93.
[16] Wilson, M.L. and schraefel, m.c. 2008. A validated framework for measuring interface support for interactive information seeking. (2008).
[17] Kammerer, Y. et al. 2009. Signpost from the masses: Learning effects in an exploratory social tag search browser. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (New York, NY, USA, 2009), 625–634.
[18] Brants, T. and Franz, A. All Our N-gram are Belong to You. Research Blog.
[19] Segaran, T. and Hammerbacher, J. 2009. Beautiful Data: The Stories Behind Elegant Data Solutions. O'Reilly Media, Inc.