Finding, Understanding and Learning: Making Information Discovery Tasks Useful for Children and Teachers

Ion Madrazo Azpiazu, Nevena Dragovic, Maria Soledad Pera
Computer Science Dept., Boise State University, Boise, Idaho, USA
ionmadrazo@boisestate.edu, nevenadragovic@boisestate.edu, solepera@boisestate.edu

Search as Learning (SAL), July 21, 2016, Pisa, Italy. The copyright for this paper remains with its authors. Copying permitted for private and academic purposes.

ABSTRACT

We present our ongoing efforts on the development of a search environment tailored to 6-15 year-olds that can foster learning through retrieval of materials that not only satisfy the information needs of users but also match their reading abilities. YouUnderstood.me is an enhanced environment based on a popular search engine, specifically designed to help students deal with search-for-learning tasks and to allow teachers to track their progress. An initial assessment conducted on YouUnderstood.me and well-known (children-oriented) search engines, based on queries generated by K-9 students, showcases the need for this type of environment.

CCS Concepts

•Information systems → Personalization; •Social and professional topics → Children;

Keywords

Search as Learning; Children; Readability; Personalization

1. INTRODUCTION

The use of Web technologies is increasingly becoming a relevant and valuable asset for children's education [12], both because it enhances the class environment and because it introduces children, from early stages of their lives, into today's information society [18]. Unfortunately, as described by Danby [3], incorporating technology with more traditional activities into early childhood education is not a trivial task. Children use search engines on a daily basis to locate materials that can help them with different academic tasks, from finding information for a class to discovering the meaning of a word [12]. While the use of search engines for the enhancement of learning tasks is very common, they are not designed with children in mind, and thus a number of issues arise when they are used by this audience [8]. An important barrier is showcased by the fact that search engines are not always successful in understanding children's information needs, expressed in long natural language or ambiguous queries [1]. Another prominent issue is evidenced by the results of the survey conducted by Bilal et al. [1], which identifies that out of 300 results retrieved to satisfy information needs of 7th graders, only 1 matched their reading level. This is concerning since it is hard for children to comprehend texts with readability levels that do not match their own. Furthermore, given that children as web users "differ widely in their reading proficiency and ability to understand vocabulary, depending on factors such as age, educational background, and topic interest or expertise" [2], it is imperative to tailor the complexity of results to the specific needs of each child, and not just to generalize based on a label such as age or grade. As reported by Lennon and Burdick [14], reading for learning takes place when the reader comprehends 75% of a text. This represents an appropriate balance that allows the reader to positively understand the text, while also finding challenges in the reading process that will motivate him to improve his skills [14]. Therefore, unless the retrieved resources match the reading skills of users, reading for learning, and learning as the final goal of the online information seeking process, cannot take place.

In response to the issues that affect the information seeking process, we discuss our ongoing efforts to develop a web search environment designed to help K-9¹ students in finding adequate online materials. We focus on an audience comprised of 6 to 15 year-olds, since these ages cover children from their initial search experiences through their "graduation" to adult search tools. YouUnderstood.me (YUM) aims to enhance search engines so that they can be used as a tool to facilitate learning, rather than just retrieving information. The main goal of YUM is to improve the information seeking process and increase children's comprehension of retrieved materials by combining diverse functionalities to overcome search engine deficiencies encountered by children. YUM makes the information retrieval process effective and efficient by (i) taking advantage of readability formulas, a popular search engine, a search intent module, and a query recommendation tool, as well as (ii) providing each student with a personal account which keeps track of the current readability level and the feedback given to previously retrieved resources, enabling YUM to update the predicted reading level² of students over time. Teachers can also benefit from YUM, as they have access to the constantly evolving reading levels of students, allowing them to better adapt class materials and pace.

¹ K-9 refers to grades prior to high school sophomores in the education systems in countries such as USA or Canada.
² We consider the reading level of a student to be the maximum readability level of texts he can understand.

The novelty of YUM lies in creating an environment based on existing search engines that not only serves students in retrieving resources relevant to their information discovery tasks, but also ensures that those resources have appropriate reading levels for each specific user. Furthermore, YUM builds a bridge to establish a direct relationship between teachers and students, where teachers can follow the progress in readability levels among the students and further foster the learning process. Finally, we contribute an initial study of (children-oriented) search engines conducted over a sample of child-written queries, which will be made public to the research community.

2. RELATED WORK

A number of studies have targeted the issue of search personalization [2, 6, 11, 21]. The authors in [6, 21] argue for the need to personalize search results to satisfy diverse users' needs and preferences. However, while their personalization strategy focuses on parameters such as authority of web pages or atypical search sessions, respectively, we focus on parameters that can aid the learning acquisition process, i.e., readability levels of retrieved results. Personalization based on readability has also been explored [2, 11]. While Collins-Thompson et al. [2] demonstrate, based on the results of an extensive query-log analysis, that readability is a valuable signal for relevance of retrieved resources, Jatowt et al. [11] highlight the need for suitable readability levels on resources retrieved as a result of queries on complex topics formulated by non-experts. We agree on the importance of readability in personalizing web searches, which is why YUM is designed to present its users with resources they can read and understand. Our efforts to create a search environment that addresses issues K-9 students encounter while conducting information seeking tasks are further encouraged by the conclusions reported by Huurdeman and Kamps [10], who argue in favor of the need to connect literacy and search engines.

Related to search environments specifically designed for children, the authors in [9] introduce a search user interface that takes the user's age as a parameter for adaptation. Similarly, YUM focuses on adapting the search environment to the needs of children, but from a reading comprehension standpoint, to facilitate the search-as-learning task. The authors in [8], on the other hand, present an adaptive search user interface that aims at enhancing the search process for 7-to-12 year old children. The focus of their research is on developing a new search environment. YUM, instead, focuses on incorporating modular capabilities that can be applied to improve the functionality of popular search engines preferred by children [1], in terms of the needs and expectations of children. The closest environment to the one we propose is the one described in [20]. However, the application proposed by Usta et al. only offers grade level filtering, which is a constraint, since students' reading abilities may differ even within the same class and improve over time [19]. In addition, their environment is not based on known search engines, which children tend to favor [1].

While a number of search engines have been developed to aid children, they are not optimal for conducting information discovery tasks for learning purposes, as discussed in [8] and Section 4. To the best of our knowledge, YUM is the only education-oriented environment that considers readability levels as well as queries that potentially lead to the retrieval of child-targeted resources to aid K-9 students in completing successful information seeking tasks.

3. YOUUNDERSTOOD.ME

YUM is an online environment built around a search engine, which aims to make the search process valuable for children. As opposed to similar environments [20], YUM is not meant to be treated as a new, child-oriented search engine, since studies [1] show that children prefer popular search engines. Instead, YUM acts as an intermediate layer between the child and an existing search engine (Google Safe Search), to facilitate the interaction between the two of them. To do so, YUM puts into practice strategies oriented to address issues children face when using popular search engines, as well as strategies that can enhance the search experience to foster learning. A description of these strategies is provided below.

Search Intent. Children tend to write natural language queries, instead of the short, keyword-based ones that search engines usually expect [17], making children unable to successfully complete information seeking tasks. In addition, children also tend to misspell words, but not necessarily in the same fashion as average users. For example, children commonly repeat letters in a word to emphasize it, such as in "faaaaast", which can cause search engines to misunderstand the intended meaning of the word. YUM leverages our previous research work QuIK [4], a search intent module designed for children which addresses common patterns in each query Q written by a child including, but not limited to, diminutives, emphasis, terms trendy among children, or child-specific misspellings, and transforms Q into a new keyword query capturing the information expressed by the child in a way that can be easier for search engines to comprehend.

Query Suggestion. Even if a search intent module can identify the most likely intent for each query, users have different interests and needs, which is why, when dealing with ambiguous queries, it is only each specific user who knows the purpose of his search. With this in mind, YUM takes advantage of our previously-developed ReQuIK [5], a query recommender tailored to children, and provides alternatives for the initial query that the user can select to better inform the search process. ReQuIK is based on a multi-criteria strategy that examines traits commonly associated with children and suggests queries that (i) are associated with children's topics, (ii) lead to the retrieval of resources with levels of readability matching those of the K-9 audience, and (iii) are diverse.

Filtering by Readability. Even when the search engine has understood the intent of a child query and retrieves results that match the information needs expressed by users, the suitability of retrieved resources is still not assured. K-9 students find it difficult to understand documents containing complex or technical vocabulary. For example, in the case where a child is looking for information about chemistry, retrieving a scientific publication would not be adequate, while retrieving information from an elementary chemistry book would. If the retrieved documents are too complex, children may not succeed in completing their information discovery tasks. In order to avoid this situation, YUM incorporates a filtering strategy based on readability levels. This strategy ensures that the retrieved documents match, to a degree, the reading ability of each individual user. YUM allows users to go through a one-time process where they can select their grade level, which is originally used as a target to eliminate resources that are not within half a grade level above or below the grade of the corresponding student. For estimating the readability of retrieved resources, YUM uses the Flesch-Kincaid readability formula [7]. While we expect to develop our own readability formula in the future, we initially selected this formula given that it is considered a standard by educators and institutions for measuring readability.

Tracking. K-9 students have diverse reading abilities, which can differ even within the same grade, and progressively improve over time [19]. Consequently, a one-size-fits-all strategy is not applicable for conducting successful information-seeking tasks that lead to the retrieval of resources individual users can understand. YUM employs an adaptive strategy based on explicit feedback that users can provide by specifying whether the resources retrieved were "Too Easy", "OK" or "Too complex" for them. Children might not be experts in determining the readability of a document; however, YUM takes advantage of their perception over the multiple documents they have read to obtain estimates about their reading skills. We treat the problem of predicting the current readability level of users as a constraint satisfaction problem, where each feedback provided by a student generates a constraint that needs to be satisfied by the readability of the student. For example, a student $s$ giving a feedback of "Too complex" to a document of readability level 5 would generate the constraint $r_s < 5$, stating that the readability $r_s$ of $s$ should be lower than 5. As shown in Equation 1, the predicted readability for $s$ is the one that maximizes the time-weighted number of constraints satisfied.

\[
r_s = \arg\max_{r \in R} \sum_{c_i \in C}
\begin{cases}
f(c_i) & \text{if } r \text{ satisfies } c_i \\
0 & \text{otherwise}
\end{cases}
\tag{1}
\]

where $r \in R = \{0, 0.5, \ldots, 8.5, 9\}$ represents every possible readability value for the student and $C$ is the set of constraints created based on the feedback provided on retrieved resources by $s$. According to reports in [2], users' reading proficiency needs to be estimated based on both the current and past searching processes. Thus, Equation 1 considers the time stamps of the created constraints, favoring those created more recently and discarding the ones created outside the current academic year. To do so, $f(c_i)$ is a function that starts at value 9 for a new constraint $c_i$ and decreases by 1 for each month since the corresponding feedback was provided, until it reaches 0. We selected 9 as the number of months to consider as this represents the average length of an academic year. Initially, YUM defines two base constraints that represent one grade of deviation from the current readability of the student: $r_s < p_s + 0.5$ and $r_s > p_s - 0.5$, where $p_s$ represents the prior readability of student $s$, based either on the grade level selected the first time YUM is used or on the $r_s$ value for the previous academic year. These constraints give YUM a starting level that will be adjusted as the student uses the environment.

3.1 YUM for teachers

Teachers can also benefit from using YUM within the class environment. Work setting standards have changed from a vertical structure, where only the top individuals of the pyramid had to think critically and the lower parts just followed directions, to a horizontal structure, where each individual is expected to collaborate with others and solve important problems using identification, searching, synthesizing, and communication skills [15]. Given this change, education plans oriented to meet the new requirements of the current industry, such as the Common Core State Standards Initiative (CCSS), have been developed. CCSS requests educators to put an emphasis on higher level thinking during reading and to focus on the acquisition of skills such as research and comprehension using digital tools, including search engines [15]. Furthermore, studies showcase the benefit of in-class exercises such as exploratory talks, where students are asked to solve a problem in groups, discussing information found on resources obtained using a search engine [13]. Unfortunately, teachers might not be able to propose such a task to their students and lead discussions if students have problems using search engines, whether they are struggling to find the right queries or are not able to understand the retrieved documents. YUM can help teachers overcome those issues so that they can focus on the discussion, rather than on the manner in which students should formulate queries or the type of results they access. Furthermore, YUM can serve as a monitoring tool that allows teachers to check students' progress, based on the resources they have retrieved and the feedback they provided in terms of complexity. We believe that YUM can not only facilitate learning when children use it for their information discovery assignments, but can also help teachers within the classroom environment by addressing the challenge of seamlessly integrating technology to perform everyday classroom activities [3, 13].

4. INITIAL STUDY

YUM is more than a search engine for children: it is an enhanced web environment that incorporates features oriented towards facilitating and fostering learning as a result of conducting successful information seeking tasks online. In this initial assessment we expand on the analysis framework presented in [8] to demonstrate the need for environments such as YUM. To do so, we examine a number of popular search engines oriented to children³ as well as Google, given that children tend to prefer it over others [1].

³ Kiddle.co, KidRex.org, SafeSearchKids.com and Gogooligans.com

Due to the lack of benchmarks available for evaluating search-related tools focused on young users, we collected our own sample of queries written by children. This sample includes 300 unique queries written by 50 children between the ages of 6 and 15. For creating it, we asked various K-9 teachers in the Idaho (USA) area to propose to their students an information discovery task for which the students had to create queries. The domain of the task was open; however, most of the children looked for information about films and animals, generating queries such as "When is finding Dory coming out?" and "How many cheetahs are in the world?". We submitted these queries to each of the aforementioned search engines and examined their respective retrieved resources as well as the challenges children need to overcome when using these engines. We discuss below details pertaining to each of the aspects considered for our assessment and present an overview of our initial findings in Table 1.

Table 1: Comparison of search environments

| Criterion | YUM | Google | Kiddle | KidRex | Safe Search Kids | Gogooligans |
| Average readability (Flesch) | Chosen by the user | 12.4 | 12.8 | 10.6 | 15.6 | 11.6 |
| Query suggestions | Yes | Yes, but for general audience | No | No | No | Yes, based on dictionary |

For the remaining three rows of Table 1, the assignment of values to columns is not recoverable from the source layout; the cell values, in source order, are:
Difficulty to retrieve adequate resources: 42% (Cannot handle questions), 12%, 17%, 21%, 21%, 21% (the text below attributes 12% to YUM).
Non adequate contents: Non-filtered ads, Ads related to submitted query, Non-filtered ads, Ads filtered for children, None, None (per the text below, YUM excludes ads and SafeSearchKids shows non-filtered ads).
Mobile friendly: Yes, Yes, Yes, No, No, Poor adaptation (per the text below, YUM adapts to smaller screens).

Difficulty to retrieve adequate resources. Children are known to struggle when composing queries, often creating queries that are not what search engines expect [17]. Based on our assessment using children's queries, we observed that for 21% of the queries, the (child-oriented) search engines considered in this analysis did not retrieve any result, or the results that were retrieved did not correspond with what the child would expect, as opposed to the 12% for which YUM was in the same situation. As an example, the query "lollipop" retrieved resources about the Android Operating System rather than resources about candies or songs, which is what a child would expect.

Readability. The readability level of resources retrieved in response to a child query is also a relevant aspect to explore to quantify the success of a search from a reading for learning perspective. We computed the average readability level of the top-N results retrieved in response to children's queries. Given that "children are known to systematically go through retrieved resources and rarely judge retrieved information sources" [17], we computed the readability scores reported in Table 1 based on the top-3 documents retrieved in response to each query. For measuring the readability level of the retrieved resources, we selected the Flesch formula [7], as it is considered a standard nationwide. Recall that YUM filters out retrieved resources that do not have a complexity level within +/- 0.5 deviation from the reading level of each user, assuring that retrieved resources can be comprehended by its users. Therefore, we only computed the average readability levels of resources retrieved in response to queries posted on the (child-oriented) search engines considered in this analysis. As shown in Table 1, the readability levels of retrieved resources are on average above 10, and one of the search engines (SafeSearchKids) even retrieved resources that average 15.6 in terms of readability levels.

General experience. The quality of a search engine is not only determined by its retrieved results; the general search environment is also important [8]. We observed that the presence of ads was recurrent among the search engines considered in this study. These ads were usually indistinguishable from relevant retrieved resources, which can be confusing, and more importantly, were sometimes not filtered for children, advertising products unsuitable for them. For example, we found ads that referred to drug rehabilitation programs or anti-aging products among results retrieved by SafeSearchKids in response to queries such as "frozen characters". We also noticed that platform adaptability was an issue for some of the search engines, since they showed poor support for small screens, such as the ones from phones or tablets, making it hard for a child to use the same system on all platforms. This is a significant drawback, given that 71% of children frequently access the internet through a tablet [16]. Finally, most of the search engines showed no or poor support for helping children improve their queries. Google and Gogooligans suggested query reformulations while typing. However, these suggestions were not tailored to children or did not go beyond dictionary-based auto completion. For example, when "Sven" (the name of a character from the Disney movie Frozen) was typed, "Seven" (a movie not rated for children) was given as a suggestion in most of the search engines, which does not capture the intended meaning of the query considering that it was written by a child. YUM meets the three criteria described, by excluding ads, being adaptable to smaller screens and supporting children in improving their queries by providing suggestions or using the most likely search intent.

5. CONCLUSIONS

In this paper we presented YUM, an online environment that addresses issues children face when using popular search engines to conduct information seeking tasks. YUM can facilitate the learning that can occur while reading resources that are retrieved as a result of a child-initiated search. As part of our ongoing research efforts, we leverage the use of popular search engines, search intent and query suggestion modules we have developed, a readability-based filtering strategy and a novel tracking strategy, to enhance the search-for-learning tasks conducted online and to inform teachers of the progress of their students in terms of reading and comprehension. We conducted an initial assessment using queries written by K-9 children and demonstrated the need for environments such as YUM. We plan to extend YUM by implementing a number of enhancements. We are aware that the Flesch formula currently used in YUM may not be precise enough. Therefore, we will build our own readability assessment tool, which will go beyond counting terms and syllables, and instead will consider web-page specific metadata as well as in-depth language information, such as syntax and semantics. An exploration of different filtering strategies will also be conducted based on web page authority and the level of maturity of the content retrieved, so that retrieved resources are more suitable for children. We also plan to explore and incorporate new ways of collaborative searching between students and teachers, which could further enhance learning-while-searching tasks. We are also aware that children may not provide explicit feedback for all the resources they read. Therefore, we also plan to explore ways of obtaining feedback implicitly, such as analyzing the time spent reading the resources. Finally, a more in-depth study will be conducted to better understand, quantify, and showcase the correlation between learning and information discovery tasks conducted using enhanced web search environments. Since the developmental stages and information needs of K-9 children are broad, we will conduct these studies based on more specific age ranges, such as 6-8 and 9-12.

6. REFERENCES

[1] D. Bilal and M. Boehm. Towards new methodologies for assessing relevance of information retrieval from web search engines on children's queries. Qualitative and Quantitative Methods in Libraries, 1:93–100, 2013.
[2] K. Collins-Thompson, P. N. Bennett, R. W. White, S. de la Chica, and D. Sontag. Personalizing web search results by reading level. In CIKM, pages 403–412, 2011.
[3] S. Danby. Going online: young children and teachers accessing knowledge through web interactions. Educating Young Children: Learning and Teaching in the Early Childhood Years, 19(3):30, 2013.
[4] N. Dragovic, I. Madrazo, and M. S. Pera. "Is sven seven?": A search intent module for children. In ACM SIGIR, 2016.
[5] N. Dragovic, I. Madrazo, and M. S. Pera. A multi-criteria strategy to recommend queries for children. In Under review, 2016.
[6] C. Eickhoff, K. Collins-Thompson, P. N. Bennett, and S. Dumais. Personalizing atypical web search sessions. In ACM WSDM, pages 285–294, 2013.
[7] R. Flesch. A new readability yardstick. Journal of Applied Psychology, 32(3):221, 1948.
[8] T. Gossen, J. Hempel, and A. Nürnberger. Find it if you can: usability case study of search engines for young users. Personal and Ubiquitous Computing, 17(8):1593–1603, 2013.
[9] T. Gossen, M. Kotzyba, and A. Nürnberger. Knowledge journey exhibit: Towards age-adaptive search user interfaces. In Advances in Information Retrieval, pages 781–784. Springer, 2015.
[10] H. C. Huurdeman and J. Kamps. Supporting the process: Adapting search systems to search stages. In Information Literacy: Moving Toward Sustainability, pages 394–404. Springer, 2015.
[11] A. Jatowt, K. Akamatsu, N. Pattanasri, and K. Tanaka. Towards more readable web: measuring readability of web pages based on link structure. ACM SIGWEB Newsletter, (Winter):4, 2012.
[12] S. Knight. Finding knowledge–what is it to 'know' when we search? 2014. http://goo.gl/LQEhXc.
[13] S. Knight and N. Mercer. The role of exploratory talk in classroom search engine tasks. Technology, Pedagogy and Education, 24(3):303–319, 2015.
[14] C. Lennon and H. Burdick. The lexile framework as an approach for reading measurement and success. Electronic publication on https://goo.gl/WiPlsj, 2004.
[15] D. J. Leu, E. Forzani, C. Burlingame, J. Kulikowich, N. Sedransk, J. Coiro, and C. Kennedy. Assessing and preparing students for the 21st century with common core state standards. http://goo.gl/wdbTB6, pages 219–236, 2013.
[16] Ofcom. Children and parents: Media use and attitudes report. 2014. http://goo.gl/g6x9ph.
[17] S. Y. Rieh, K. Collins-Thompson, P. Hansen, and H.-J. Lee. Towards searching as a learning process: A review of current perspectives and future directions. Journal of Information Science, 42(1):19–34, 2016.
[18] A. Sadaf, T. J. Newby, and P. A. Ertmer. Exploring pre-service teachers' beliefs about using web 2.0 technologies in k-12 classroom. Computers & Education, 59(3):937–945, 2012.
[19] T. Shin, M. L. Davison, J. D. Long, C.-K. Chan, and D. Heistad. Exploring gains in reading and mathematics achievement among regular and exceptional students using growth curve modeling. Learning and Individual Differences, 23:92–100, 2013.
[20] A. Usta, I. S. Altingovde, I. B. Vidinli, R. Ozcan, and Ö. Ulusoy. How k-12 students search for learning?: analysis of an educational search engine log. In ACM SIGIR, pages 1151–1154, 2014.
[21] H. Wang, X. He, M.-W. Chang, Y. Song, R. W. White, and W. Chu. Personalized ranking model adaptation for web search. In ACM SIGIR, pages 323–332, 2013.
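
As an illustration of the tracking strategy of Section 3, Equation 1 can be evaluated by exhaustive search over the 19 candidate levels in R. The Python sketch below is not YUM's implementation. The names `Feedback`, `weight`, `satisfies` and `predict_level` are hypothetical, and three choices are assumptions that the text does not specify: an "OK" rating is read as "within half a level of the document", the two base constraints carry the full weight of 9, and ties are broken in favor of the level closest to the prior p_s.

```python
from dataclasses import dataclass
from typing import Iterable, List

# R = {0, 0.5, ..., 8.5, 9}: every candidate readability value.
LEVELS: List[float] = [i / 2 for i in range(19)]
MONTHS_CONSIDERED = 9  # average length of an academic year, per the text


@dataclass(frozen=True)
class Feedback:
    label: str           # "Too Easy", "OK" or "Too complex"
    doc_level: float     # readability level of the rated document
    months_ago: int = 0  # whole months since the feedback was given


def weight(months_ago: int) -> int:
    """f(c_i): 9 for a new constraint, minus 1 per elapsed month, never below 0."""
    return max(0, MONTHS_CONSIDERED - months_ago)


def satisfies(r: float, fb: Feedback) -> bool:
    """Does candidate level r satisfy the constraint generated by fb?"""
    if fb.label == "Too complex":
        return r < fb.doc_level
    if fb.label == "Too Easy":
        return r > fb.doc_level
    # Assumption: "OK" means the document is within half a level of r.
    return abs(r - fb.doc_level) <= 0.5


def predict_level(prior: float, feedback: Iterable[Feedback]) -> float:
    """Return the level in R with the highest total weight of satisfied constraints."""
    items = list(feedback)

    def score(r: float) -> int:
        # Base constraints r < p_s + 0.5 and r > p_s - 0.5 (assumed weight 9 each).
        base = MONTHS_CONSIDERED * ((r < prior + 0.5) + (r > prior - 0.5))
        return base + sum(weight(fb.months_ago) for fb in items if satisfies(r, fb))

    # Assumed tie-break: among equally good levels, stay closest to the prior.
    return max(LEVELS, key=lambda r: (score(r), -abs(r - prior)))
```

With no feedback, only the base constraints apply and the prediction stays at the prior; feedback older than nine months has weight 0 and therefore no influence on the result.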
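
The readability filter of Section 3 (discard resources more than half a grade level away from the student's level) can be sketched as follows. The grade formula is the standard Flesch-Kincaid grade level, 0.39 (words per sentence) + 11.8 (syllables per word) − 15.59. Everything else is an assumption made for illustration: the function names are hypothetical, sentences and words are split with simple regular expressions, and syllables are approximated by counting vowel groups, which is cruder than what a production readability tool would use.

```python
import re
from typing import Iterable, List


def count_syllables(word: str) -> int:
    """Rough approximation: one syllable per group of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level of an English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text contains no words")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


def filter_by_readability(texts: Iterable[str], level: float,
                          tolerance: float = 0.5) -> List[str]:
    """Keep only texts whose estimated grade is within `tolerance` of `level`."""
    return [t for t in texts if abs(flesch_kincaid_grade(t) - level) <= tolerance]
```

In use, `level` would be the student's current predicted reading level and `texts` the contents of the pages returned by the underlying search engine.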