Learning when searching for web data

Laura Koesten
The Open Data Institute, London EC2A 4JE, UK
University of Southampton, Southampton SO17 1BJ, UK
laura.koesten@theodi.org

Emilia Kacprzak
The Open Data Institute, London EC2A 4JE, UK
University of Southampton, Southampton SO17 1BJ, UK
e.kacprzak@theodi.org

Jenifer Tennison
The Open Data Institute, London EC2A 4JE, UK
jeni.tennison@theodi.org

Search as Learning (SAL), July 21, 2016, Pisa, Italy. The copyright for this paper remains with its authors. Copying permitted for private and academic purposes.

ABSTRACT
Searching on the web increasingly involves searching for data as well as searching for traditional web pages. To support learning from data on the web and to facilitate learning through searching for data, the different characteristics of these sources need to be considered. Data usually needs additional context in order to be transformed into information and subsequently into the knowledge of an individual. Searching for data on the web requires a means to understand, analyse and interpret the data found. This can be provided by the system, by the way context is presented, or by the user's prior knowledge of the topic and general data literacy skills. Therefore searching for data on the web should be considered an area in its own right for future research in the context of search as a learning activity.

Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: Search process; H.1.2 [User/Machine Systems]: Human information processing

Keywords
Web search, Data discovery, Context awareness, Sensemaking, Human information interaction

1. INTRODUCTION
Searching the web is a daily activity for people from a variety of backgrounds and skill sets [8] and is used for learning and discovery. Within this text learning is conceived as the processing of information and the construction of knowledge, which adheres to cognitive and constructivist approaches [12]. Learning is always based on prior knowledge and is therefore different for people depending on their experience, context and abilities [7, 12, 14]. Marchionini [7] summarises this as a personal information infrastructure, which provides the basis for processing information in order to construct new knowledge.

Searching and learning are intrinsically linked. Learning has been conceptualized as the interactive intention of searching by Rieh et al. [10]. Learning can be the explicit aim of a search - often involving several search sessions and results that need to be interpreted and evaluated - or a byproduct of search rather than a specified goal [8]. This is especially typical for exploratory search tasks, where sensemaking and learning are inherent to the task [8]. This paper focuses on how exploratory search for data can present different issues than exploratory search for web pages. Data in this case refers to structured, mostly numerical or factual data, available on the web to download.

The remainder of this paper is structured as follows. Section 2 introduces searching for data on the web as a distinct activity, as opposed to searching for web pages. The importance of context in enabling understanding when searching for data on the web is discussed in section 3.

2. DATA SEARCH
The majority of research about search as learning focuses on traditional web pages, which can also contain data, as opposed to looking at data on the web as an independent source [6, 12]. Data published on the web is used alongside the content on traditional web pages to enable decision making about complex situations [1].

Techniques to support the task of searching for data are less advanced than those for searching for web pages. Web search engines are based on algorithms which are designed to rank web pages and do not equally support the indexing of structured content [9]. Additionally, users are likely to be less familiar with the process of searching for data, as different skills might be needed for a successful search activity.

Albers [1] provides a hierarchy of levels of information that has data at the bottom, defined as raw facts; when context is added to the data it is defined as information; and when this information is integrated it is considered to be knowledge, which means an understanding of the situation. In that sense data can be seen as the raw source, but the construction of knowledge requires an additional process.

As stated in section 1, a person's personal information infrastructure at the time of the search task determines their ability to build relationships between information sources [1, 8]. The ability to transform data into information depends on the context provided by the system as well as on the data literacy skills of the individual. A learning process might be harder to predict or evaluate for data search than for web page search.

3. CONTEXT
Web pages often offer textual information and therefore provide curated and processed data, or information, that comes with context. Furthermore, search engines are very advanced in providing additional context - they can provide contextual and personalised results by combining explicit queries with implicit feedback, such as integrating the user's browsing behaviour into a ranking system [11, 13].

Context is a necessary source of meaning [4], and context is more complex in data search because additional information is required to create meaning from data, as opposed to from text documents. This additional information can partly be provided through information about the data - metadata. Learning can be enhanced by providing reference points with the data or in the presentation of data - to enable the user to build a web of relationships between the different bits of information, which is needed to understand complex information [1]. For example, sensemaking of geographical data is easier when it is displayed on a map, and meaning can be attached to numbers if a range or a graph is presented that supports relating those numbers to reference points.

Rieh et al. [10] describe the sensemaking process as creating knowledge structures between the data or information that has been acquired through the information seeking task. Decisions about the amount of context provided with the data are made by data publishers or by those designing the system; interface design plays a key role in representing the context [5].

The presentation of data influences sensemaking [14]. Interfaces should enable the discovery of connections between different data points and represent data in a network, so that a user can understand its meaning within the context of other data. An overview of search results can enhance orientation and understanding of the information provided, which can enable learning activities [10]. For data search, learning can be supported by allowing users to zoom in and out of levels of data and by allowing filtering and cross filtering [10], rather than displaying one piece of content at a time, as is done with a list of documents. Navigational structures can support the cognitive representation of information [10], and this is even more important when searching for data on the web, to facilitate the transition from data to information and subsequently to knowledge. Publishing structured data as Linked Data can be seen as a partial realisation of this idea, as it provides a basis for interlinking data by providing context [2]; however, the majority of data on the web is not published as Linked Data.

4. CONCLUSIONS
Current search engines are optimised for searching and learning factual knowledge from web pages [10], but do not yet fully facilitate searching for data on the web or provide the means to understand, analyse and synthesise this data. Document search differs from data search, as finding, accessing, understanding and using data requires additional skills. The user's prior knowledge and experience with the domain or topic determine the ability to understand the data. Skills such as accessing, interpreting and critically assessing data are part of a user's data literacy [3]. Data usually requires additional context to be interpreted, as discussed in section 3. Hence potentially more complex search interfaces are required that offer different viewpoints to facilitate learning during the search process.

Further research is needed to understand how people make use of data resources and progress from finding to understanding [8], which can be defined as learning. The challenges of searching for data should be an area of attention in its own right, rather than extrapolating results from traditional document search. The better the sensemaking process from data to knowledge is understood, the better the systems we can create to facilitate learning from and by data search.

5. REFERENCES
[1] M. J. Albers. Human–information interaction with complex information for decision-making. In Informatics, volume 2, pages 4–19. Multidisciplinary Digital Publishing Institute, 2015.
[2] C. Bizer, T. Heath, and T. Berners-Lee. Linked data - the story so far. Semantic Services, Interoperability and Web Applications: Emerging Concepts, pages 205–227, 2009.
[3] J. Calzada Prado and M. Á. Marzal. Incorporating data literacy into information literacy programs: Core competencies and contents. Libri, 63(2):123–134, 2013.
[4] B. Dervin. Given a context by any other name: Methodological tools for taming the unruly beast. Information seeking in context, 13:38, 1997.
[5] S. Greenberg. Context as a dynamic construct. Human-Computer Interaction, 16(2):257–268, 2001.
[6] S. Halford, C. Pope, and M. Weal. Digital futures? Sociological challenges and opportunities in the emergent semantic web. Sociology, 47(1):173–189, 2013.
[7] G. Marchionini. Information Seeking in Electronic Environments. Cambridge University Press, New York, NY, USA, 1995.
[8] G. Marchionini. Exploratory search: From finding to understanding. Communications of the ACM, 49(4):41–46, 2006.
[9] L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the web. 1998.
[10] S. Y. Rieh, K. Collins-Thompson, P. Hansen, and H.-J. Lee. Towards searching as a learning process: A review of current perspectives and future directions. Journal of Information Science, 42(1):19–34, 2016.
[11] A. Sieg, B. Mobasher, and R. Burke. Web search personalization with ontological user profiles. In Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management, pages 525–534. ACM, 2007.
[12] P. Vakkari. Searching as learning: A systematization based on literature. Journal of Information Science, 42(1):7–18, 2016.
[13] R. W. White, P. Bailey, and L. Chen. Predicting user interests from contextual information. In Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 363–370. ACM, 2009.
[14] M. L. Wilson, B. Kules, B. Shneiderman, et al. From keyword search to exploration: Designing future search interfaces for the web. Foundations and Trends in Web Science, 2(1):1–97, 2010.