Towards Proactive Information Retrieval in Noisy Text with Wikipedia Concepts

Tabish Ahmed, Sahan Bulathwela
Centre for Artificial Intelligence, University College London, UK

Abstract
Extracting useful information from the user history to clearly understand informational needs is a crucial feature of a proactive information retrieval system. To understand information and relevance, an intelligent system needs background knowledge, and Wikipedia can provide it. This work explores how exploiting the context of a query using Wikipedia concepts can improve proactive information retrieval on noisy text. We formulate two models that use entity linking to associate Wikipedia topics with the relevance model. Our experiments on a podcast segment retrieval task demonstrate that Wikipedia concepts carry a clear relevance signal and that a ranking model can improve precision by incorporating them. We also find that Wikifying the background context of a query can help disambiguate its meaning, further aiding proactive information retrieval.

1. Introduction
The informational needs of people are highly contextual and can depend on many factors, such as their current knowledge state, interests and goals [1, 2, 3]. An effective information retrieval companion should minimise the human effort required in i) expressing an information need and ii) navigating a lengthy result set. Topical representations of the user history (e.g. [4]) can greatly help in formulating zero-shot queries and refining short user queries, enabling proactive information retrieval (IR). While digital textual information is abundant, it is often noisy (e.g. when extracted through Automatic Speech Recognition (ASR) or PDF text extraction). State-of-the-art neural models are highly sensitive to such noise and produce sub-optimal results [5]. This calls for denoising steps that refine both the query and the document representations.

In this paper, we argue that Wikipedia, an openly available encyclopedia, can be a humanly intuitive knowledge base [6] with the potential to provide the world view that many noisy information retrieval systems need. In the midst of noisy text, it can help refine user queries (both explicit and implicit) while compiling large result sets into structured narratives that human users can navigate with minimal effort. We use a dataset with noisy text in which the query description serves as a proxy for the user context available to a proactive information retrieval system, and we demonstrate that positive results can be obtained in comparison to relevant baselines when ranking documents.

PASIR’22: First Workshop on Proactive and Agent-Supported Information Retrieval at CIKM 2022, October 21, 2022, Atlanta, GA. Contact: tabish.ahmed.21@ucl.ac.uk (T. Ahmed). © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073.

2. Related Work
Retrieving documents relevant to a personal information need involves mining textual features that can be used to match them to user queries. While previous works have explored Wikipedia-based concept modelling exclusively for query expansion [7, 8], our work deviates from them as we focus on i) simultaneously adding Wikipedia features to the documents, ii) ranking documents with noisy text and iii) using the query description (a proxy for the context extracted from the user history) for query disambiguation. Neural ranking models dominate the state of the art in IR, but they struggle in the presence of noisy text [5], which makes much simpler, computationally efficient IR models such as BM25, DPH and PL2 more suitable for noisy data [9].
Prior work has shown that simple models such as BM25 perform well, although BM25 is known to fail on very long documents [10]. More sophisticated probabilistic models such as DPH [11], derived from the Divergence from Randomness (DFR) framework using a hypergeometric distribution and Popper normalisation instead of Laplace normalisation, have outperformed BM25 on noisy text [9]. The need to retrieve from noisy text has led to recent competitions such as the podcast segment retrieval challenge [12], which has paved the way for improving the state of the art. This competition is relevant to our work as we also tackle proactive IR in noisy text. The best-performing model in the competition uses a neural approach based on token embeddings rather than document embeddings [13]; its re-ranking model is trained on an orthogonal dataset, which adds significantly to the computational cost. The next best, the Dublin City University (DCU) model, uses a variety of features, ranging from text to WordNet-based synonyms and entities, to expand the query [14]. As the DCU model expands the query with automatically extracted entities (via Named Entity Recognition), we consider it the prior model most relevant to our proposal. However, we additionally annotate the segments with Wikipedia concepts/entities, which further differentiates our work from theirs.

User queries are inherently short, typically averaging two to three keywords [15]. Exploiting the auxiliary information available to recover the context of the search is essential in such scenarios. In recent competitions such as the podcast segment retrieval challenge, each query was accompanied by a short description that provided contextual information about the query itself.
The leading models in this challenge used the description to extract features and, in some cases, sent the identified keywords to external search engines to further expand the queries [9]. In a real-world system, however, such information is not available. Hence, a proactive IR system has to rely on past user interactions to build the context [16]. In this work, we use the query description as a proxy for the context representation that a proactive IR system would extract.

Proactive information retrieval is a field of IR that has gained considerable attention recently. Supporting users' information needs with minimal or zero effort enables them to use IR systems more effectively. A key part of facilitating information retrieval proactively, however, is building systems that can capture the signals from the user that indicate their informational needs. Prior works in this domain have identified several ways to harvest user intent. The simplest approach is to explicitly ask questions [17] or to use demographic information about the user [18] to profile them. Rather than disrupting the user experience with explicit questions, many works harvest implicit user actions such as clicks, impressions and previous searches [19, 18]. In the social media domain especially, user interactions are used to create concept-based models [20] that can infer user preferences and create zero-effort queries for new content. The extracted preference features are usually used to expand information queries [8, 7], synthesise zero-effort queries [20] or personalise search results [21]. In this work, we argue in support of using concept-based user models as a form of zero-effort query for informational recommendation and learning scenarios, especially in the presence of noisy document representations.

2.1. Concept-based User Modelling and Wikipedia
In the field of document retrieval and personalisation, keyword extraction is a heavily used way of identifying concepts in textual documents. When working with short text such as social media posts, the common method is to use the words in the posts as features and build a user profile over them [22]. While this approach provides fine-grained features, the feature space can be very large and pose challenges to increasing recall. To address this issue, other systems detect topics using unsupervised learning approaches such as LDA [23]. Such unsupervised techniques are complex to tune and are not guaranteed to produce humanly intuitive topics. While expert annotation (e.g. in education [24]) is one way to obtain humanly intuitive, representative topics from textual content, it does not scale. Wikification, a more recent approach, looks promising for automatically extracting explainable concepts from text. Wikification identifies the Wikipedia concepts present in a document by connecting natural text to Wikipedia articles via entity linking [25]. In this work, we argue for the usefulness of Wikification in creating precise, humanly intuitive concept representations at scale. Our experiments also demonstrate how Wikipedia concepts can be used to i) disambiguate query keywords and ii) in the absence of query keywords, create entity features that can find relevant documents for users proactively.

3. Wikipedia to Support Proactive Search-based Systems
Through this work, we explore how the knowledge contained in Wikipedia can be used to improve a proactive search-based system. As a promising first step in this direction, we address the following research questions.
We hypothesise that associating Wikipedia concepts with a natural-language-based IR system can improve proactive IR by i) better disambiguating the meaning of user queries, ii) exploiting the semantic meaning of document words while being less sensitive to noisy words in the document transcripts and iii) creating opportunities for the system to understand learner interactions more meaningfully.

RQ1 Do Wikipedia concepts carry a signal that indicates the relevance of documents to queries?
RQ2 Can Wikipedia annotations improve noisy information retrieval?

We conduct preliminary experiments to answer these research questions. We choose the segment retrieval task from the Podcast Track at TREC 2020 [12] for these experiments, as "search" is a key component of proactive information retrieval. We use this task specifically because each query comes with a description that provides its context. In many proactive IR systems, the query context is mined from the user interaction history; here, the description can be treated as a context representation mined from the user's prior interactions.

3.1. Information Retrieval Models
We restrict our model choices to the probabilistic retrieval method BM25 and the hypergeometric weighting model DPH. The reasons for restricting ourselves to probabilistic models are i) their superior performance in noisy IR [9], ii) the simplicity of introducing Wikipedia concept features, iii) their data and computational efficiency in training and iv) their practical feasibility in real-world systems. The models used in these experiments take the query q, the query description d and the segment s as input.

3.1.1. Baseline Models
We use BM25 and DPH as probabilistic baseline models. These two models use the textual tokens of the query q and the segment s to compute the relevance score rel as per Equation (1), where txt denotes the text token features.

$$\mathrm{rel}(q, s) = f(q_{\text{txt}}, s_{\text{txt}}) \tag{1}$$

As described in Section 2, we implement and test the DCU run 2 model [14] (referred to as the DCU model hereafter), which expands the query with entities extracted from the description as per Equation (2), an extension of Equation (1). The entities are extracted using Named Entity Recognition (NER).

$$\mathrm{rel}(q, d, s) = f(q_{\text{txt}} + d_{\text{ent}},\; s_{\text{txt}}) \tag{2}$$

where ent denotes the entities extracted from the query description d.

3.2. Proposed Models
The proposed models use Wikipedia concepts to enrich the representations used in the IR model. We hypothesise that adding Wikipedia concepts can improve the precision of the model, as exact entities can be matched between the query plus its context and the segments. We propose two models that use Wikipedia topics to expand both the query and the segment.

Wiki_rel: This model expands the query with the Wikipedia concepts extracted from the query q and the query description d, and the segment with the concepts extracted from s, as per Equation (3). It replaces the NER entities of the DCU model with Wikipedia concepts.

$$\mathrm{rel}(q, d, s) = f(q_{\text{txt}} + q_{\text{wiki}} + d_{\text{wiki}},\; s_{\text{txt}} + s_{\text{wiki}}) \tag{3}$$

where the subscript wiki denotes the concepts extracted from the query q, the description d and the segment s.

Ent_Wiki_rel: This model extends the DCU model by enriching the query, the description and the segment with Wikipedia concepts, as presented in Equation (4).

$$\mathrm{rel}(q, d, s) = f(q_{\text{txt}} + d_{\text{ent}} + q_{\text{wiki}} + d_{\text{wiki}},\; s_{\text{txt}} + s_{\text{wiki}}) \tag{4}$$

The two proposed models help us answer RQ2: i) the Wiki_rel model tests whether replacing named entities with Wikipedia features improves the ranking model, and ii) the Ent_Wiki_rel model tests whether adding Wikipedia features on top of the named entities improves it.

3.3. Data
As per Section 3, we use the podcast segment retrieval task to demonstrate the usefulness of Wikipedia features in proactive information retrieval.
The Spotify Podcast Dataset contains approximately 100,000 transcripts from around 60,000 hours of audio, making it the largest corpus of transcribed speech data. The episodes in the dataset were randomly sampled from 105,360 English podcast episodes published on Spotify between January 1, 2019 and March 1, 2020, with 10% of the podcasts coming from professional creators with high production values and the other 90% from amateurs [26]. The dataset also provides training and test topics (queries) along with human relevance judgements: there are 8 topics in the training set and another 50 topics in the test set.

To keep computational costs low, we run the experiments outlined in Section 3.4 exclusively on the training data of the podcast dataset. We first transform the dataset into two-minute overlapping segments as described in [9]. Then, we take all the relevant segments as positive documents for the different queries. As the vast majority of segments are irrelevant (the whole dataset contains 3.5 million segments), we downsample them to a negative set of ≈ 14,000 segments that are not relevant to any of the labelled queries. These positive and negative segments make up the smaller-scale dataset used in our empirical experiments, which we refer to as the Podcast Small dataset. We use the Wikifier [25] to associate Wikipedia concepts with both the queries (query + description) and the podcast segments, as required by the models in Section 3.2.

3.4. Experiments
The empirical experiments in this work aim to answer RQ1 and RQ2 using the Podcast Small dataset. In a proactive information retrieval setting, user queries are inferred from users' historical interactions; we therefore use the query description provided in the dataset in place of the features extracted from the user history.
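To make the role of the description concrete, the sketch below builds the query and segment token lists of Equations (1)–(4) and ranks segments with BM25. It is a minimal sketch under stated assumptions, not the authors' implementation: "+" in the equations is read as concatenation of token lists, Wikipedia concepts are encoded with a hypothetical `wiki:` prefix so they only match other concepts, the entities and concepts are assumed to be already extracted (e.g. by spaCy NER and the Wikifier), and `bm25_scores` is a generic from-scratch BM25 rather than PyTerrier's implementation. DPH is not shown.

```python
import math
from collections import Counter
from typing import List, Sequence

# Hypothetical prefix marking Wikipedia concept tokens so that a concept such as
# "wiki:Black_hole" can only match another concept token, never a plain word.
WIKI_PREFIX = "wiki:"


def concept_tokens(concepts: Sequence[str]) -> List[str]:
    """Turn Wikipedia concept titles into prefixed tokens."""
    return [WIKI_PREFIX + c for c in concepts]


def query_tokens(model: str, q_txt, d_ent, q_wiki, d_wiki) -> List[str]:
    """Query side of Eqs. (1)-(4); '+' is read as list concatenation (assumption)."""
    if model == "baseline":  # Eq. (1): q_txt
        return list(q_txt)
    if model == "dcu":  # Eq. (2): q_txt + d_ent
        return list(q_txt) + list(d_ent)
    if model == "wiki_rel":  # Eq. (3): q_txt + q_wiki + d_wiki
        return list(q_txt) + concept_tokens(list(q_wiki) + list(d_wiki))
    if model == "ent_wiki_rel":  # Eq. (4): q_txt + d_ent + q_wiki + d_wiki
        return list(q_txt) + list(d_ent) + concept_tokens(list(q_wiki) + list(d_wiki))
    raise ValueError(f"unknown model: {model}")


def segment_tokens(model: str, s_txt, s_wiki) -> List[str]:
    """Segment side: s_txt, plus s_wiki for the two Wikipedia-based models."""
    if model in ("wiki_rel", "ent_wiki_rel"):
        return list(s_txt) + concept_tokens(s_wiki)
    return list(s_txt)


def bm25_scores(query: Sequence[str], segments: Sequence[Sequence[str]],
                k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Textbook BM25 score of each segment for the query (not PyTerrier's code)."""
    n = len(segments)
    avgdl = sum(len(s) for s in segments) / n
    df = Counter(t for s in segments for t in set(s))  # document frequencies
    q_counts = Counter(query)
    scores = []
    for seg in segments:
        tf = Counter(seg)
        length_norm = k1 * (1 - b + b * len(seg) / avgdl)
        score = 0.0
        for term, qtf in q_counts.items():
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            score += qtf * idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


if __name__ == "__main__":
    # Toy data invented for illustration: a query, description-derived features, two segments.
    q_txt, q_wiki = ["black", "hole", "image"], ["Black_hole"]
    d_ent, d_wiki = ["event", "horizon", "telescope"], ["Event_Horizon_Telescope"]
    segs = [
        (["the", "first", "picture", "of", "a", "black", "hole"],
         ["Black_hole", "Event_Horizon_Telescope"]),
        (["a", "black", "hole", "in", "my", "budget"], ["Budget"]),
    ]
    for model in ("baseline", "dcu", "wiki_rel", "ent_wiki_rel"):
        q = query_tokens(model, q_txt, d_ent, q_wiki, d_wiki)
        docs = [segment_tokens(model, txt, wiki) for txt, wiki in segs]
        print(model, [round(s, 3) for s in bm25_scores(q, docs)])
```

In this toy run, the segment that uses "black hole" in an unrelated sense scores higher under Equations (1) and (2), because the NER terms taken from the description occur in neither transcript. The concept tokens of Equations (3) and (4) let the on-topic segment win instead.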
We hypothesise that the query description can be used in two ways: i) as the user context for a zero-effort query that ranks relevant documents from the corpus and/or ii) as a means to disambiguate the true meaning of a short query provided by the user without prompting the user for further clarification.

To answer RQ1, we examine whether the Wikipedia concepts present in a query align significantly more with those of relevant documents than with those of non-relevant documents. To measure this quantitatively, we compute the Jaccard similarity coefficient ρ between the sets of Wikipedia concepts of the query and of the document. Then, for each query, we take the sets of values ρ_rel and ρ_non-rel and test the difference of medians using a one-tailed Mann–Whitney U test (H_alt: ρ_rel > ρ_non-rel).

To answer RQ2, we use the annotated Wikipedia concepts of the query and the podcast segments as additional features. We then develop the new set of information retrieval models outlined in Section 3.2 to account for the alignment of the Wikipedia concepts present, and use the same Podcast Small dataset to evaluate whether the ranking of relevant documents differs between the baselines and the proposed models.

To validate how Wikifying the context can help disambiguate a query, we run Wikification on each query using i) the query words exclusively and ii) the query words plus the description words, and observe whether the second setting identifies Wikipedia concepts more precisely.

To run the evaluation experiments needed for RQ2, we use the PyTerrier library [27]. The text processing and NER enrichment needed for the DCU model are done using the spaCy library. The Wikipedia annotations are sourced from the Wikifier service [25].

3.4.1. Evaluation
To evaluate whether the Jaccard similarity coefficients relevant to RQ1 differ significantly, we use the test statistic of the hypothesis test.
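Both evaluation procedures can be sketched in plain Python: the Jaccard coefficients and one-tailed Mann–Whitney U test used for RQ1, and the NDCG, NDCG@30 and Precision@10 metrics used for RQ2 (described next). This is a minimal sketch, not the code used in the paper. The Mann–Whitney test uses the normal approximation with mid-ranks for ties and no continuity correction (libraries such as SciPy also offer exact and corrected variants, and conventions differ on which U statistic is reported). NDCG uses linear gains with a log2 discount, and the `min_grade` threshold for counting a segment as relevant in Precision@10 is an assumption.

```python
import math
from typing import Dict, Optional, Sequence, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard coefficient between two sets of Wikipedia concepts (0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mann_whitney_greater(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """One-tailed Mann-Whitney U test of H_alt: values in x tend to be larger than in y.

    Returns (U for x, p-value) using mid-ranks for ties and the normal
    approximation with a tie-corrected variance (no continuity correction).
    """
    n1, n2 = len(x), len(y)
    pooled = sorted((v, i) for i, v in enumerate(list(x) + list(y)))
    ranks = [0.0] * (n1 + n2)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + j) / 2 + 1  # average 1-based rank of the tied block
        for k in range(i, j + 1):
            ranks[pooled[k][1]] = mid_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # indices 0..n1-1 belong to x
    n = n1 + n2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:  # every value tied: no evidence for H_alt
        return u, 1.0
    z = (u - n1 * n2 / 2) / math.sqrt(var)
    return u, 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability


def dcg(gains: Sequence[float]) -> float:
    """Discounted cumulative gain with linear gains and a log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(ranking: Sequence[str], qrels: Dict[str, float], k: Optional[int] = None) -> float:
    """NDCG of a ranked list of segment ids; k=None evaluates the full ranking."""
    gains = [qrels.get(doc, 0.0) for doc in ranking][:k]
    ideal = sorted(qrels.values(), reverse=True)[:k]  # best possible ordering
    best = dcg(ideal)
    return dcg(gains) / best if best > 0 else 0.0


def precision_at_k(ranking: Sequence[str], qrels: Dict[str, float], k: int = 10,
                   min_grade: float = 1.0) -> float:
    """Fraction of the top-k retrieved segments judged relevant (grade >= min_grade)."""
    top = ranking[:k]
    return sum(1 for doc in top if qrels.get(doc, 0.0) >= min_grade) / k
```

For RQ1, `x` would hold the ρ_rel values of one query's relevant segments and `y` its ρ_non-rel values; for RQ2, `ndcg(ranking, qrels)`, `ndcg(ranking, qrels, k=30)` and `precision_at_k(ranking, qrels)` would be computed per query and, typically, averaged over queries.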
We report the performance of the ranking models for RQ2 using NDCG, NDCG@30 and Precision@10, the same evaluation metrics used in prior work [9]. NDCG uses graded rather than binary relevance and discounts gains by rank position, so retrieving a highly relevant document near the top of the ranking counts for much more than retrieving a less relevant one. Precision, in contrast, measures the fraction of the retrieved documents that are relevant. As in prior work, we compute NDCG both without a rank cut-off (using all available relevant segments) and up to rank 30, and we compute Precision using a cut-off of the top 10 ranked segments.

4. Experimental Results
This section reports the results of the experiments. Figure 1 shows the distributions of the Jaccard similarity coefficients of relevant and non-relevant segments, while Table 1 summarises the results of the hypothesis test conducted to answer RQ1. Table 2 outlines the ranking performance obtained in the experiments run to answer RQ2.

4.1. Query Keyword Disambiguation using the Context
We hypothesise that the context gathered from the user can also be used to disambiguate the meaning of keywords in a search query. To test this, we take the 7 queries from the training dataset that belong to the topical and known-item categories ("story about riding a bird" is excluded, as it belongs to the refinding category). We enrich these queries using Wikification in two settings, i) query only vs. ii) query + description, to investigate whether the identified Wikipedia concepts differ. Table 3 reports how the salient entities in the query keywords are detected differently in the two settings.

Figure 1: Jaccard similarity distribution of relevant vs. non-relevant segments for each query in the podcast dataset.

Table 1: Difference of medians test for each query, reported with its statistical significance.
Query ID | Relevant docs | Non-relevant docs | Median Jaccard (relevant) | Median Jaccard (non-relevant) | Difference | Mann–Whitney U statistic | p-value
1 | 70 | 14109 | 0.027 | 0.007 | 0.020 | 25680 | 1.03E-49
2 | 63 | 14116 | 0.023 | 0.011 | 0.012 | 47466 | 8.84E-46
3 | 78 | 14101 | 0.043 | 0.023 | 0.019 | 264688 | 1.22E-16
4 | 78 | 14101 | 0.034 | 0.027 | 0.007 | 393033 | 1.41E-06
5 | 80 | 14099 | 0.039 | 0.029 | 0.010 | 455015 | 1.42E-03
6 | 37 | 14142 | 0.027 | 0.010 | 0.017 | 35748 | 8.37E-48
7 | 77 | 14102 | 0.016 | 0.011 | 0.005 | 56322 | 2.80E-44
8 | 80 | 14099 | 0.032 | 0.023 | 0.010 | 271961 | 6.31E-16

5. Discussion
The results of the preliminary experiments are very promising: the majority suggest that Wikipedia concepts can help the IR system.

5.1. The Usefulness of Wikipedia Features (RQ1)
Figure 1 clearly shows that the Jaccard similarity coefficients of the relevant segments (ρ_rel) differ from those of the non-relevant segments (ρ_non-rel); for some of the training queries, the confidence intervals do not even overlap. The test statistics presented in Table 1 further confirm this observation. In the one-tailed hypothesis test, all queries show statistically significant differences of medians, with the median Jaccard similarity between the query and the Wikipedia concepts of the relevant segments greater than that of the non-relevant segments. These results allow us to conclude that the overlap of Wikipedia concepts between queries and relevant documents is an influential signal that can be incorporated into relevance computation. The results indicate that there is promise in using Wikipedia concepts as a feature in IR systems where noise is present.

Table 2: Ranking performance of the models. The feature columns indicate whether each model uses the query text, NER entities and Wikipedia concepts.

Model | Query text | NER entities | Wiki concepts | NDCG | NDCG@30 | Precision@10
Baselines
DPH | yes | no | no | 0.48 | 0.30 | 0.32
BM25 | yes | no | no | 0.48 | 0.28 | 0.31
DCU | yes | yes | no | 0.51 | 0.32 | 0.30
New proposals
Wiki_rel | yes | no | yes | 0.49 | 0.29 | 0.31
Ent_Wiki_rel | yes | yes | yes | 0.51 | 0.30 | 0.36

Table 3: Quality of query word disambiguation/refinement when Wikifying the query alone vs. the query together with its description ("-" means no concept was detected).

Query | Query only | Query + description
coronavirus spread | wiki/Coronavirus | wiki/Novel_coronavirus
greta thunberg cross atlantic | - | wiki/Greta_Thunberg
black hole image | wiki/Black_hole | wiki/Black_hole
daniel ek interview | - | wiki/Daniel_Ek
michelle obama becoming | wiki/Becoming_(philosophy) | wiki/Michelle_Obama, wiki/Becoming_(book)
anna delvey | wiki/Indian_anna | wiki/Anna_Sorokin
facebook stock prediction | wiki/Facebook, wiki/Stock | wiki/Facebook, wiki/Stock

5.2. Utility of Wikification in Information Retrieval (RQ2)
The results in Table 2 support the usefulness of Wikipedia concepts in relevance prediction. They show that replacing the NER-based entities with Wikipedia concepts is not as fruitful as adding the concepts while retaining the entities. Both proposed models outperform the entity-based DCU model in Precision@10, which aligns with our understanding of the effect of Wikipedia concepts. NER, while detecting specific persons, organisations and other entities in the text, only adds the textual representations of these entities to the query. The added query terms are therefore capable of neither entity disambiguation (when the same term occurs in a segment with a different meaning, e.g. Apple the fruit vs. the company) nor coreference resolution (when a different textual form refers to the same entity, e.g. United Nations vs. UN).
In contrast, enriching both the query and the document with Wikipedia entities accounts for both of these effects, making the relevance calculation more precise. The Ent_Wiki_rel model, which contains both the NER-based entities and the Wikipedia concepts, improves Precision@10 while retaining the NDCG performance. This shows that keeping the entities in the query is important to retain the NDCG gains found in the DCU model. In this scenario, we use the Wikipedia concepts and entities extracted from the description to improve the search ranking. While we treat the description here as a representation extracted from the user history, many of the concept-based user models described in Section 2 support extracting features such as keywords [21] and Wikipedia concepts [1] from content.

Figure 2: X5Learn [28] (left) and iFacetSum [29] (right), showing that using Wikipedia concepts to represent documents can support proactive information retrieval by summarising the key concepts/entities in the result collection, helping the user to make more informed decisions on their next steps.

Query Keyword Disambiguation
The results in Table 3 also give insight into the potential of using query context for keyword disambiguation. In multiple queries, certain concepts were not detected at all when the query alone was used for Wikification (e.g. Greta Thunberg, Michelle Obama). With the context present (which can be extracted from the concepts a user has encountered historically [4]), however, the algorithm identified these concepts in the query, increasing recall. Table 3 also shows instances where the precision of identification improves in the presence of the context: the more precise Novel coronavirus concept was detected in the first query, and the system understood that the fifth query was about Michelle Obama's book, Becoming, rather than the philosophical concept.
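The comparison behind Table 3 reduces to set operations on the concepts detected in the two settings. The sketch below transcribes the concepts reported in Table 3 and splits them into kept, added and dropped concepts. In practice the two sets would come from Wikifier calls on the query alone and on the query plus description; those calls are not reproduced here, and the dictionary layout and the `context_effect` helper are illustrative choices rather than part of the original work.

```python
from typing import Dict, Set, Tuple

# Concepts transcribed from Table 3: (query only, query + description).
TABLE_3: Dict[str, Tuple[Set[str], Set[str]]] = {
    "coronavirus spread": ({"Coronavirus"}, {"Novel_coronavirus"}),
    "greta thunberg cross atlantic": (set(), {"Greta_Thunberg"}),
    "black hole image": ({"Black_hole"}, {"Black_hole"}),
    "daniel ek interview": (set(), {"Daniel_Ek"}),
    "michelle obama becoming": ({"Becoming_(philosophy)"},
                                {"Michelle_Obama", "Becoming_(book)"}),
    "anna delvey": ({"Indian_anna"}, {"Anna_Sorokin"}),
    "facebook stock prediction": ({"Facebook", "Stock"}, {"Facebook", "Stock"}),
}


def context_effect(query_only: Set[str], with_context: Set[str]) -> Dict[str, Set[str]]:
    """Split concepts into those kept, added and dropped once the description is used."""
    return {
        "kept": query_only & with_context,
        "added": with_context - query_only,
        "dropped": query_only - with_context,
    }


if __name__ == "__main__":
    for query, (alone, with_desc) in TABLE_3.items():
        effect = context_effect(alone, with_desc)
        status = "unchanged" if alone == with_desc else "changed"
        print(f"{query}: {status}, added={sorted(effect['added'])}, "
              f"dropped={sorted(effect['dropped'])}")
```

Running it flags five of the seven queries as changed by the context, including the two (greta thunberg cross atlantic, daniel ek interview) for which the query alone yields no concept at all.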
These results show how, in many scenarios, short query keywords can be clarified automatically by referring to the context rather than needlessly prompting the human user.

5.3. Wikipedia Concepts and the User Experience
The previous section discussed one opportunity offered by Wikification: automatic disambiguation of query words. Such features could substantially decrease the number of user prompts in a system, leading to a smoother experience. Wikipedia concepts also have the advantage of being grounded in a taxonomy that is intuitive to humans. This human-intuitive nature allows the system to provide more transparent result sets that the user can navigate more efficiently. Figure 2 shows two recent systems demonstrating the potential of Wikipedia-based representations to improve human-in-the-loop information retrieval. The left system, X5Learn [28], shows how different Wikipedia concepts relate to different segments of videos. The Jaccard-based relevance can be used to visualise the degree of concept overlap between the segments and the query, indicating relevance (connecting to the hypothesis of RQ1). Also, using Wikipedia concepts rather than free-form keywords or NER-based entities allows the system to explain the concepts users encounter, using the knowledge base behind the taxonomy (in this case, Wikipedia, as in Figure 2 (left)). Many users seeking information do not fully understand the knowledge landscape they navigate and can benefit from learning opportunities that make them more familiar with the topics. iFacetSum, a multi-faceted mixed-initiative interface shown in Figure 2 (right), is an example of how result sets can be summarised using the entities and concepts in the documents. As shown in its demonstration, the detected entities and concepts allow the user to quickly understand the contents of the relevant documents.
Its user study also found that users relied less on the statements and summaries than on the concepts and entities [29]. However, because iFacetSum is not backed by a knowledge base, it cannot explain the topics, which points to the advantage of a taxonomy that carries auxiliary information. In a nutshell, many recent visualisation approaches use concepts to represent documents. Using Wikipedia as the topic taxonomy enables these systems to benefit from the rapid advances currently being made in Wikification. It also allows them to make use of other information in Wikipedia (e.g. semantic relatedness between topics, the hierarchy of concepts) to model the concepts in a much more sophisticated manner. This, in turn, helps connect the system seamlessly to the end user, creating cleaner interaction footprints that become the input for proactive search.

6. Conclusion
In this work, we explored the potential of using Wikipedia concepts to drive the proactive search experience of an IR system working on noisy documents, focusing on two research questions. We treated the query description in a podcast segment retrieval dataset as the user context in a proactive IR system and ran a series of experiments. Our first experiment showed that the overlap of Wikipedia concepts between the query and the relevant segments is significantly larger than that with the non-relevant segments. A subsequent experiment that added Wikipedia concept features to the search query and the document showed that a relevance model using Wikipedia concepts can improve the precision of the ranking model while retaining the NDCG score, compared with a counterpart that uses entities instead of Wikipedia concepts. A deeper analysis of the Wikification of the queries also showed that the meaning of the query keywords can be clarified by using the query context.
Using two recent systems, we also illustrated how a Wikipedia-based taxonomy can help an IR system connect better with the end user and enable users to carry out structured IR tasks by proactively helping them navigate the information in the result sets. Wikipedia concepts allow the result set to be presented in a humanly intuitive form that encourages the user to refine their query efficiently in subsequent steps. All the observations from the analysed systems indicate that Wikipedia concepts have a key role to play in creating proactive information retrieval systems that facilitate a higher degree of human-in-the-loop operation. In future work, we aim to use more sophisticated representations of Wikipedia concepts and advanced relevance models to run large-scale experiments on the full podcast dataset with 3.5 million segments. Understanding how to use Wikipedia concepts effectively (e.g. by ranking them) is a top priority.

Acknowledgments
This research was partially conducted as part of the X5GON project funded by the EU's Horizon 2020 research programme, grant No 761758. This work is also supported by the European Commission-funded project "Humane AI: Toward AI Systems That Augment and Empower Humans by Understanding Us, our Society and the World Around Us" (grant 820437), the EU Erasmus+ project "European Network for Catalysing Open Resources in Education" (project ref: 621586-EPP-1-2020-1-NO-EPPKA2-KA), and the AT2030 Programme. The AT2030 Programme is funded by aid from the UK government and led by the Global Disability Innovation Hub.

References
[1] S. Bulathwela, M. Pérez-Ortiz, E. Yilmaz, J. Shawe-Taylor, Towards an integrative educational recommender for lifelong learners, in: AAAI Conference on Artificial Intelligence, AAAI 20, 2020.
[2] W. Jiang, Z. A. Pardos, Q. Wei, Goal-based course recommendation, in: Proceedings of International Conference on Learning Analytics & Knowledge, 2019.
[3] S. Bulathwela, M. Pérez-Ortiz, E. Yilmaz, J. Shawe-Taylor, Power to the learner: Towards human-intuitive and integrative recommendations with open educational resources, Sustainability 14 (2022) 11682.
[4] S. Bulathwela, M. Pérez-Ortiz, E. Yilmaz, J. Shawe-Taylor, TrueLearn: A family of Bayesian algorithms to match lifelong learners to open educational resources, in: AAAI Conference on Artificial Intelligence, AAAI 20, 2020.
[5] G. Sidiropoulos, S. Vakulenko, E. Kanoulas, On the impact of speech recognition errors in passage retrieval for spoken question answering, arXiv preprint arXiv:2209.12944 (2022).
[6] S. Bulathwela, M. Pérez-Ortiz, C. Halloway, J. Shawe-Taylor, Could AI democratise education? Socio-technical imaginaries of an edtech revolution, in: Proc. of the NeurIPS Workshop on Machine Learning for the Developing World (ML4D), 2021.
[7] H. K. Azad, A. Deepak, A new approach for query expansion using Wikipedia and WordNet, Information Sciences 492 (2019) 147–163.
[8] J. A. Nasir, I. Varlamis, S. Ishfaq, A knowledge-based semantic framework for query expansion, Information Processing & Management 56 (2019) 1605–1617.
[9] R. Jones, B. Carterette, A. Clifton, M. Eskevich, G. J. Jones, J. Karlgren, A. Pappu, S. Reddy, Y. Yu, TREC 2020 podcasts track overview, arXiv preprint arXiv:2103.15953 (2021).
[10] Y. Lv, C. Zhai, When documents are very long, BM25 fails!, in: Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2011, pp. 1103–1104.
[11] G. Amati, Frequentist and Bayesian approach to information retrieval, in: European Conference on Information Retrieval, Springer, 2006, pp. 13–24.
[12] Y. Yu, J. Karlgren, H. Bonab, A. Clifton, M. I. Tanveer, R. Jones, Spotify at the TREC 2020 podcasts track: Segment retrieval, in: Proceedings of the Twenty-Ninth Text REtrieval Conference (TREC 2020), 2020.
[13] P. Galuščáková, S. Nair, D. W. Oard, Combine and re-rank: The University of Maryland at the TREC 2020 podcasts track (2020).
[14] Y. Moriya, G. J. Jones, DCU-ADAPT at the TREC 2020 podcasts track, in: TREC, 2020.
[15] A. Arampatzis, J. Kamps, A study of query length, in: Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2008, pp. 811–812.
[16] P. Sen, Proactive information retrieval, Ph.D. thesis, Dublin City University, 2021.
[17] G. H. Torbati, A. Yates, G. Weikum, Personalized entity search by sparse and scrutable user profiles, in: Proceedings of the 2020 Conference on Human Information Interaction and Retrieval, 2020, pp. 427–431.
[18] L. Yang, Q. Guo, Y. Song, S. Meng, M. Shokouhi, K. McDonald, W. B. Croft, Modeling user interests for zero-query ranking, in: European Conference on Information Retrieval, Springer, 2016, pp. 171–184.
[19] S. Bulathwela, M. Perez-Ortiz, E. Novak, E. Yilmaz, J. Shawe-Taylor, PEEK: A large dataset of learner engagement with educational videos, in: Proc. of RecSys Workshop on Online Recommender Systems and User Modeling (ORSUM'21), 2021. URL: https://arxiv.org/abs/2109.03154.
[20] F. Zarrinkalam, S. Faralli, G. Piao, E. Bagheri, Extracting, mining and predicting users' interests from social media (2020).
[21] R. Syed, K. Collins-Thompson, Retrieval algorithms optimized for human learning, in: Proc. of Int. Conf. on Research and Development in Information Retrieval (SIGIR), 2017.
[22] G. Piao, J. G. Breslin, Analyzing MOOC entries of professionals on LinkedIn for user modeling and personalized MOOC recommendations, in: Proceedings of the 2016 Conference on User Modeling Adaptation and Personalization, UMAP '16, 2016.
[23] F. Zarrinkalam, H. Fani, E. Bagheri, M. Kahani, Predicting users' future interests on Twitter, in: European Conference on Information Retrieval, Springer, 2017, pp. 464–476.
[24] A. T. Corbett, J. R. Anderson, Knowledge tracing: Modeling the acquisition of procedural knowledge, User Modeling and User-Adapted Interaction 4 (1994).
[25] J. Brank, G. Leban, M. Grobelnik, Annotating documents with relevant Wikipedia concepts, in: Proc. of Slovenian KDD Conf. on Data Mining and Data Warehouses (SiKDD), 2017.
[26] A. Clifton, A. Pappu, S. Reddy, Y. Yu, J. Karlgren, B. Carterette, R. Jones, The Spotify podcast dataset, arXiv preprint arXiv:2004.04270 (2020).
[27] C. Macdonald, N. Tonellotto, Declarative experimentation in information retrieval using PyTerrier, in: Proceedings of the 2020 ACM SIGIR on International Conference on Theory of Information Retrieval, ACM, 2020. doi:10.1145/3409256.3409829.
[28] S. Bulathwela, S. Kreitmayer, M. Pérez-Ortiz, What's in it for me? Augmenting recommended learning resources with navigable annotations, in: Proceedings of the 25th International Conference on Intelligent User Interfaces Companion, IUI 20, 2020.
[29] E. Hirsch, A. Eirew, O. Shapira, A. Caciularu, A. Cattan, O. Ernst, R. Pasunuru, H. Ronen, M. Bansal, I. Dagan, iFacetSum: Coreference-based interactive faceted summarization for multi-document exploration, in: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Association for Computational Linguistics, Online and Punta Cana, Dominican Republic, 2021, pp. 283–297. URL: https://aclanthology.org/2021.emnlp-demo.33. doi:10.18653/v1/2021.emnlp-demo.33.