=Paper=
{{Paper
|id=Vol-2947/paper20
|storemode=property
|title=Representing, Tracking and Revising the User’s Knowledge: A Search Result Filter Framework
|pdfUrl=https://ceur-ws.org/Vol-2947/paper20.pdf
|volume=Vol-2947
|authors=Dima El Zein,Célia da Costa Pereira
|dblpUrl=https://dblp.org/rec/conf/iir/ZeinP21
}}
==Representing, Tracking and Revising the User’s Knowledge: A Search Result Filter Framework==
Discussion Paper

Dima El Zein, Célia da Costa Pereira

IIR 2021 – 11th Italian Information Retrieval Workshop, September 13–15, 2021, Bari, Italy
elzein@i3s.unice.fr (D. El Zein); Celia.DA-COSTA-PEREIRA@univ-cotedazur.fr (C. da Costa Pereira)

Abstract

This paper presents a framework for a cognitive agent in information retrieval that personalizes the list of returned documents based on what it believes about the user's knowledge. Throughout the interactions between the IR system and the user, the agent builds its beliefs about the user's knowledge by extracting keywords from the content of the documents read by the user. The agent's belief base, which corresponds to the user model, also contains "contextual rules" that allow deriving new beliefs about the user's knowledge. The agent is therefore able to compare its own beliefs with the content conveyed by a to-be-proposed document, and thus to determine whether the document really contains useful information for the user. Finally, in case of inconsistency among its beliefs, the agent revises its belief base to restore consistency.

Keywords

Search Filter, Information Retrieval, Cognitive Agent, Knowledge Extraction, Belief Revision

1. Introduction

In the domain of information retrieval, it is not always sufficient to return the information responding only to the query. Users can be considered as cognitive agents having their own beliefs and knowledge about the world [1]. They try to fulfill their information needs by submitting queries and acquire new information by examining the results. Consequently, the search results must also respond to the user's beliefs, knowledge, and search goals. Considering the user's cognitive components in the domain of Information Retrieval was identified as one of the "major challenges" by the IR community in 2018 [2].

In this paper, we propose an Information Retrieval filter framework that uses the content of the documents read by the user to learn about his/her knowledge. This cognitive awareness is employed to personalize the returned documents with respect to what the user already knows. To our knowledge, no existing research treats the content of the documents read by the user as his/her acquired knowledge.

The framework we proposed in [3] works as follows (a sketch of the cycle follows the list). For every submitted query:
(i) the system sends the user's query to the search engine and receives a list of documents relevant to the query;
(ii) the agent examines the content of the documents in the list and measures the similarity between each document and its set of beliefs;
(iii) the agent returns a filtered list according to the similarity results;
(iv) the user reads a proposed document;
(v) the agent adds the keywords representing the read document as new beliefs;
(vi) a reasoning cycle is performed to derive new beliefs and revise the belief base if needed.
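To make the loop concrete, here is a minimal Python sketch of one query cycle. All names in it (search_engine.search, agent.similarity, user.read, and so on) are hypothetical placeholders of ours: the paper describes the workflow, not a concrete API.

```python
# Minimal sketch of one query cycle, steps (i)-(vi). Every helper name is a
# hypothetical placeholder; the paper defines the workflow, not an API.

def query_cycle(agent, search_engine, user, query, cutoff):
    # (i) forward the query and receive the relevant documents
    candidates = search_engine.search(query)
    # (ii)-(iii) keep the documents whose content matches the agent's beliefs
    # (shown here in "reinforcement" mode; see the filtering discussion below)
    filtered = [d for d in candidates if agent.similarity(d) >= cutoff]
    # (iv) the user reads one of the proposed documents
    read_doc = user.read(filtered)
    # (v) the keywords of the read document become new extracted beliefs
    agent.add_extracted_beliefs(read_doc)
    # (vi) derive new beliefs and revise the belief base if needed
    agent.reasoning_cycle()
    return filtered
```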
2. Proposed Framework

The agent is modeled as a rule-based agent whose belief base consists of beliefs (ground literals) and rules (Horn clauses). The beliefs represent what the agent believes about its user's knowledge. When the agent has 𝛼 in its belief base, it believes that the user knows that 𝛼 is true. If the belief base contains ¬𝛼, the agent believes that the user knows that 𝛼 is not true. Rules, on the other hand, express relationships between beliefs and are used to derive new beliefs from the agent's existing ones. Rules have the form 𝛼1 & 𝛼2 & . . . & 𝛼𝑛 → 𝛽, where 𝛼1, 𝛼2, . . . , 𝛼𝑛 (𝑛 ≥ 1) and 𝛽 are literals, and & is the logical "and" operator. 𝛽 is called the derived belief, and each belief 𝛼𝑖 is a premise of the rule. During an agent's reasoning cycle, if all the premises of a rule are satisfied (i.e., each premise exists in the belief base), the rule is fired and 𝛽 is added to the belief base. The rules are considered static; their extraction/origin is not discussed in this paper.

The agent acquires its beliefs about the user's knowledge from the documents the user has read. When the user reads a document 𝑑, the agent extracts the content of the document and considers it as knowledge acquired by the user. We propose applying RAKE – the Rapid Automatic Keyword Extraction algorithm [4] – as an easy and understandable method to extract the set of scored keywords representing the document. Those keywords are associated with the agent's extracted beliefs. The belief base thus contains the extracted beliefs, the derived beliefs, and the rules.

We consider that belief is gradual: an agent may hold some beliefs more entrenched (or accepted) than others. We define a "degree" for beliefs to measure this entrenchment:

Definition 1. The degree of a belief 𝛼 is the degree to which the agent believes the user is knowledgeable about 𝛼. It is a value between 0 and 1, where 0 is the lowest degree – the agent believes the user has absolutely no knowledge about 𝛼 – and 1 is the highest degree – the agent believes the user has the maximum knowledge about 𝛼.

Let us define a document 𝑑 = {(𝑘1; 𝑠1), . . . , (𝑘𝑛; 𝑠𝑛)} as a set of tuples where 𝑘𝑖 is a keyword extracted by RAKE and 𝑠𝑖 is its related score; 𝑘𝑖 is associated with an extracted belief 𝑏𝑗 whose degree is calculated as follows:

\[ \mathit{degree}(b_j) = \lambda \cdot \frac{s_i}{\max_{s_j \in d}(s_j)} \tag{1} \]

In Equation (1), the RAKE score of an extracted keyword is normalized and then multiplied by an adjustment factor 𝜆 ∈ [0, 1] that weakens the magnitude of the degrees. The adjustment factor may vary based on different characteristics, such as the trust in the document's source. Equation (1) computes the degree of extracted beliefs only. The degrees of derived beliefs depend on the degrees of the premises that derived them. For that reason, we track the dependency between beliefs following the approach proposed by Alechina et al. [5]: for every fired rule instance, a justification 𝐽 records (i) the derived belief and (ii) a support list 𝑠 containing the premises of the rule. The dependency information of a belief then takes the form of two lists: a dependencies list, which records the justifications of the belief, and a justifications list, which contains all the justifications in which the belief is a member of a support. The degree of a derived belief 𝑏, 𝑑𝑒𝑔𝑟𝑒𝑒(𝑏), is equal to that of its highest-quality justification:

Definition 2.
\[ \mathit{degree}(b) = \max\{\mathit{qual}(J_0), \ldots, \mathit{qual}(J_n)\} \tag{2} \]

Definition 3. The quality of a justification 𝐽, 𝑞𝑢𝑎𝑙(𝐽), is equal to the degree of the least entrenched belief in its support list:

\[ \mathit{qual}(J) = \min\{\mathit{degree}(b) : b \in \mathrm{support}(J)\} \tag{3} \]

For example, suppose an agent has two beliefs, 𝑝𝑙𝑎𝑛𝑒𝑡𝑠 and 𝑠𝑡𝑎𝑟𝑠, with degrees 0.5 and 0.7 respectively, and the belief base contains the rule 𝑝𝑙𝑎𝑛𝑒𝑡𝑠 & 𝑠𝑡𝑎𝑟𝑠 → 𝑔𝑎𝑙𝑎𝑥𝑖𝑒𝑠: if the agent "believes" in stars and planets, it will believe in galaxies. When the rule is fired, a justification 𝐽1, denoted (𝑔𝑎𝑙𝑎𝑥𝑖𝑒𝑠, [𝑝𝑙𝑎𝑛𝑒𝑡𝑠, 𝑠𝑡𝑎𝑟𝑠]), is added; 𝑔𝑎𝑙𝑎𝑥𝑖𝑒𝑠 is the derived belief and [𝑝𝑙𝑎𝑛𝑒𝑡𝑠, 𝑠𝑡𝑎𝑟𝑠] is the support list. By Definition 3, the quality of 𝐽1 is min{𝑑𝑒𝑔𝑟𝑒𝑒(𝑝𝑙𝑎𝑛𝑒𝑡𝑠), 𝑑𝑒𝑔𝑟𝑒𝑒(𝑠𝑡𝑎𝑟𝑠)} = 0.5. 𝐽1 is in the dependencies list of 𝑔𝑎𝑙𝑎𝑥𝑖𝑒𝑠 and in the justifications lists of both 𝑝𝑙𝑎𝑛𝑒𝑡𝑠 and 𝑠𝑡𝑎𝑟𝑠.

As the agent acquires more information about the user, it adds more beliefs to its belief base. A belief might be new, already existing, or contradicting the existing ones; this calls for revising the beliefs to keep the belief base consistent. Belief revision is the process of modifying the belief base to maintain its consistency whenever new information becomes available. We follow the AGM belief revision theory [6], which defines the postulates a rational agent should satisfy when performing belief revision. Consider a belief base 𝐾 and a new piece of information 𝛼. 𝐾 is inconsistent when ⊥ ∈ 𝐶𝑛(𝐾), i.e., when both 𝛼 and ¬𝛼 are logical consequences of 𝐾 (both are in 𝐶𝑛(𝐾)). Three operators are considered:
- Expansion 𝐾 + 𝛼: adds a new belief 𝛼 that does not contradict the existing beliefs.
- Contraction 𝐾 ÷ 𝛼: removes a belief 𝛼 and all other beliefs that logically imply/entail it.
- Revision 𝐾 * 𝛼: adds a belief 𝛼 as long as it does not cause a contradiction in 𝐾.

In our framework, if the addition of a belief 𝛼 would cause an inconsistency in 𝐾 (because ¬𝛼 is present), priority is given to the belief with the higher degree. If 𝛼 has the higher degree, the revision operation makes minimal changes in 𝐾 to render it consistent with 𝛼: it contracts ¬𝛼 and then adds 𝛼. If ¬𝛼 was a derived belief, we do not contract the beliefs that derived ¬𝛼, as long as they are consistent with the remaining beliefs (minimal change) – the coherence approach [7]. In other words, we only contract the belief in question together with its related justification(s), contracting neither the rule's premises nor the rule itself. If ¬𝛼 has the higher degree, the addition of 𝛼 is discarded. The two sketches below illustrate, in turn, the degree computations of Equations (1)–(3) and this revision policy.
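First, a minimal, self-contained Python sketch of graded beliefs with justification tracking, reproducing the planets/stars example. The class and method names are our own illustration, not an interface from the paper; the (keyword, score) input pairs stand for RAKE output (obtainable, for instance, with the rake_nltk package – an implementation choice the paper does not mandate).

```python
from dataclasses import dataclass, field

@dataclass
class Justification:
    derived: str        # (i) the derived belief
    support: list       # (ii) the premises of the fired rule instance

@dataclass
class BeliefBase:
    degrees: dict = field(default_factory=dict)       # belief -> degree in [0, 1]
    dependencies: dict = field(default_factory=dict)  # belief -> its justifications

    def add_extracted(self, doc_keywords, lam=1.0):
        """Equation (1): normalize each RAKE score by the document's maximum
        score, then damp by the adjustment factor lambda in [0, 1]."""
        max_score = max(score for _, score in doc_keywords)
        for keyword, score in doc_keywords:
            self.degrees[keyword] = lam * score / max_score

    def qual(self, justification):
        """Equation (3): degree of the least entrenched support belief."""
        return min(self.degrees[b] for b in justification.support)

    def fire(self, premises, conclusion):
        """Fire a rule whose premises all hold, then apply Equation (2):
        the derived degree is that of the highest-quality justification."""
        if all(p in self.degrees for p in premises):
            j = Justification(conclusion, list(premises))
            self.dependencies.setdefault(conclusion, []).append(j)
            self.degrees[conclusion] = max(
                self.qual(jj) for jj in self.dependencies[conclusion])

# The planets/stars example from the text:
B = BeliefBase()
B.degrees.update({"planets": 0.5, "stars": 0.7})
B.fire(["planets", "stars"], "galaxies")
print(B.degrees["galaxies"])  # 0.5 = min(0.5, 0.7), via the single justification
```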
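Second, the degree-based revision policy can be sketched on top of the same structure. This is an assumption-laden illustration: encoding literals as strings with a "~" prefix for negation is our convention, not the paper's, and ties are resolved here in favor of the existing belief, a detail the paper leaves open.

```python
def negate(literal):
    # Our string convention for literals: "~x" is the negation of "x".
    return literal[1:] if literal.startswith("~") else "~" + literal

def revise(base, alpha, degree):
    """Degree-based revision: when alpha contradicts an existing ~alpha,
    the belief with the higher degree wins (ties keep the existing belief)."""
    neg = negate(alpha)
    if neg in base.degrees:
        if degree > base.degrees[neg]:
            # Contract ~alpha together with its justifications only; the
            # premises that derived it, and the rule itself, are kept
            # (coherence approach, minimal change).
            del base.degrees[neg]
            base.dependencies.pop(neg, None)
            base.degrees[alpha] = degree
        # otherwise the addition of alpha is discarded
    else:
        base.degrees[alpha] = degree  # plain expansion, no contradiction
```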
The filtering process is based on the similarity 𝑆𝑖𝑚(𝐵, 𝑑) between the agent's set of beliefs 𝐵 = {(𝑏1; 𝑑𝑒𝑔𝑟𝑒𝑒(𝑏1)), . . . , (𝑏𝑚; 𝑑𝑒𝑔𝑟𝑒𝑒(𝑏𝑚))} and the content of a document 𝑑 = {(𝑘1; 𝑠1), . . . , (𝑘𝑛; 𝑠𝑛)} to be proposed to the user. We propose a similarity measure that considers the degrees of the intersected beliefs and the knowledge in the document; the formula is inspired by the similarity function proposed by Lau et al. [8]. Let 𝑆 be the set of keywords appearing both in 𝑑 and in 𝐵, defined by 𝑆 = {𝑘𝑖 ∈ 𝑑 : 𝑒𝑥𝑡𝑒𝑛𝑡(𝐵, 𝑘𝑖) > 0 ∨ 𝑒𝑥𝑡𝑒𝑛𝑡(𝐵, ¬𝑘𝑖) > 0}, where 𝑒𝑥𝑡𝑒𝑛𝑡(𝐵, 𝑘𝑖) = 𝑑𝑒𝑔𝑟𝑒𝑒(𝑘𝑖) if 𝑘𝑖 ∈ 𝐵, and 0 otherwise.

\[ \mathit{Sim}(B, d) = \begin{cases} \dfrac{\max\left\{\sum_{k_i \in d} \left[\mathit{extent}(B, k_i) - \mathit{extent}(B, \neg k_i)\right],\, 0\right\}}{|S|} & \text{if } |S| \neq 0, \\ 0 & \text{otherwise.} \end{cases} \tag{4} \]

The similarity formula "rewards" the documents containing keywords in common with the set 𝐵 and penalizes those containing keywords whose corresponding negated beliefs are in 𝐵. We set a cutoff value 𝛾 for 𝑆𝑖𝑚(𝐵, 𝑑) that decides whether the knowledge inside a document is similar to the set of beliefs or not. The filter is then used according to the intended application: when the purpose of the framework is reinforcing the user's knowledge, the documents that are "close" to the agent's beliefs – those with a similarity score greater than the cutoff – are returned to the user; conversely, when the framework is employed for novelty, the documents with a similarity score below the cutoff are returned.
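Equation (4) and the cutoff test translate directly into code. The sketch below is a minimal illustration assuming the same "~"-prefix encoding of negated literals as in the revision sketch, a degrees dictionary mapping beliefs to their degrees, and documents given as lists of (keyword, score) pairs; the function names are ours.

```python
def negate(literal):
    # Same string convention as in the revision sketch: "~x" negates "x".
    return literal[1:] if literal.startswith("~") else "~" + literal

def extent(degrees, k):
    # extent(B, k) = degree(k) if k is in B, and 0 otherwise
    return degrees.get(k, 0.0)

def similarity(degrees, doc_keywords):
    """Equation (4): reward keywords shared with B, penalize keywords whose
    negation is believed; normalize by |S|, the set of overlapping keywords."""
    S = [k for k, _ in doc_keywords
         if extent(degrees, k) > 0 or extent(degrees, negate(k)) > 0]
    if not S:
        return 0.0
    total = sum(extent(degrees, k) - extent(degrees, negate(k))
                for k, _ in doc_keywords)
    return max(total, 0.0) / len(S)

def filter_documents(degrees, docs, gamma, mode="reinforce"):
    # "reinforce" returns documents close to the beliefs; "novelty" the rest.
    keep = (lambda s: s >= gamma) if mode == "reinforce" else (lambda s: s < gamma)
    return [d for d in docs if keep(similarity(degrees, d))]
```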
3. Conclusion

This paper proposed an innovative framework for a rule-based information retrieval agent that relies on its cognitive abilities to learn about the user's knowledge. This information is used to propose new/relevant documents accordingly. The components of the framework are:
(1) a rule-based module, which models the agent's beliefs and rules, performs inference about the user's knowledge, calculates the entrenchment degrees, tracks the dependencies between beliefs, and revises the beliefs when needed to maintain consistency;
(2) a knowledge extractor module, which extracts knowledge from the documents read by the user;
(3) a result-filtering module, which compares the content of candidate documents to the user's knowledge and selects the "useful" ones to be returned to the user.

For future work, we aim to take into account the confidence in the sources of the documents, which will likely affect the degree of entrenchment of a belief. Another possible extension is to integrate semantic analysis in order to deal with semantically similar content.

References

[1] M. da Costa Móra, J. G. P. Lopes, R. M. Vicari, H. Coelho, BDI models and systems: Bridging the gap, in: ATAL, 1998, pp. 11–27.
[2] J. S. Culpepper, F. Diaz, M. D. Smucker, Research frontiers in information retrieval: Report from the third strategic workshop on information retrieval in Lorne (SWIRL 2018), SIGIR Forum 52 (2018) 34–90.
[3] D. El Zein, C. da Costa Pereira, A cognitive agent framework in information retrieval: Using user beliefs to customize results, in: The 23rd International Conference on Principles and Practice of Multi-Agent Systems, 2020.
[4] S. Rose, D. Engel, N. Cramer, W. Cowley, Automatic keyword extraction from individual documents, Text Mining: Applications and Theory 1 (2010) 1–20.
[5] N. Alechina, M. Jago, B. Logan, Preference-based belief revision for rule-based agents, Synthese 165 (2008) 159–177.
[6] C. E. Alchourrón, P. Gärdenfors, D. Makinson, On the logic of theory change: Partial meet contraction and revision functions, The Journal of Symbolic Logic 50 (1985) 510–530.
[7] P. Gärdenfors, Belief revision: An introduction, Cambridge Tracts in Theoretical Computer Science, Cambridge University Press, 1992, pp. 1–28.
[8] R. Y. Lau, P. D. Bruza, D. Song, Belief revision for adaptive information retrieval, in: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2004, pp. 130–137.