<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Reinforcement Learning-driven Information Seeking: A Quantum Probabilistic Approach</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Amit Kumar Jaiswal</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Haiming Liu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ingo Frommholz</string-name>
          <email>ingo.frommholz@beds.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Bedfordshire, Luton</institution>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
      </contrib-group>
      <fpage>16</fpage>
      <lpage>29</lpage>
      <abstract>
        <p>Understanding an information forager's actions during interaction is very important for the study of interactive information retrieval. Information spreading in an uncertain information space is substantially complex due to the high entanglement of users interacting with information objects (text, images, etc.). An information forager, in general, carries a piece of information (an information diet) while searching (or foraging) for alternative content, typically subject to decisive uncertainty. Such uncertainty is analogous to measurement in quantum mechanics, which follows the uncertainty principle. In this paper, we discuss information seeking as a reinforcement learning task. We then present a reinforcement learning-based framework that treats the information forager as an agent in order to model the forager's exploration and guide their behaviour. Our framework also incorporates the inherent uncertainty of the forager's actions using the mathematical formalism of quantum mechanics.</p>
      </abstract>
      <kwd-group>
        <kwd>Information Seeking</kwd>
        <kwd>Reinforcement Learning</kwd>
        <kwd>Information Foraging</kwd>
        <kwd>Quantum Probabilities</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Web searchers generally move from one webpage to another by following links
or cues while retaining the consumed information (intake) with them, without
attaining a generalised appetite (information diet), in uncertain and
dynamic information environments [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. In general, the evolution of
information patterns arising from user interaction keeps searchers in an information
seeking process without consuming an optimised information diet (the information goal).
There therefore needs to be a mechanism that can guide foragers during their search
process in order to set a realistic information appetite. User interaction is an
important part of the search process which can enhance search performance
and the information foragers' search experience and satisfaction [
        <xref ref-type="bibr" rid="ref15 ref3 ref5">3, 5, 15</xref>
        ]. User
actions and their dynamics during search play an important role in changing
user behaviour and belief states [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. It has recently been demonstrated that action
behaviour representations can be learned using reinforcement learning (RL) [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]
by factorising a policy into two components: an action representation and its
transformation. To effectuate the information foragers' (or searchers'/users') [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]
cognitive ability during the search, we treat the searcher as an RL agent which
follows Information Foraging Theory (IFT) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], to understand how users
can learn in an ongoing process of finding information. Furthermore, the
learning ability of the users can be signalled by the RL approach through giving a
free choice of search scenarios in an uncertain environment. For instance, the
information seeker must optimise the trade-off between exploration, via sustained
steps in the search space, on the one hand and exploitation of the resources
encountered on the other. We believe that this trade-off characterises how
a user deals with uncertainty and its two aspects, risk and ambiguity, during
the search process [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Therefore the pattern of behaviour in IFT is mostly
sequential. Risk and ambiguity minimisation cannot happen simultaneously, which
leads to an underlying limit on how good such a trade-off can be. This lets the
information foraging perspective of information seeking converge with the
developing field of quantum theory [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Moreover, web search engines enable their
users to efficiently access a large amount of information on the Web, which in
turn leads search users to learn new knowledge and skills during their search
processes. When users search to obtain knowledge, their information needs1
are mostly varied, open-ended, and seldom clear at the start. Such
search sessions generally span multiple queries and involve rich interactions;
our aim is therefore to model this kind of information foraging process, in which
the users' cognitive state changes during search.
      </p>
      <p>
        Due to its inherently complex and intensely interactive nature, effective
interactive information foraging is demanding for both users and
search systems. Hence, our focus is to incorporate contextual semantic
information in modelling the information forager using the mathematical
framework of quantum theory, i.e. quantum probabilities based on geometry.
Specifically, we propose a quantum-inspired reinforcement learning approach
that (a) models the information foragers' behaviour, where action selection (or
policy) is leveraged via an Actor-critic method [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] to enhance the agent's
experience in a text query-matching task; and (b) learns the policy where the query
representation is parameterised using quantum language models, with a focus on the
interaction across multi-meaning words.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        This section covers aspects of reinforcement learning, quantum theory in
dynamic information retrieval (IR), in particular interactive information retrieval [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ], and Information Foraging theory.
      </p>
      <sec id="sec-2-1">
        <title>1 We consider an IN to be expressed by a query or a series of queries</title>
        <p>
          Reinforcement Learning in Information Retrieval: Learning through interaction with
an environment is a common mode of learning in humans and other animals, and is
generally formalised as reinforcement learning. Reinforcement learning [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] (RL)
techniques are motivated by the sense of decision making in humans, which appears to
be biologically rooted. Within such biological roots [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], when an information
forager's action ends with a disadvantageous consequence (or negative payoff),
that action will not be repeated in the future; whereas, if the action leads to
a successful consequence (or positive reward), it will happen again. User
involvement in information searching is primarily a decision-making (or action-taking)
process [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ], where users reflect identical RL features during this process. We
adopt RL models to capture the mechanisms underlying users' learning of
information from searching. Previous work [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ] found that a search system's
information can be enriched to advance search intention and automate difficult
query reformulation by modelling the search context. Reinforcement learning is
an important method that lets the system employ the search context and
relevance feedback simultaneously. This approach also allows the system to balance
exploration (widening the search among different topics) and exploitation
(moving deeper into specific subtopics), which has been supportive in information
retrieval [
          <xref ref-type="bibr" rid="ref23 ref24">23, 24</xref>
          ]. Exploration and exploitation methods are usually employed in
tasks associated with recommender systems or information retrieval, such as
foraging strategies [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ], recommendation [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ] or image retrieval [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ]. However,
reinforcement learning is mainly used by search/retrieval systems [
          <xref ref-type="bibr" rid="ref31">31</xref>
          ], which
collect users' interests and habits over a continuous period, while in a specific
search scenario the users in a given search session are more interested in the
holistic improvement of the search results than in relying on arbitrary future search
sessions.
        </p>
        <p>
          Quantum Theory and Information Retrieval: Quantum Theory (QT) has
matured to reinforce the search potential by applying the mathematical
formalism of quantum mechanics to information retrieval [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ]. The aim of
introducing the QT formalism was to elucidate the implausible behaviour of micro-level
search actions, which classical probability theory may not be able to model.
Furthermore, it is an expressive formalism that can combine prominent
probabilistic, geometric and logic-based IR approaches. The mathematical foundation
of the Hilbert space formalism was introduced in [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ] to apply this mathematical
framework outside of physics. In classical probability theory we refer to events as
subsets of a sample space of all potential events, whereas in quantum
theory the probabilistic space is geometrically defined, and its representation
becomes an infinite set of angles and distances commonly named an abstract
vector space, or, more appropriately, a finite or infinite-dimensional Hilbert
space denoted by H. Each and every event is depicted as a subspace of the
Hilbert space. To represent the n-dimensional vectors that compose a Hilbert
space, the Dirac notation is widely adopted, using ket and bra nomenclature.
More concretely, this means representing a given vector as |ψ⟩ and its
transposed view ψ^T as ⟨ψ|. Also, the vectors under consideration in a Hilbert space
are usually unit vectors (their length is 1). A projection onto the subspace induced
by a vector |ψ⟩ is denoted by the operation resulting in the matrix |ψ⟩⟨ψ|. In this
subspace, the contained vectors are again normalised2, and the projection
of events represented as vectors is, again, performed by the |ψ⟩⟨ψ| operation.
Unit vectors interpreted as state vectors induce a probability distribution over
events (subspaces), and the product resulting from the mentioned operation is
called a density matrix. We use so-called observables to perform a measurement
of the outcomes (which are eigenvalues).
        </p>
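        <p>For illustration, the projector and Born-rule probability just described can be computed with elementary linear algebra. The sketch below is not part of the original formalism; the state vector, its dimensionality and the event subspace are invented for the example:</p>

```python
import numpy as np

# A unit state vector |psi⟩ in a 2-dimensional real Hilbert space.
psi = np.array([0.6, 0.8])        # length 1, since 0.36 + 0.64 = 1

# Projector |psi⟩⟨psi| onto the subspace spanned by |psi⟩.
projector = np.outer(psi, psi)

# An event subspace spanned by the basis vector |0⟩.
e0 = np.array([1.0, 0.0])
P0 = np.outer(e0, e0)

# Born rule: probability of the event given state |psi⟩ is Tr(P0 |psi⟩⟨psi|).
prob = np.trace(P0 @ projector)
print(round(prob, 2))             # 0.36, i.e. the squared overlap with |0⟩
```

        <p>Note that the trace of the projector itself is 1, as expected for a density matrix built from a unit state vector.</p>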
        <p>
          The major similarity between quantum mechanics (QM) and information
retrieval (IR) is understanding the interaction between a user (the observer in
QM) and the information object under observation [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ]. The core connection
between QM and IR stems from their probabilistic features, where conditional
probabilities are allied with interference effects that amount to a contextual measure
(of cognitive, subjective character) when consolidating varied objects3. In QT, we can
represent a user's information need as a state vector, the query as an observable, and
the probability of obtaining single eigenvalues or objects as a measure of the
degree of relevance to a query [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. Earlier, QM was incorporated within the RL
algorithmic approach to generalise the filtering of favourable user actions [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ].
Information Foraging Theory: Information Foraging theory (IFT) [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] was
developed to understand human cognition and behaviour. IFT provides
constructs adopted from optimal foraging theory, in which predators
correspond to humans who seek information (the prey). It has three
constructs: the first delineates searches (or search engine result pages, SERPs)
in the user interface sections, referred to as information patches; information
scent helps users make use of perceptual cues, such as web links spanning small
snippets of graphics and text, in order to make navigation decisions when
selecting a specific link. The purpose of such cues is to characterise the contents
that will be encountered by trailing the links. Finally, the information diet allows users
to narrow or expand the diversity of information sources based on their
profitabilities (appetite).
        </p>
        <p>
          Information Foraging is an active area of IR and information seeking due to
its sound theoretical basis to explain the characteristics of user behaviour. IFT
has been applied to model users' information needs and their actions using
information scent [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. It has also been found that information scent
can be used to analyse and predict the usability of a website by determining the website's
scent [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. Liu et al. [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] demonstrated an IFT-inspired user classification model
for a content-based image retrieval system to understand users' search
preferences and behaviours by running the model on a wide range of interaction
features collected from screen captures of different users' search processes.
Recent work [
          <xref ref-type="bibr" rid="ref16 ref17">16, 17</xref>
          ] studied the effects of foraging in personalised
recommendation systems by inspecting the visual attention mechanism to understand how
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2 There may be some vectors which are not necessarily normalised</title>
        <p>
          3 https://www.newscientist.com/article/mg21128285-900-quantum-minds-why-we-think-like-quarks/
users follow recommended items. Such user-item interactions can also be seen
in query-level interactions, i.e. in query reformulation scenarios, where IFT- and
RL-like models [
          <xref ref-type="bibr" rid="ref18 ref19">18, 19</xref>
          ] provide better explainability.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Information Seeking As Reinforcement Learning Task</title>
      <p>A searcher during the search process has to investigate several actions, each with
an unknown reward, before selecting any of them. They explore the results back and forth
to estimate the optimal patch based on the reward. This scenario of information
seeking can be interpreted as a reinforcement learning task in which the search
process, involving an agent that interacts with the search environment, is cost-driven.
Assessing positively rewarded actions (from the searcher's incurred costs) by the
agent within an uncertain environment can potentially optimise the forager's
choice in finding the information. From an IFT perspective, positively rewarded
actions can be cast as exploitation and the available actions as
exploration, provided the information is scattered across a patchy environment.
The fundamental aspect of reinforcement learning is to "learn by doing with
delayed reward", which emerges as a major connection to information seeking
(especially user interaction in IR and recommendation tasks) and also
describes the foraging process of a searcher. The seeker's goal is to quickly locate a
relevant patch (document, image, etc.). However, the information seeker has no
prior knowledge of the rewards from assessed patches and keeps exploring
each of them. The seeker interacts with the search system; exploring which results
contain relevant information elicits the rewards distribution (information scent
patterns) between information patches. Often, access to patches with minimum reward
can signify an optimal patch on which the seeker has spent too little time for
exploitation: an information seeker who spends less time assessing each information patch
gains only partially relevant information about the seeking process, and this partial
rewards distribution between the patches gives rise to the exploitation of a
patch with less than optimal rewards. Hence, the longer a seeker explores, the more
near-accurate information they gather about all of the patches, but they give
up the chance to exploit the most relevant patch for longer. Understanding these
operationalised scenarios paves the way to model foraging behaviour whose
causes could include uncertainty, information overload, and confusion.</p>
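      <p>The exploration-exploitation trade-off described in this section can be made concrete with a toy ε-greedy forager over information patches. This is an illustrative sketch only; the patch rewards, the value of ε and the incremental-mean update are invented, not taken from our framework:</p>

```python
import random

random.seed(0)

# Three information patches with unknown expected rewards (information scent).
true_reward = [0.2, 0.5, 0.8]
estimates = [0.0, 0.0, 0.0]
counts = [0, 0, 0]
epsilon = 0.1          # probability of exploring a randomly chosen patch

for step in range(2000):
    if random.random() > 1 - epsilon:                 # explore
        patch = random.randrange(3)
    else:                                             # exploit best estimate
        patch = max(range(3), key=lambda i: estimates[i])
    reward = true_reward[patch] + random.gauss(0, 0.1)
    counts[patch] += 1
    # Incremental mean: running estimate of the patch's profitability.
    estimates[patch] += (reward - estimates[patch]) / counts[patch]

best = max(range(3), key=lambda i: estimates[i])
print(best)            # the forager settles on the most profitable patch
```

      <p>The longer the forager explores (a larger ε or more steps), the more accurate the patch estimates become, at the cost of exploiting the best patch less often, mirroring the trade-off discussed above.</p>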
    </sec>
    <sec id="sec-4">
      <title>Quantum-inspired Reinforcement Learning Framework</title>
      <p>We outline the proposed reinforcement learning approach to model the forager's
actions during an information seeking scenario where the task is to match a query
to a given document and the forager's actions are queries. An agent interacts
with its search environment, characterised by a patchy distribution of information,
to find an optimal foraging strategy that maximises its reward. The forager's
environment provides a fixed setting of optional information sources. Moreover, the
forager has the choice to add a distinctive type of information patch to their
diet. However, the distribution of distinctive information patches may contain
information which the forager may not consume, due to
counterfactual situations in deciding which patch (say, document
D1 or D2) contains certain information. In our framework, we consider the
environment to be uncertain, with dynamic parameters throughout a forager's
search trail. The forager finds it difficult to differentiate patches and exploits
experience to learn the environment. The accumulating learning makes the task
complex at the dynamic and cognitive level, where the forager's pursuit is to locate
the most relevant documents.</p>
      <p>
        We use the Actor-critic policy gradient method [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] which inherently models
such dynamics, since the forager's sequential behaviours generate a
continuous state representation. A forager's action (or state) can be described by a
quantum superposition state, and the corresponding updated state vectors,
based on the respective interaction, can be obtained by random observation of
the simulated quantum state following the collapse principle of quantum
measurement [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ]. The probability of such an action state vector can be obtained
from its probability amplitude, which is updated in parallel based on the reward.
This gives rise to new internal aspects of traditional RL algorithms, namely
policy, representation, action (in parallel) and operation update.
      </p>
      <p>
        The quantum measurement decision process of a forager in selecting a
document (the action) while seeking is ambiguous and uncertain [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. In such a situation,
an observable describes the possible actions (documents or information patches to
select) and can be represented as Ô, with a basis set containing |0⟩ and |1⟩, which
correspond to the two state vectors of Ô. The measurement of a quantum
system on the observable Ô in a corresponding superposed quantum state |ψ⟩
refers to a measurement in a superposition state. When making a measurement in
state |ψ⟩, the quantum state collapses into one of its basis states |0⟩ or |1⟩.
However, one cannot know a priori with certainty which of these states it will
collapse to. The only information this quantum system can provide is that
|0⟩ will be measured with probability |α|², and |β|² is the probability of measuring
|1⟩, where α and β represent the respective probability amplitudes.
      </p>
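      <p>The two-outcome measurement described above can be simulated directly. In the sketch below the amplitudes α and β are invented for illustration, and the collapse is emulated by sampling the outcome |0⟩ with probability |α|²:</p>

```python
import random

random.seed(1)

# Amplitudes of |0⟩ and |1⟩ in the superposed state; |α|² + |β|² = 1.
alpha, beta = 0.6, 0.8

def measure():
    # Collapse: outcome |0⟩ with probability |α|² = 0.36, otherwise |1⟩.
    return 0 if random.random() > 1 - alpha ** 2 else 1

outcomes = [measure() for _ in range(10_000)]
freq0 = outcomes.count(0) / 10_000
print(freq0)   # close to |α|² = 0.36; |β|² = 0.64 is the |1⟩ frequency
```

      <p>Repeated measurements recover the Born probabilities only in the aggregate; any single measurement yields one definite basis state, as stated above.</p>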
      <p>
        We present a quantum-inspired reinforcement learning (qRL) framework for
information seeking under dynamic search scenarios. The schematic architecture
of qRL is shown in Fig. 1. qRL has two main components, an Actor-critic [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]
based network to represent the RL agent which jointly encodes state and action
spaces, and the information space known as environment containing documents.
The Actor-critic components of the RL agent have their constructs described
via the Hilbert space formalism of Quantum theory [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ].
      </p>
      <p>
        Our framework is applicable to matching tasks, in particular semantic query
matching, where candidate queries (extracted/predicted from the
document) will be matched with the original document in a semantic Hilbert space
(SHS) [
        <xref ref-type="bibr" rid="ref33">33</xref>
        ]. An SHS is a vector space of words, where words in combination
involve a linear/non-linear formation of amplitudes and phases, delineating various
levels of semantics of combined words. In the SHS, a word w_i is represented by a
base vector |w_i⟩. Semantics of combined words are represented by superpositions
of word vectors, encoded in the probability amplitudes of the corresponding base
vectors.
Standard reinforcement learning is based on a finite-state, discrete-time
Markov decision process (MDP) composed of five components: s_t, a_t, p_ij(a), r_{i,a}
and C, where s_t, the state at time t, delineates the action a_t at a specific time
for a given state; p_ij(a_t) is the probability of the state transition (from state s_t to
s_{t+1} via action a_t for all t ∈ (i, j)); r is a reward function r : Ω → ℝ
with Ω = {(i, a) | i ∈ s_t, a ∈ a_t}; and C is an objective function.
      </p>
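      <p>A minimal finite MDP with these five components can be sketched as follows; the two states, transition probabilities and rewards are invented for illustration and are not part of the proposed framework:</p>

```python
import random

random.seed(2)

# p_ij(a): transition probabilities to states [0, 1] for each (state, action).
P = {
    (0, "stay"):   [0.9, 0.1],
    (0, "forage"): [0.2, 0.8],
    (1, "stay"):   [0.1, 0.9],
    (1, "forage"): [0.7, 0.3],
}
# r(i, a): reward for taking action a in state i; state 1 is the rich patch.
R = {(0, "stay"): 0.0, (0, "forage"): 0.1,
     (1, "stay"): 1.0, (1, "forage"): 0.2}

def step(state, action):
    # Sample the next state s_{t+1} from p_ij(a) and return the reward.
    nxt = 0 if random.random() > 1 - P[(state, action)][0] else 1
    return nxt, R[(state, action)]

# Objective C: total reward of a simple policy that forages until it
# reaches the rewarding patch (state 1), then stays to exploit it.
state, total = 0, 0.0
for t in range(100):
    action = "forage" if state == 0 else "stay"
    state, reward = step(state, action)
    total += reward
print(total)   # most of the 100 steps collect the reward of patch 1
```
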
      <p>
        In the following discussion we utilise tensor spaces. The notation in Table 1
follows that of [
        <xref ref-type="bibr" rid="ref34 ref35">34, 35</xref>
        ]. The fabric of our framework, i.e. the underlying Hilbert
space H, is similar to the Tensor Space Language Model described in [
        <xref ref-type="bibr" rid="ref34">34</xref>
        ]. Here,
the base vectors {|e_i⟩}^n_{i=1} of our n-dimensional space4 are term vectors, either
one-hot vectors or word embeddings. Any word vector |w⟩ can be written as a
linear combination of the base vectors, i.e. |w⟩ = Σ^n_{i=1} α_i |e_i⟩ with α_i ∈ ℝ (or
ℂ in the complex case) as coefficients.
Action: The formal representation of the user action is shown in Table 1. In our framework,
the forager (or searcher) action is to match a candidate query |q⟩ (generated after
inputting a set of queries) from document D to delineate |q_rD⟩, where |q_rD⟩ refers
to a query state vector that represents the most optimal query for the selected
document D given a positive/optimal reward (r). A candidate query is an outcome
generated from the Actor network given the forager's set of input queries.
State: A state s_t delineates the positive historical interaction of the forager with the
search environment. In our framework, the Actor network has its state encoded
by the product of the probability amplitudes of the global-local projection ⟨q_T|q⟩
(of word meanings) for all words of a query. We refer to this as the state
representation defined by the product pooling method.
      </p>
      <p>State Transition: The state representation describes the positive historical
interaction of a forager. The transition among the states can be computed from the user's
feedback. Our framework uses a convolutional neural network which has its
convolution based on a state vector that encodes the historical interaction of the forager
in finding the match of a query.</p>
      <p>
        Policy: A policy is a strategic mechanism which represents the probability of a forager's
action under a certain state. Our framework's policy network is stochastic, and we
employ the Actor-Critic RL method [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] (Fig. 1), which assists the forager's actions
in the Actor network with an optimal policy value generated by the Critic. Thus,
the Actor network estimates the probability of a forager action, and the Critic
network obtains the optimal value and updates it. The policy network is modelled as
a probability distribution over actions, hence its stochasticity.
      </p>
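      <p>The interplay of the Actor (policy) and the Critic (value) described above can be sketched schematically. The linear parameterisation, learning rates and two-action setting below are invented for illustration; the actual framework uses quantum-language-model-parameterised networks:</p>

```python
import math
import random

random.seed(3)

theta = [0.0, 0.0]        # actor parameters: one preference per action
v = 0.0                   # critic's value estimate (single-state setting)
alpha_actor, alpha_critic = 0.1, 0.2

def policy():
    # Softmax over the two action preferences.
    exps = [math.exp(t) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]

for step in range(500):
    probs = policy()
    action = 0 if random.random() > 1 - probs[0] else 1
    reward = 1.0 if action == 1 else 0.0     # action 1 is the "good" query
    td_error = reward - v                    # critic's TD error
    v += alpha_critic * td_error
    # Policy-gradient update: raise the log-probability of the taken
    # action in proportion to the critic's TD error.
    for a in range(2):
        grad = (1.0 if a == action else 0.0) - probs[a]
        theta[a] += alpha_actor * td_error * grad

print(policy()[1] > 0.9)   # the actor learns to prefer action 1
```

      <p>The Critic's scalar value plays the role of the optimal reward signal here; in our framework that role is filled by the Q-function value produced by the Critic network.</p>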
      <p>
        Reward: The reward r(s, a) in reinforcement learning is the success value of an agent's
action a. In information retrieval, this success value is interpreted in terms of the
relevance judgement score [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. In our framework, the Critic network inputs a (state, action) pair and provides
the Actor network with an optimal reward value for the given action, judging and
scoring the actions of the agent (or forager).
      </p>
      <p><bold>Our Proposed Framework.</bold> This fundamental RL definition is of utmost importance for proposing
quantum-inspired reinforcement learning constructs. Following the quantum probability
concepts, the constructs are as follows (please also refer to Fig. 1):</p>
      <p>
        Actor Network: An Actor-critic [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] method refers to a policy gradient
mechanism, where the Actor network for a given forager (or information seeker) in a particular
state |s_t⟩ outputs an action |a_t⟩. This network inputs user queries (the forager's actions),
where these queries |q_1⟩, |q_2⟩, …, |q_n⟩ or a set of textual descriptions (which collectively
form a document) form the local and global representations so as to model the
interrelated interaction between words. Inspired by the notion of quantum theory, we employ
the interpretation of the wave function (due to the importance of word positions [
        <xref ref-type="bibr" rid="ref36">36</xref>
        ]) |ψ⟩
as a state vector that can be explicated in RL constructs.
      </p>
      <p>
        The Actor network inputs query state vectors |q_1⟩, …, |q_n⟩, where each word in
a query is treated as a tensor product of vectors |w_i⟩ and every word has a unique
basis vector |b_i⟩ that provides a generic semantic meaning with an associated
probability amplitude. The speciality of a basis vector is that it can lead to a different
meaning if interpreted severally across it. We then apply our framework to a
semantic query matching task through a real-valued representation of queries by means of
local and global distributions, so as to allow such intermittent basis vectors to
perceive the interaction between the meanings of different words. Hence, the wave
function description of a query |q_i⟩ can be depicted using the tensor product of words as
|q_T⟩ = |w_1⟩ ⊗ |w_2⟩ ⊗ … ⊗ |w_n⟩. Word dependency can be seen by expanding the
tensor as |q_T⟩ = Σ^k_{b_1,…,b_n=1} L_{b_1…b_n} |b_1⟩ ⊗ … ⊗ |b_n⟩, where L (the value is shown in
Table 1) depicts the allied probability amplitude of the k^n-dimensional tensor with
respective basis vectors |b_1⟩, …, |b_n⟩ representing the meaning of the
corresponding query. This tensor-based query representation is a local representation, as
a tensor with rank 1 actually delineates the local distribution of a query [
        <xref ref-type="bibr" rid="ref34">34</xref>
        ]. For
words that are unseen in a query, or for compound meanings, we need a global
representation provided by a collective set of basis states (or vectors). A state
vector (i.e., the wave function of a query) describing such a global representation is
|ψ_q⟩ = Σ^k_{b_1,…,b_n=1} G_{b_1…b_n} |b_1⟩ ⊗ … ⊗ |b_n⟩. This wave function delineates a semantic
embedding space of n uncertain word meanings of a given query. The local and global
representations differ in terms of their corresponding probability amplitudes, i.e., L
and G: the probability amplitudes of the global distribution are trained
on a large collection of previous queries, whereas the probability amplitudes of the local
distribution relate only to the input query. To compute the probability amplitudes
among words from the input query (local representation) and unseen words generated
from the global representation, we perform the inner product ⟨q_T|q⟩ of both
representations, which disentangles the interaction among them. The value of the projection is
shown in Table 1. We use a convolutional neural network (CNN) to learn the obtained
higher-dimensional tensor G (value shown in Table 1), where tensor rank
decomposition can be used to decompose it (among other methods such as generalised singular
value decomposition) into decomposed unit vectors e_{r,n}, each a rank-1 tensor with
weight coefficient w_r. The unit vectors are k-dimensional, and the set of vectors e_{r,n} acts
as a subspace of tensor G. The CNN inputs a query state vector with a convolution
filter composed of the projection (inner product) between |q⟩ and the decomposed
vector, which makes the CNN trainable. Then, the state representation (the actor's state
in Table 1) is the product of all mapped unit vectors (from G) for all the
subwords of a query. After all these operations, the Actor network yields an action state
vector |a_t⟩ (action a_t at time t) to depict a set of matched words.
      </p>
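      <p>The local rank-1 representation, the global amplitude tensor G and the inner product ⟨q_T|q⟩ can be illustrated numerically. The word vectors, the dimensionality k and the entries of G below are invented for the example, and the trained CNN is omitted:</p>

```python
import numpy as np

k = 2  # dimensionality of each word's basis (two candidate senses per word)

# Word vectors |w_i⟩ with amplitudes over the k basis senses.
w1 = np.array([0.8, 0.6])          # a word with two competing senses
w2 = np.array([1.0, 0.0])          # a word with one dominant sense

# Local representation: rank-1 tensor |q_T⟩ = |w1⟩ ⊗ |w2⟩, entries L_{b1 b2}.
q_local = np.tensordot(w1, w2, axes=0)        # shape (k, k)

# A (pretend-trained) global amplitude tensor G_{b1 b2} over the same basis.
G = np.array([[0.7, 0.1],
              [0.1, 0.1]])

# Inner product ⟨q_T|q⟩ of the local and global representations: the
# scalar overlap that would feed the convolution filter.
overlap = float(np.tensordot(q_local, G, axes=2))
print(round(overlap, 3))           # 0.62
```
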
      <p>
        Critic Network: The Critic network of the qRL framework is based on a
quantum-like language model parameterised CNN which inputs the generated state and the
candidate action |a_t⟩ from the Actor network. The output of the Critic network is a
scalar value, the value of the Q-function [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. The reward values Re ∈ {−1, 0, 1} reflect
the quality of the candidate action generated by the Actor network. The significance of
the reward value represents the probability of assigning the correct label to an action,
i.e., the multi-class classification of queries to match among documents is used to
update the reward. Rewards (or classification labels) are categorised as −1 for a
mismatched query which has negative word polarity (leading to a compound meaning).
For instance, "dogs chase cats" and "dogs do not chase cats" each contribute a compound
meaning, but in an opposite sense. We consider that a word renders the
entire polarity of a query, depending on which new word it associates with. A realistic
example of this hypothesis relates to one of our framework's main constructs, i.e., |q⟩,
which is a state vector equal to the tensor product of possible words, where the word
coefficients (i.e., probability amplitudes) of basis vectors can be altered to derive a new
query giving rise to a compound meaning. The negative word polarity example is an
actual scenario of this. Positive and zero rewards are classed as matching and partially
matching queries, respectively.
      </p>
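      <p>
        The word-polarity hypothesis can be illustrated with a toy computation under hypothetical 2-dimensional word senses (not the paper's actual embeddings): altering the coefficients of one word's basis vectors flips the compound meaning of the whole tensor-product query state.

```python
import numpy as np

def query_state(word_states):
    """|q> as the tensor product of its (normalised) word state vectors."""
    q = np.array([1.0 + 0j])
    for w in word_states:
        w = np.asarray(w, dtype=complex)
        q = np.kron(q, w / np.linalg.norm(w))
    return q

# hypothetical word senses: |chase> and its opposite-polarity sense
dogs, cats = [1, 0], [1, 0]
chase, not_chase = [1, 0], [0, 1]

q_pos = query_state([dogs, chase, cats])      # "dogs chase cats"
q_neg = query_state([dogs, not_chase, cats])  # "dogs do not chase cats"
overlap = np.vdot(q_pos, q_neg)               # orthogonal: opposite compound meaning
```

The two query states share two of three words yet are orthogonal, mirroring the claim that a single word can render the entire polarity of a query.
      </p>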
      <p>
        In the Critic part, the concatenation of the actor's state and the candidate action is
performed using one-hot encoding, in which the query is passed through a complex-valued
lookup table where each word, in its own superposition state, is encoded into a
complex embedding vector [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ]. Then, a measurement is performed using the squared
projection to compute the query density matrix from the complex embedding vectors. The
probability of a measurement can be estimated using Born's postulate for a given
query state (a density matrix ρ) as p = Tr(Pρ), where p, P and Tr represent the
probability of the query class, the projection matrix, and the trace of a matrix, respectively. The density
state ρ = Σ_{i=1}^{n} α_i |w_i⟩⟨w_i| of a query is perceived as a combination of word states,
where each density matrix |w_i⟩⟨w_i| reflects a word w_i in a superposition state
(in this case Σ_{i=1}^{n} α_i = 1). The generated query density matrix has real-valued
diagonal entries and complex non-zero off-diagonal entries, and both types of
entries inherently inform about the distribution of semantic and contingent meanings. We
adopt the interpretation of the complex phase introduced in [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ] to compute the sentence
density matrix, which carries word senses as positive, neutral, and negative. The reward
is estimated from the measurement matrix using this interpretation. A pictorial
representation of the Critic network is shown in Fig. 1.
      </p>
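      <p>
        As a minimal numerical sketch of this measurement (assuming generic complex embeddings; function names are illustrative, not the paper's implementation), the query density matrix and the Born-rule probability can be computed as:

```python
import numpy as np

def density_matrix(word_vectors, weights):
    """rho = sum_i alpha_i |w_i><w_i| from complex word embeddings |w_i>
    (each normalised) and mixture weights alpha_i with sum_i alpha_i = 1."""
    dim = len(word_vectors[0])
    rho = np.zeros((dim, dim), dtype=complex)
    for w, alpha in zip(word_vectors, weights):
        w = np.asarray(w, dtype=complex)
        w = w / np.linalg.norm(w)
        rho += alpha * np.outer(w, w.conj())  # alpha_i |w_i><w_i|
    return rho

def born_probability(rho, projector):
    """Born's postulate: p = Tr(P rho); real-valued for Hermitian P and rho."""
    return np.trace(projector @ rho).real

# toy query of two 2-d word states with equal weights
rho = density_matrix([[1, 0], [1, 1j]], [0.5, 0.5])
P = np.outer([1, 0], [1, 0])  # projector onto the first basis state
p = born_probability(rho, P)
```

By construction Tr(ρ) = 1, so the probabilities obtained from any complete set of projectors sum to one.
      </p>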
      <p>In brief, the Actor-Critic policy network suffices with fewer components than
traditional reinforcement learning. Also, the Agent part of the
framework acts as a controller for the user, in the same way that Information Foraging mechanisms
do for a searcher. IFT helps a searcher by suggesting an optimal foraging path
via information scent, and here in the framework the Critic network informs/updates
the Actor with a value (reward) for a certain action that is positively rewarded. Hence,
our framework meets foraging in certain regards (such as information seeking behaviour
assessed as foraging and inherently as an RL task).</p>
      <p>
        Rewards: The forager's aim is to identify the relevant match (or a perfect match)
of a query (or patch) for the clicked/selected document, which can be perceived as its
reward. However, our framework's reward function is designed to guide the
forager in how to perceive the document information and draw the most relevant
match (patch). Also, the reward value is discrete, taking the values -1, 0, and +1.
The definition of reward in reinforcement learning [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] resembles a certain analogy of
information scent, which is a measure of utility and results in two types of information
scent score: a scalar value and the probability distribution of scent patterns [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. In
RL, the value distribution of the reward received by an agent can depict
the analogous nature of information scent patterns. Hence, an explainable approach to
reinforcement learning-based rewards using the IFT-based model of information scent
can give further intuition about negative rewards. Information scent can be interpreted
as the perceived relevance of rewarded actions, defined through positive and negative scent
values. The physical meaning of positive and negative information scent scores is that
the forager accumulates rich information along the foraged path when locating the
relevant information, whereas the unhealthy consumption of information renders the searcher
negative towards the search environment, leading them to give up the information
world (or RL environment) or the task itself.
      </p>
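      <p>
        The discrete reward admits a one-line sketch; the class labels below are hypothetical names for the three match categories described above:

```python
def reward(match_class: str) -> int:
    """Discrete reward R in {-1, 0, +1}: +1 for a matching query,
    0 for a partial match, -1 for a mismatch with negative word polarity."""
    return {"match": +1, "partial": 0, "mismatch": -1}[match_class]
```
      </p>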
      <p>
        Update Probability Amplitude/Policy: To update the probability amplitude
in the Actor network, the important part is to measure the actions for certain
states; on collapse, this gives rise to the occurrence probability of the norm of the state
vector for the particular candidate action, which later executes the Actor network.
The more we record the experience and learning of each action (even an erroneous action),
the more informative the probability amplitude becomes. We know that the action |a_t⟩ is
the tensor product of all possible words, and calculating one user action (i.e., |a⟩) from
it is possible while interacting with changes in probability amplitudes for the
combined meaning. (Rewards can be normalised to generate outcomes in reinforcement
learning [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].)
      </p>
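      <p>
        The collapse step can be sketched under standard quantum measurement assumptions (the amplitude values below are illustrative): the occurrence probability of each candidate action is the squared magnitude of its amplitude, normalised over all recorded candidates.

```python
import numpy as np

def collapse_probabilities(amplitudes):
    """Occurrence probability of each candidate action on measurement:
    p_i = |c_i|^2 normalised over all recorded amplitudes."""
    mags = np.abs(np.asarray(amplitudes, dtype=complex)) ** 2
    return mags / mags.sum()

# amplitudes accumulated from recorded (even erroneous) actions
p = collapse_probabilities([0.8, 0.4 + 0.3j, 0.1])
```

Recording more experience reshapes the amplitudes, and hence this distribution, which is the sense in which the amplitude becomes more informative.
      </p>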
    </sec>
    <sec id="sec-5">
      <title>Conclusion and Future Work</title>
      <p>
        In this paper, we propose a mathematical framework of reinforcement learning
inspired by the Hilbert space formalism of quantum theory. The framework models the
learning process of forager actions in a semantic query matching task, given that the search
environment is patchy. The core of our framework is to characterise a forager about whom
very little is known: their search pattern is unclear, their information need is unclear or
evolving, and there is no information about how they make their trail choices while finding
information (initially the information scent is unknown and emanates as the forager
follows distinct cues), nor about the amount of information they consume
in real-time interaction with the search system. Apart from this, the major trade-off
between exploration and exploitation in the foraging process makes
understanding the forager's search actions complex. To tackle such a
complex process of dynamic action for a state, and vice versa, we adapt the Actor-Critic
reinforcement learning method as a policy network, in which the Actor network is
continuously informed about the generated action by the Critic network. The framework
subscribes to quantum probability constructs to model the representation of
forager search actions and states. Quantum theory has been applied earlier in the area
of information seeking [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], but representing and measuring the actions of each state is a
challenging scenario due to the continuous parallel update of state and action, so using
the Actor-Critic reinforcement learning method paves the way to influence learning and
representation mechanisms; many complex IR problems could be interpreted
appropriately in a new way within such an inclusive framework. In the future, we intend to
evaluate this framework for certain IR tasks.
      </p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgements</title>
      <p>This work is part of the Quantum Access and Retrieval Theory (QUARTZ) project,
which has received funding from the European Union's Horizon 2020 research and
innovation programme under the Marie Sklodowska-Curie grant agreement No 721321.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Pirolli</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Card</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>1995</year>
          , May).
          <article-title>Information foraging in information access environments</article-title>
          .
          <source>In Proceedings of the SIGCHI conference on Human factors in computing systems</source>
          (pp.
          <fpage>51</fpage>
          -
          <lpage>58</lpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Chowdhury</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gibb</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Landoni</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>Uncertainty in information seeking and retrieval: A study in an academic environment</article-title>
          .
          <source>Information Processing &amp; Management</source>
          ,
          <volume>47</volume>
          (
          <issue>2</issue>
          ),
          <fpage>157</fpage>
          -
          <lpage>175</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Tran</surname>
            ,
            <given-names>V. T.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Fuhr</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          (
          <year>2012</year>
          , August).
          <article-title>Using eye-tracking with dynamic areas of interest for analyzing interactive information retrieval</article-title>
          .
          <source>In Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval</source>
          (pp.
          <fpage>1165</fpage>
          -
          <lpage>1166</lpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Pirolli</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Card</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>1999</year>
          ).
          <article-title>Information foraging</article-title>
          .
          <source>Psychological review</source>
          ,
          <volume>106</volume>
          (
          <issue>4</issue>
          ),
          <fpage>643</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Brennan</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kelly</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Arguello</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2014</year>
          , August).
          <article-title>The effect of cognitive abilities on information search for tasks of varying levels of complexity</article-title>
          .
          <source>In Proceedings of the 5th Information Interaction in Context Symposium</source>
          (pp.
          <fpage>165</fpage>
          -
          <lpage>174</lpage>
          ). ACM.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Wittek</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Y. H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Daranyi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gedeon</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Lim</surname>
            ,
            <given-names>I. S.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>Risk and ambiguity in information seeking: Eye gaze patterns reveal contextual behavior in dealing with uncertainty</article-title>
          .
          <source>Frontiers in psychology, 7</source>
          ,
          <fpage>1790</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Tang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>G. H.</given-names>
          </string-name>
          (
          <year>2019</year>
          ).
          <article-title>Dynamic Search - Optimizing the Game of Information Seeking</article-title>
          . arXiv preprint arXiv:1909.12425.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Piwowarski</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frommholz</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lalmas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Van Rijsbergen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          (
          <year>2010</year>
          , October).
          <article-title>What can quantum theory bring to information retrieval</article-title>
          .
          <source>In Proceedings of the 19th ACM international conference on Information and knowledge management</source>
          (pp.
          <fpage>59</fpage>
          -
          <lpage>68</lpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Charnov</surname>
            ,
            <given-names>E. L.</given-names>
          </string-name>
          (
          <year>1976</year>
          ).
          <article-title>Optimal foraging, the marginal value theorem</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Sutton</surname>
            ,
            <given-names>R. S.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Barto</surname>
            ,
            <given-names>A. G.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>Reinforcement learning: An introduction</article-title>
          . MIT press.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Chandak</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Theocharous</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kostas</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jordan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Thomas</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          (
          <year>2019</year>
          , May).
          <article-title>Learning Action Representations for Reinforcement Learning</article-title>
          .
          <source>In International Conference on Machine Learning</source>
          (pp.
          <fpage>941</fpage>
          -
          <lpage>950</lpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>White</surname>
            ,
            <given-names>R. W.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>Belief dynamics in Web search</article-title>
          .
          <source>Journal of the Association for Information Science and Technology</source>
          ,
          <volume>65</volume>
          (
          <issue>11</issue>
          ),
          <fpage>2165</fpage>
          -
          <lpage>2178</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Chi</surname>
            ,
            <given-names>E. H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pirolli</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Pitkow</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2001</year>
          , March).
          <article-title>Using information scent to model user information needs and actions and the Web</article-title>
          .
          <source>In Proceedings of the SIGCHI conference on Human factors in computing systems</source>
          (pp.
          <fpage>490</fpage>
          -
          <lpage>497</lpage>
          ). ACM.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Chi</surname>
            ,
            <given-names>E. H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pirolli</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Pitkow</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2000</year>
          , April).
          <article-title>The scent of a site: A system for analyzing and predicting information scent, usage, and usability of a web site</article-title>
          .
          <source>In Proceedings of the SIGCHI conference on Human Factors in Computing Systems</source>
          (pp.
          <fpage>161</fpage>
          -
          <lpage>168</lpage>
          ). ACM.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mulholland</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Song</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Uren</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          , &amp; Ruger,
          <string-name>
            <surname>S.</surname>
          </string-name>
          (
          <year>2010</year>
          , August).
          <article-title>Applying information foraging theory to understand user interaction with content-based image retrieval</article-title>
          .
          <source>In Proceedings of the third symposium on Information interaction in context</source>
          (pp.
          <fpage>135</fpage>
          -
          <lpage>144</lpage>
          ). ACM.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Jaiswal</surname>
            ,
            <given-names>A. K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Frommholz</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          (
          <year>2019</year>
          ).
          <article-title>Effects of Foraging in Personalized Content-based Image Recommendation</article-title>
          . arXiv preprint arXiv:1907.00483.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Jaiswal</surname>
            ,
            <given-names>A. K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Frommholz</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          (
          <year>2019</year>
          , December).
          <article-title>Information Foraging for Enhancing Implicit Feedback in Content-based Image Recommendation</article-title>
          .
          <source>In Proceedings of the 11th Forum for Information Retrieval Evaluation</source>
          (pp.
          <fpage>65</fpage>
          -
          <lpage>69</lpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Jaiswal</surname>
            ,
            <given-names>A. K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Frommholz</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          (
          <year>2020</year>
          , April).
          <article-title>Utilising information foraging theory for user interaction with image query auto-completion</article-title>
          .
          <source>In European Conference on Information Retrieval</source>
          (pp.
          <fpage>666</fpage>
          -
          <lpage>680</lpage>
          ). Springer, Cham.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Nogueira</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bulian</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Ciaramita</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>Learning to coordinate multiple reinforcement learning agents for diverse query reformulation</article-title>
          . arXiv preprint arXiv:1809.10658.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Du</surname>
            ,
            <given-names>J. T.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Spink</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>Toward a web search model: Integrating multitasking, cognitive coordination, and cognitive shifts</article-title>
          .
          <source>Journal of the American Society for Information Science and Technology</source>
          ,
          <volume>62</volume>
          (
          <issue>8</issue>
          ),
          <fpage>1446</fpage>
          -
          <lpage>1472</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>White</surname>
            ,
            <given-names>R. W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bennett</surname>
            ,
            <given-names>P. N.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Dumais</surname>
            ,
            <given-names>S. T.</given-names>
          </string-name>
          (
          <year>2010</year>
          , October).
          <article-title>Predicting shortterm interests using activity-based search context</article-title>
          .
          <source>In Proceedings of the 19th ACM international conference on Information and knowledge management</source>
          (pp.
          <fpage>1009</fpage>
          -
          <lpage>1018</lpage>
          ). ACM.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Lowe</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>Y. I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tamar</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harb</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Abbeel</surname>
            ,
            <given-names>O. P.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Mordatch</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>Multi-agent actor-critic for mixed cooperative-competitive environments</article-title>
          .
          <source>In Advances in neural information processing systems</source>
          (pp.
          <fpage>6379</fpage>
          -
          <lpage>6390</lpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>B. T.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Seo</surname>
            ,
            <given-names>Y. W.</given-names>
          </string-name>
          (
          <year>2001</year>
          ).
          <article-title>Personalized web-document filtering using reinforcement learning</article-title>
          .
          <source>Applied Artificial Intelligence</source>
          ,
          <volume>15</volume>
          (
          <issue>7</issue>
          ),
          <fpage>665</fpage>
          -
          <lpage>685</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Seo</surname>
            ,
            <given-names>Y. W.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>B. T.</given-names>
          </string-name>
          (
          <year>2000</year>
          , January).
          <article-title>A reinforcement learning agent for personalized information filtering</article-title>
          .
          <source>In Proceedings of the 5th international conference on Intelligent user interfaces</source>
          (pp.
          <fpage>248</fpage>
          -
          <lpage>251</lpage>
          ). ACM.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Eliassen</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , Jørgensen, C.,
          <string-name>
            <surname>Mangel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Giske</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2007</year>
          ).
          <article-title>Exploration or exploitation: life expectancy changes the value of learning in foraging strategies</article-title>
          .
          <source>Oikos</source>
          ,
          <volume>116</volume>
          (
          <issue>3</issue>
          ),
          <fpage>513</fpage>
          -
          <lpage>523</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Yue</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Joachims</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          (
          <year>2009</year>
          , June).
          <article-title>Interactively optimizing information retrieval systems as a dueling bandits problem</article-title>
          .
          <source>In Proceedings of the 26th Annual International Conference on Machine Learning</source>
          (pp.
          <fpage>1201</fpage>
          -
          <lpage>1208</lpage>
          ). ACM.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Balabanovic</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>1998</year>
          ).
          <article-title>Exploring versus exploiting when learning user models for text recommendation</article-title>
          .
          <source>User Modeling and User-Adapted Interaction</source>
          ,
          <volume>8</volume>
          (
          <issue>1-2</issue>
          ),
          <fpage>71</fpage>
          -
          <lpage>102</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <surname>Van Rijsbergen</surname>
            ,
            <given-names>C. J.</given-names>
          </string-name>
          (
          <year>2004</year>
          ).
          <article-title>The geometry of information retrieval</article-title>
          . Cambridge University Press.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <surname>Von Neumann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <source>Mathematical Foundations of Quantum Mechanics: New Edition</source>
          . Princeton University Press.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          30.
          <string-name>
            <surname>Fakhari</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rajagopal</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Balakrishnan</surname>
            ,
            <given-names>S. N.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Busemeyer</surname>
            ,
            <given-names>J. R.</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>Quantum inspired reinforcement learning in changing environment</article-title>
          .
          <source>New Mathematics and Natural Computation</source>
          ,
          <volume>9</volume>
          (
          <issue>03</issue>
          ),
          <fpage>273</fpage>
          -
          <lpage>294</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          31.
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Agichtein</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          (
          <year>2020</year>
          , April).
          <article-title>RLIRank: Learning to Rank with Reinforcement Learning for Dynamic Search</article-title>
          .
          <source>In Proceedings of The Web Conference 2020</source>
          (pp.
          <fpage>2842</fpage>
          -
          <lpage>2848</lpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          32.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Uprety</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Song</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>Quantum-inspired complex word embedding</article-title>
          . arXiv preprint arXiv:1805.11351.
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          33.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Melucci</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Song</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          (
          <year>2019</year>
          , May).
          <article-title>Semantic Hilbert space for text representation learning</article-title>
          .
          <source>In The World Wide Web Conference</source>
          (pp.
          <fpage>3293</fpage>
          -
          <lpage>3299</lpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          34.
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ma</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gu</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Su</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Song</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          (
          <year>2019</year>
          , July).
          <article-title>A generalized language model in tensor space</article-title>
          .
          <source>In Proceedings of the AAAI Conference on Artificial Intelligence</source>
          (Vol.
          <volume>33</volume>
          , pp.
          <fpage>7450</fpage>
          -
          <lpage>7458</lpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          35.
          <string-name>
            <surname>Cohen</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sharir</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Shashua</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2016</year>
          , June).
          <article-title>On the expressive power of deep learning: A tensor analysis</article-title>
          .
          <source>In Conference on Learning Theory</source>
          (pp.
          <fpage>698</fpage>
          -
          <lpage>728</lpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          36.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lioma</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Simonsen</surname>
            ,
            <given-names>J. G.</given-names>
          </string-name>
          (
          <year>2019</year>
          , September).
          <article-title>Encoding word order in complex embeddings</article-title>
          .
          <source>In International Conference on Learning Representations</source>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>