<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>TOIS</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Selective Query Processing: A Risk-Sensitive Approach - Abstract</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Josiane Mothe</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Md Zia Ullah</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>INSPE</institution>
          ,
          <addr-line>UT2J, Univ. de Toulouse, IRIT UMR5505 CNRS, Toulouse</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Univ. de Toulouse, IRIT UMR5505 CNRS</institution>
          ,
          <addr-line>Toulouse</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>37</volume>
      <issue>2018</issue>
      <fpage>04</fpage>
      <lpage>07</lpage>
      <abstract>
        <p>While search engines apply a single optimised search strategy to any user query, selective search, like selective query expansion, aims to apply an adapted search strategy to each query. Search phases include query expansion, search-weighting model, and document ranking. A search strategy is defined by the combination of components and their hyperparameters in these phases. The number of possible search strategies is huge. In this paper, we describe a risk-sensitive approach to optimise the set of search strategies that should be included in a selective search approach. It solves the problem of which and how many search strategies to include in the system. We found that using 20 search strategies is an appropriate trade-off between effectiveness and system complexity. The significant effectiveness improvement is about 23% when compared to the L2R documents baseline and about 10% when compared to other selective approaches. This paper is an extended abstract of our paper at CIKM 2021 1.</p>
      </abstract>
      <kwd-group>
        <kwd>Information Systems</kwd>
        <kwd>Information Retrieval</kwd>
        <kwd>Models and ranking</kwd>
        <kwd>System effectiveness</kwd>
        <kwd>Adaptive information retrieval</kwd>
        <kwd>Query processing</kwd>
        <kwd>Per-query processing</kwd>
        <kwd>Query driven parameterisation</kwd>
        <kwd>Search engine parameters</kwd>
        <kwd>System selection</kwd>
        <kwd>Risk sensitive systems</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>• Search strategy pool: The system has a pool of search strategies it can choose from;
• Selective search strategy: The system selects the search strategy from the pool to be applied
to the current query using the "best fit" principle (a minimal sketch follows this list).</p>
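      <p>The sketch below pairs a pool of strategies with a per-query selector; the strategy names and the toy selection rule are hypothetical, not the system described in the paper.</p>
      <preformat>
# Minimal sketch of a selective search meta-system (hypothetical names):
# a pool of search strategies and a per-query selector that applies the
# "best fit" strategy to each incoming query.

def bm25_strategy(query):
    return f"BM25 ranking for: {query}"            # placeholder for a real run

def bm25_rm3_strategy(query):
    return f"BM25+RM3 ranking for: {query}"        # placeholder for a real run

STRATEGY_POOL = {
    "bm25": bm25_strategy,
    "bm25_rm3": bm25_rm3_strategy,
}

def select_strategy(query):
    """Toy selector: expand long queries only (stand-in for a learned model)."""
    return "bm25_rm3" if len(query.split()) > 3 else "bm25"

def search(query):
    return STRATEGY_POOL[select_strategy(query)](query)

print(search("global warming"))
print(search("effects of global warming on coral reefs"))
      </preformat>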
      <p>In related work, either the search strategy pool is limited from 2 to 8 search strategies, or it
contains from 20,000 to 80,000 search strategies. In the latter case, to be applicable in real world
systems, the pool needs to be restricted.</p>
      <p>The main purpose of this paper is to limit the pool of search strategies to the most
appropriate ones. The risk-sensitive approach we developed in this paper aims to define the
search strategies, and their number, to be considered in a selective search approach. This risk-sensitive
approach is grounded on Wang et al.'s risk-reward trade-off measure for learning to rank documents 2.
Its purpose is to decide on the document ranking for a given query. The risk here is defined
as "the risk [for the system] of performing a given particular query less effectively than a
given baseline system" 3.</p>
      <p>We have adapted this measure and defined Risk(s) as a function for selecting candidate search
strategies. This function measures the risk associated with selecting the search strategy s for a
given query rather than a reference search strategy s_ref. The risk relates to the effectiveness of
s_ref being greater than that of s.</p>
      <p>The risk function Risk(s) accumulates the risk relative to queries in terms of effectiveness
for the training query set. Risk(s) is defined in Eq. (1), where Q is the training query
set consisting of |Q| queries, M(q, s_ref) is the performance (effectiveness) of the reference search
strategy s_ref for the query q, and s is a search strategy from the initial pool. Risk(s) hence
accumulates the loss in effectiveness in relation to queries when the search strategy s is selected
whereas the reference search strategy s_ref would have been the better choice: it corresponds to
the maximum possible risk.</p>
      <p>Risk(s) = (1/|Q|) Σ_{q ∈ Q} max(0, M(q, s_ref) − M(q, s))   (1)</p>
      <p>In addition to the risk function, we have defined the corresponding reward function. It is
based on the potential increase in overall effectiveness (Eq. 2) using the search strategy s. The
reward function is defined as follows:</p>
      <p>Reward(s) = (1/|Q|) Σ_{q ∈ Q} max(0, M(q, s) − M(q, s_ref))   (2)</p>
      <p>The reward function aggregates the improvement in effectiveness that would be obtained if the
system only used s for the queries on which s performs better than the reference search strategy
s_ref.</p>
      <p>We adapted the formulas of Eqs. (1) and (2) to fit the problem of selecting a set of search strategies
to keep.</p>
      <p>2 L. Wang, P. N. Bennett, K. Collins-Thompson, Robust ranking models via risk-sensitive optimization, in: SIGIR,
2012, pp. 761–770.</p>
      <p>3 B. T. Dinçer, C. Macdonald, I. Ounis, Risk-sensitive evaluation and learning to rank using multiple baselines, in:
SIGIR, 2016, pp. 483–492.</p>
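      <p>A minimal Python sketch of Eqs. (1) and (2) follows; it assumes the per-query effectiveness scores M(q, s) are available in a nested dictionary, and the variable names and toy values are illustrative only.</p>
      <preformat>
# Risk (Eq. 1) and reward (Eq. 2) of a candidate strategy against a
# reference strategy, assuming effectiveness[q][s] holds M(q, s)
# (e.g. nDCG@10) for each training query q. Toy values for illustration.

effectiveness = {
    "q1": {"ref": 0.40, "s1": 0.55, "s2": 0.30},
    "q2": {"ref": 0.60, "s1": 0.35, "s2": 0.70},
}

def risk(strategy, reference, effectiveness):
    """Eq. (1): average loss of the strategy on queries where the reference wins."""
    return sum(max(0.0, scores[reference] - scores[strategy])
               for scores in effectiveness.values()) / len(effectiveness)

def reward(strategy, reference, effectiveness):
    """Eq. (2): average gain of the strategy on queries where it beats the reference."""
    return sum(max(0.0, scores[strategy] - scores[reference])
               for scores in effectiveness.values()) / len(effectiveness)

print(risk("s1", "ref", effectiveness))    # (0 + 0.25) / 2 = 0.125
print(reward("s1", "ref", effectiveness))  # (0.15 + 0) / 2 = 0.075
      </preformat>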
      <p>Let ℛ be the entire initial search strategy pool and S_{k−1} be the set of search strategies that
have already been selected at step k, with an initial S_0 = {s_ref}, which is the point of reference,
or first selected, search strategy. Q is the set of training queries and M(q, s) denotes the
retrieval effectiveness (e.g., nDCG@10) for the query q ∈ Q processed by the search strategy s.</p>
      <p>Given ℛ and S_{k−1}, we define the risk of selecting the new search strategy s ∈ ℛ ∖ S_{k−1} to
be added to S_{k−1} at step k using Eq. (1) in terms of effectiveness as follows:</p>
      <p>Risk(s, S_{k−1}) = (1/|Q|) Σ_{q ∈ Q} max(0, max_{s′ ∈ S_{k−1}} M(q, s′) − M(q, s))   (3)</p>
      <p>where max_{s′ ∈ S_{k−1}} M(q, s′) is the maximum effectiveness for the query q in relation to the set of
search strategies that have already been selected in S_{k−1}. In Eq. (3), the risk of adding the search
strategy s is measured as the cumulative decrease in effectiveness which the meta-system can
incur if it chooses s rather than the best search strategy in S_{k−1} for each of the training
queries: it therefore adapts Eq. (1).</p>
      <p>Likewise, we define the reward function using Eq. (2) in terms of effectiveness as follows:</p>
      <p>Reward(s, S_{k−1}) = (1/|Q|) Σ_{q ∈ Q} max(0, M(q, s) − max_{s′ ∈ S_{k−1}} M(q, s′))   (4)</p>
      <p>The overall gain for the search strategy s ∈ ℛ ∖ S_{k−1} in relation to the set of training queries
and already selected search strategies S_{k−1} is defined as:</p>
      <p>Gain(s, S_{k−1}) = Reward(s, S_{k−1}) − (1 + α) · Risk(s, S_{k−1})   (5)</p>
      <p>where the functions Risk(s, S_{k−1}) and Reward(s, S_{k−1}) refer to the effectiveness-based
Eqs. (3) and (4), respectively. The parameter α ≥ 0 is a risk-sensitivity parameter that controls the trade-off
between risk and reward. In our case, we set α to 0 to weight risk and reward equally. We keep
a statistical analysis of this risk-sensitivity parameter for future work.</p>
      <p>Finally, at step k we select the search strategy s* which maximises the overall gain according
to the following equation:</p>
      <p>s* = argmax_{s ∈ ℛ ∖ S_{k−1}} Gain(s, S_{k−1})   (6)</p>
      <p>We then update S_{k−1} as follows:</p>
      <p>S_k = S_{k−1} ∪ {s*}   (7)</p>
      <p>where S_k is the set of k risk-sensitive search strategies selected for the set of training queries Q.</p>
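      <p>The selection loop of Eqs. (3)–(7) can be sketched as the following greedy procedure; the function and variable names are illustrative, the code reuses the effectiveness[q][s] table from the earlier sketch, and α defaults to 0 as in the paper.</p>
      <preformat>
# Greedy selection of k search strategies (Eqs. 3-7), assuming
# effectiveness[q][s] = M(q, s) for every query q and every strategy s
# in the initial pool. Names are illustrative, alpha = 0 by default.

def risk_step(s, selected, effectiveness):
    """Eq. (3): average loss of s vs. the best already-selected strategy."""
    return sum(max(0.0, max(scores[p] for p in selected) - scores[s])
               for scores in effectiveness.values()) / len(effectiveness)

def reward_step(s, selected, effectiveness):
    """Eq. (4): average gain of s over the best already-selected strategy."""
    return sum(max(0.0, scores[s] - max(scores[p] for p in selected))
               for scores in effectiveness.values()) / len(effectiveness)

def gain(s, selected, effectiveness, alpha=0.0):
    """Eq. (5): reward minus (1 + alpha) times risk."""
    return (reward_step(s, selected, effectiveness)
            - (1.0 + alpha) * risk_step(s, selected, effectiveness))

def select_strategies(pool, reference, effectiveness, k, alpha=0.0):
    """Eqs. (6)-(7): start from the reference, greedily add the best strategy."""
    selected = [reference]
    for _ in range(k):
        candidates = [s for s in pool if s not in selected]
        if not candidates:
            break
        best = max(candidates, key=lambda s: gain(s, selected, effectiveness, alpha))
        selected.append(best)
    return selected

# Example with a toy pool and the effectiveness table sketched earlier:
# select_strategies(["ref", "s1", "s2"], "ref", effectiveness, k=2)
      </preformat>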
      <p>The risk-sensitive selection criterion we propose is generic enough to be applied to any selective
search strategy approach.</p>
      <p>For the selective search strategy part, we use learning to rank (L2R) algorithms to rank the
search strategies, as suggested in Deveaud et al. 4. The principle is thus to train a ranking model
h(q, s) = f(x_{q,s}) to assign a score to a given query-search strategy pair (q, s), i.e., a given
feature vector x_{q,s}. More generally, the ranking model can rank all the search strategies for
a given query q. In this case, the ranking model is h(q, S) = f(X_q), where S is the set of
search strategies and X_q = (x_{q,1}, x_{q,2}, · · · , x_{q,|S|}) is the set of feature vectors for the query-search
strategy pairs. Like the L2R documents model, the ranking model h(q, S) is learned from the
training data by minimising the loss function ℒ(h; X_q, y_q), where y_q are the effectiveness labels.</p>
      <p>4 R. Deveaud, J. Mothe, M. Z. Ullah, J.-Y. Nie, Learning to adaptively rank document retrieval system configurations,
ACM Transactions on Information Systems 37 (1), 2018.</p>
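      <p>As an illustration of this step, the sketch below trains a simple pointwise ranker with scikit-learn to score query-search strategy feature vectors and rank the strategies for one query; the library choice, array shapes, and random data are assumptions, not the paper's actual L2R setup.</p>
      <preformat>
# Pointwise stand-in for the L2R step: learn f(x_{q,s}) -> predicted
# effectiveness, then rank a query's candidate strategies by that score.
# Random features and labels are placeholders for the real training data.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 46))   # one LETOR-style vector per (query, strategy) pair
y_train = rng.uniform(size=500)        # label: effectiveness (e.g. nDCG@10) of the pair

model = GradientBoostingRegressor().fit(X_train, y_train)

def rank_strategies(model, query_strategy_features):
    """Rank a query's candidate strategies by predicted effectiveness."""
    scores = model.predict(query_strategy_features)    # one score per strategy
    return np.argsort(scores)[::-1]                    # indices, best strategy first

X_query = rng.normal(size=(20, 46))    # feature vectors of 20 candidate strategies
print(rank_strategies(model, X_query)[:3])
      </preformat>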
      <p>To evaluate our contributions, we considered three standard TREC collections from the Adhoc
tasks: TREC78 (about 500K newspaper articles), WT10G (1.6 million Web/blog page documents),
and GOV2 (25 million web pages).</p>
      <p>Search strategies were built by varying the IR components and some of their hyperparameters.
We considered the term weighting model and automatic query expansion components, for
which we considered different variants from the literature. We also considered different values
of the hyperparameter related to query expansion. This results in more than 20,000 search
strategies, which were the input for the E-risk selection model used to reduce this set based on
training queries.</p>
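      <p>To give a feel for how such a pool grows combinatorially, the sketch below crosses a few components and hyperparameter values; the lists of weighting models, expansion models, and values are illustrative, not the exact grid used in the paper.</p>
      <preformat>
# Sketch of generating a pool of search strategies by crossing IR
# components and hyperparameters. The component lists are illustrative.
from itertools import product

weighting_models = ["BM25", "PL2", "DirichletLM", "LGD", "TF_IDF"]
expansion_models = ["Bo1", "Bo2", "KL", "RM3"]
expansion_terms = [5, 10, 20, 50]      # number of expansion terms
expansion_docs = [3, 5, 10, 20]        # number of feedback documents

# Strategies without query expansion, plus every expansion combination.
pool = [{"weighting": wm, "expansion": None} for wm in weighting_models]
pool += [{"weighting": wm, "expansion": qe, "terms": t, "docs": d}
         for wm, qe, t, d in product(weighting_models, expansion_models,
                                     expansion_terms, expansion_docs)]

print(len(pool))  # 5 + 5*4*4*4 = 325; richer grids quickly reach tens of thousands
      </preformat>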
      <p>Query-search strategy training examples follow a vector-based representation: the features
x_{q,s} depend on the query (q), the search strategy (s), and a label. We opted for LETOR features
that have been used successfully for document ranking models. We calculated the LETOR features
directly from an initial search that we performed using a reference system (BM25). Finally, the
query-search strategy vectors were labelled by the effectiveness of the search strategy when
treating that query.</p>
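      <p>A sketch of how such training examples might be assembled is given below; letor_features() and ndcg_at_10() are placeholders for the paper's LETOR feature extraction over a reference BM25 run and for the effectiveness labels, not real APIs.</p>
      <preformat>
# Sketch of building (query, strategy) training examples: query features
# from a reference BM25 run, combined with a strategy identifier and
# labelled by that strategy's effectiveness on the query. Placeholders only.

def letor_features(query):
    """Placeholder: LETOR-style features computed from an initial BM25 run."""
    return [float(len(query.split())), 0.0, 0.0]

def ndcg_at_10(query, strategy):
    """Placeholder: effectiveness of the strategy on this training query."""
    return 0.5

def build_examples(queries, pool):
    examples = []
    for q in queries:
        q_feats = letor_features(q)
        for i, strategy in enumerate(pool):
            x = q_feats + [float(i)]        # query features + strategy identifier
            y = ndcg_at_10(q, strategy)     # label: effectiveness of the pair
            examples.append((x, y))
    return examples

examples = build_examples(["global warming", "airbus subsidies"], ["BM25", "BM25+RM3"])
print(len(examples))  # one training vector per query-strategy pair
      </preformat>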
      <p>We varied the number of search strategies to select and found that 20 is appropriate for real-world
environments (see Figure 1).</p>
      <p>Table 1 shows additional results on the GOV2 TREC adhoc collection. The first block shows
baselines that use a single search strategy for all queries; the second block shows trained
selective search strategy (SelSS) approaches, including ours (with 20 search strategies); the last block shows
oracles. The first oracle is the Best conf. row, where the best configuration of the pool is selected
a posteriori. The second line in this block is when, for each query, we select the best configuration
for that specific query; this is also an a posteriori selection. Effectiveness is reported in absolute value,
averaged over 3 draws, with the standard deviation in square brackets. The best values (excluding
the oracles) are in bold font. △ (resp. ↑) indicates a statistically significant improvement compared to
the L2R documents baseline (resp. Deveaud et al.), two-tailed paired t-test (p &lt; 0.05). Results on the
two other TREC collections were consistent.</p>
      <p>From these results, we can see that our method is better than any single query processing
strategy (i.e. system configuration), even if we could select it automatically (which is not the
case in real life). Our strategy is also better than selective query expansion, where the query
processing differs from one query to another but with just two possible choices. It is also
slightly but statistically significantly better than using the 20,000 configurations in the initial pool. This
is the most interesting and original result since it shows that we need a certain number of
query processing strategies to be effective, but not necessarily a huge number, which makes the
approach more realistic and feasible in real IR systems.</p>
      <p>The method we present here uses 20 search strategies that the system learned to choose
according to query features. This risk-sensitive selective search is effective in increasing overall
effectiveness. The significant effectiveness improvement is about 23% when compared to the L2R
documents baseline and about 10% when compared to other selective approaches on 3 adhoc TREC
collections. Selective query expansion used only two search strategies, which limits the system's
options. On the other hand, in another study, we used 20,000 search strategies, which limits its
practical usability. We show that the E-risk approach we presented is more appropriate, both
to provide enough options for the system to choose among and to keep the approach manageable in terms
of maintenance. This paper is an extended abstract of Mothe and Ullah's CIKM 2021 paper 1.
Moreover, this research is patented 5.</p>
      <p>5 EP3771996A1: Information retrieval device and method using set of search configurations pre-selected
using efficiency and risk functions. https://worldwide.espacenet.com/patent/search/family/067956648/publication/
EP3771996A1?q=19305984.7 Josiane Mothe &amp; Md Zia Ullah</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>