ExDocS: Evidence based Explainable Document Search
Sayantan Polley*1, Atin Janki*1, Marcus Thiel1, Juliane Hoebel-Mueller1 and Andreas Nuernberger1

1 Otto von Guericke University Magdeburg, Universitätsplatz 2, 39106 Magdeburg, Germany (authors marked with * contributed equally)


Abstract
We present an explainable document search system (ExDocS), based on a re-ranking approach, that uses textual and visual explanations to explain document rankings to non-expert users. ExDocS attempts to answer questions such as “Why is document X ranked at Y for a given query?” and “How do we compare multiple documents to understand their relative rankings?”. The contribution of this work lies in re-ranking methods based on various interpretable facets of evidence such as term statistics, contextual words, and citation-based popularity. The contribution from the user-interface perspective consists of intuitive, accessible explanations such as “document X is at rank Y because of matches found like Z”, along with visual elements designed to compare the evidence and thereby explain the rankings. The quality of our re-ranking approach is evaluated on benchmark data sets in an ad-hoc retrieval setting. Due to the absence of ground truth for explanations, we evaluate the interpretability and completeness of the explanations in a user study. ExDocS is compared with a recent baseline, the explainable search system EXS, which uses the popular post-hoc explanation method LIME. In line with the “no free lunch” theorem, we find statistically significant results showing that ExDocS provides explanations for rankings that are understandable and complete, but that this comes at the cost of a drop in ranking quality.

                                             Keywords
                                             Explainable Rankings, XIR, XAI, Re-ranking



1. Introduction

Explainability in Artificial Intelligence (XAI) is currently a vibrant research topic that attempts to make AI systems transparent and trustworthy to the concerned stakeholders. Research in the XAI domain is interdisciplinary but is primarily led by the development of methods from the machine learning (ML) community. From the classification perspective, e.g., in a diagnostic setting, a doctor may be interested to know how a prediction for a disease is made by an AI-driven solution. XAI methods in ML are typically based on exploiting features associated with a class label, on add-on model-specific methods like LRP [2], on model-agnostic approaches such as LIME [3], or on causality-driven methods [4]. The explainability problem in IR is inherently different from a classification setting. In IR, the user may be interested to know how a certain document is ranked for a given query or why a certain document is ranked higher than others [5]. Often an explanation is an answer to a why question [6].

In this work, Explainable Document Search (ExDocS), we focus on a non-web ad-hoc text retrieval setting and aim to answer the following research questions:

1. Why is a document X ranked at Y for a given query?
2. How do we compare multiple documents to understand their relative rankings?
3. Are the explanations provided interpretable and complete?

There have been works [5, 7] in the recent past that attempted to address related questions such as "Why is a document relevant to the query?" by adapting XAI methods such as LIME [3], primarily for neural rankers. We argue that the idea of relevance has deeper connotations related to the semantic and syntactic notion of similarity in text. Hence, we try to tackle the XAI problem from a ranking perspective. Based on interpretable facets, we provide a simple re-ranking method that is agnostic of the retrieval model. ExDocS provides local textual explanations for each document (Part D in Fig. 1). The re-ranking approach enables us to display the “math behind the rank” for each of the retrieved documents (Part E in Fig. 1). Besides, we also provide a global explanation in the form of a comparative view of multiple retrieved documents (Fig. 4).

We discuss relevant work on explainable rankings in section two. We describe our contribution to the re-ranking approach and the methods used to generate explanations in section three. In section four, we discuss the quantitative evaluation of rankings on benchmark data sets and a comparative qualitative evaluation with an explainable search baseline in a user study. To our knowledge, this is one of the first works comparing two explainable search systems in a user study. In section five, we conclude that ExDocS provides explanations that are interpretable and complete. The results are statistically significant under a Wilcoxon signed-rank test.
Figure 1: The ExDocS search interface. The local textual explanation, marked (D), explains the rank of a document with a simplified mathematical score (E) used for re-ranking. A query-term bar, marked (C), for each document shows the contribution of each query term. Other facets of the local explanation can be seen in Fig. 2 and 3. A running column on the left, marked (B), shows a gradual fading of the color shade with decreasing rank. The global explanation via document comparison, marked here as (A), is shown in Fig. 4. Search results are shown for a sample query, ‘wine market’, on the EUR-Lex [1] dataset.



However, these explanations come at the cost of reduced ranking performance, paving the way for future work. The ExDocS system is online (https://tinyurl.com/ExDocSearch) and the source code is available on request for reproducible research.


2. Related Work

The earliest attempts at making search results explainable can be seen in visualization paradigms [8, 9, 10] that aimed at explaining term distributions and statistics. Mi and Jiang [11] noted that IR systems were among the earliest across research fields to offer interpretations of system decisions and outputs, through search result summaries. The areas of product search [12] and personalized professional search [13] have explored explanations for search results by creating knowledge graphs based on users’ logs. In [14], Melucci made a preliminary study and suggested that structural equation models from the causal perspective can be used to generate explanations for search systems. Related to explainability, the perspective of ethics and fairness [15, 16] is also often encountered in IR, whereby the retrieved data may be related to disadvantaged people or groups. In [17], a categorization of fairness in rankings is devised based on the use of pre-processing, in-processing, or post-processing strategies.

Figure 2: Contribution of query terms for relevance

Figure 3: Coverage of matched terms in a document
Recently there has been a rise in the study of the interpretability of neural rankers [5, 7, 18]. While [5] uses LIME and [7] uses DeepSHAP for generating explanations, the two differ considerably. Neural ranking can be thought of as an ordinal classification problem, which makes it easier to leverage XAI concepts from the ML community to generate explanations. Moreover, [18] generates explanations through visualization, using term statistics and highlighting important passages within the retrieved documents. Apart from this, [19] offers a tool built upon Lucene to explain the internal workings of the vector space model, BM25, and the language model, but it is aimed at assisting researchers and is still far from an end user’s understanding. ExDocS also focuses on explaining the internal operations of the search system, similar to [19]; however, it uses a custom ranking approach.

Singh and Anand’s EXS [5] comes closest to ExDocS in terms of the questions it aims to answer through explanations, such as "Why is a document relevant to the query?" and "Why is a document ranked higher than the other?". EXS uses DRMM (Deep Relevance Matching Model), a pointwise neural ranking model that uses a deep architecture at the query-term level for relevance matching. For generating explanations it employs LIME [3]. We consider the explanations from EXS a fair baseline and compare them with ExDocS in a user study.


3. Concept: Re-ranking via Interpretable facets

The concept behind ExDocS is based on re-ranking with interpretable facets of evidence such as term statistics, contextual words, and citation-based popularity. Each of these facets is also a selectable search criterion in the search interface. Our motivation is to provide a simple, intuitive mathematical explanation of each rank with reproducible results. Hence, we start with a common TF-IDF based vector space model (VSM, as provided out of the box by Apache Solr) with cosine similarity (ClassicSimilarity). The VSM allows us to separate the contributions of the query terms, enabling us to explain the ranks analytically. BM25 was not deemed suitable for explaining the rankings to a user, since it cannot be interpreted completely analytically. On receiving a user query, we expand the query and search the index. The top hundred results are passed to the re-ranker (refer to Algo. 1) to obtain the final results. Term count is taken as the first facet of evidence, since we assumed that it is relatively easy to explain analytically to a non-expert end user as: “document X has P% relative occurrences .. compared to the best matching document” (refer to Part E in Fig. 1). The assumption on term count is also in line with a recent work [18] on explainable rankings.
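To make the term-count facet and the percentage shown in Part E concrete, the following minimal sketch is our own illustration, not the ExDocS source; the toy documents and function names are invented:

```python
from collections import Counter

def term_count_evidence(query_terms, doc_tokens):
    """Sum of raw occurrences of each query term in the document."""
    counts = Counter(doc_tokens)
    return sum(counts[t] for t in query_terms)

def relative_percentage(evidence, best_evidence):
    """Evidence expressed relative to the best matching document (cf. Part E of Fig. 1)."""
    return 100.0 * evidence / best_evidence if best_evidence else 0.0

# Toy example: query 'wine market' against three tokenised documents
query = ["wine", "market"]
docs = {
    "d1": "the wine market in the wine sector".split(),
    "d2": "market regulation of the common market".split(),
    "d3": "olive oil production".split(),
}
evidence = {d: term_count_evidence(query, toks) for d, toks in docs.items()}
best = max(evidence.values())
for d in sorted(evidence, key=evidence.get, reverse=True):
    print(d, evidence[d], f"{relative_percentage(evidence[d], best):.0f}% of best match")
```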
                                                                     case, using Word-Net.
                                                                   • Contextual and Synonym Search: ‘contex-
3. Concept: Re-ranking via                                           tual words’ (term-count of query words + ex-
                                                                     panded contextual words). Contextual words are
     Interpretable facets                                            word-embeddings+synonyms in this case.
The concept behind ExDocS is based on the re-ranking               • Keyword Search with Popularity score:
of interpretable facets of evidence such as term statistics,         ‘citation-based popularity’ (popularity score of a
contextual words, and citation-based popularity. Each                document)
of these facets is also a selectable search criterion in      Based on benchmark ranking performance, we empiri-
the search interface. We have a motivation to provide a       cally determine a weighted combination of these facets
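A minimal sketch of this expansion step, assuming a pre-trained skip-gram model loaded with gensim and NLTK’s WordNet (our choice of libraries for illustration; the paper does not name its implementation):

```python
from gensim.models import KeyedVectors
from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')

def expand_query(query_terms, kv, n_neighbors=3):
    """Expand query terms with nearest-neighbour words (skip-gram embeddings)
    and WordNet synonyms, as used for the Contextual/Synonym Search facets."""
    expanded = set(query_terms)
    for term in query_terms:
        # contextual words from the embedding space
        if term in kv:
            expanded.update(w for w, _ in kv.most_similar(term, topn=n_neighbors))
        # synonyms from the WordNet thesaurus
        for syn in wn.synsets(term):
            expanded.update(l.name().replace("_", " ") for l in syn.lemmas())
    return expanded

# kv = KeyedVectors.load("skipgram_vectors.kv")   # hypothetical path to a pre-trained model
# print(expand_query(["wine", "market"], kv))
```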
Citation analysis is performed by making multiple combinations of weighted in-links, PageRank, and HITS scores for each document. Citation analysis was selected as an interpretable facet that we named “document popularity”. We argue that this can be used to generate understandable explanations such as: “document X is ranked at Y because of the presence of popularity”. Finally, we re-rank using the following facets:

• Keyword Search: ‘term statistics’ (term count)
• Contextual Search: ‘context words’ (term count of query words + contextual words added via word embeddings)
• Synonym Search: ‘contextual words’ (term count of query words + expanded contextual words); contextual words are synonyms in this case, obtained from WordNet
• Contextual and Synonym Search: ‘contextual words’ (term count of query words + expanded contextual words); contextual words are word embeddings + synonyms in this case
• Keyword Search with Popularity score: ‘citation-based popularity’ (popularity score of a document)
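The exact weighting of in-links, PageRank, and HITS is not specified in the paper; the following sketch, using networkx and placeholder weights, merely illustrates how such a “document popularity” score could be assembled:

```python
import networkx as nx

def popularity_scores(citation_edges, w_inlinks=0.3, w_pagerank=0.4, w_hits=0.3):
    """Combine in-link count, PageRank and HITS authority into one
    'document popularity' score. The weights here are placeholders,
    not the empirically tuned values used in ExDocS."""
    g = nx.DiGraph(citation_edges)          # edge (u, v) means u cites v
    inlinks = dict(g.in_degree())
    pagerank = nx.pagerank(g)
    _, authority = nx.hits(g)
    max_in = max(inlinks.values()) or 1
    return {d: w_inlinks * inlinks[d] / max_in
               + w_pagerank * pagerank[d]
               + w_hits * authority[d]
            for d in g.nodes}

# Toy citation graph: d2 is cited most often and gets the highest popularity
edges = [("d1", "d2"), ("d3", "d2"), ("d3", "d1")]
print(popularity_scores(edges))
```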
Based on benchmark ranking performance, we empirically determine a weighted combination of these facets, which is also available as a search criterion in the interface. Additionally, we provide local and global visual explanations: local ones in the form of visualizing the contribution of features (expanded query terms) for each document, and global ones by comparing them across multiple documents (refer to the Evidence Graph in the lower part of Fig. 4).
input : q = {w1, w2, ..., wn}, D = {d1, d2, ..., dm}, facet
output : a re-ranked document list

1  Select the top-k documents from D using cosine similarity, {d'1, d'2, ..., d'k} ∈ Dk
2  for i ← 1 to k do
3      if facet == ‘term statistics’ or facet == ‘contextual words’ then
4          evidence(di) ← Σ_{w∈q} count(w, di)      // count(w, di) is the count of term w in di
5      end
6      if facet == ‘citation-based popularity’ then
7          evidence(di) ← popularityScore(di)       // popularityScore(di) can be the in-link count, PageRank, or HITS score of di
8      end
9  end
10 Rerank all documents in Dk using evidence
11 return Dk

Algorithm 1: Re-ranking algorithm
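For concreteness, a minimal Python rendering of Algorithm 1 follows; it is an illustrative sketch under our own naming, not the ExDocS source code:

```python
def rerank(query_terms, docs, facet, cosine_ranked_ids, popularity_score=None, top_k=100):
    """Illustrative sketch of Algorithm 1 (not the original ExDocS code).
    docs: doc-id -> list of tokens; cosine_ranked_ids: doc-ids already ordered by
    VSM cosine similarity; popularity_score: doc-id -> float (in-links, PageRank or HITS)."""
    candidates = cosine_ranked_ids[:top_k]               # step 1: top-k by cosine similarity
    evidence = {}
    for d in candidates:
        if facet in ("term statistics", "contextual words"):
            # evidence(d) = sum over (expanded) query terms of count(w, d)
            evidence[d] = sum(docs[d].count(w) for w in query_terms)
        elif facet == "citation-based popularity":
            evidence[d] = popularity_score[d]
        else:
            evidence[d] = 0.0
    # re-rank the candidate set by the chosen facet's evidence
    return sorted(candidates, key=evidence.get, reverse=True)
```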
4. Evaluation

We have two specific focus areas in the evaluation. The first is related to the quality of the rankings and the second to the explainability aspect. We leave the evaluation of the popularity score model for future work.

4.1. Evaluation of the re-ranking algorithm

We evaluated the re-ranking algorithm on the TREC Disks 4 & 5 (-CR) dataset. The evaluations were carried out using the trec_eval [20] package. We used the TREC-6 ad-hoc queries (topics 301-350) and used only the ‘Title’ field of the topics as the query. We observed that the Keyword Search, Contextual Search, Synonym Search, and Contextual and Synonym Search systems were unable to beat the ‘Baseline ExDocS’ (out-of-the-box Apache Solr) on metrics such as MAP, R-Precision, and NDCG (refer to Table 1). We benchmark our retrieval performance against [21] and confirm that our ranking approach needs improvement to at least match the baseline performance metrics.
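An evaluation run of this kind can be reproduced with the trec_eval tool roughly as follows; the file names are hypothetical and the measure flags assume a trec_eval 9-style build:

```python
import subprocess

# Hypothetical qrels and run file names; measures match Table 1 (MAP, R-Precision, NDCG).
cmd = ["trec_eval", "-m", "map", "-m", "Rprec", "-m", "ndcg",
       "qrels.trec6.adhoc", "exdocs_keyword_search.run"]
print(subprocess.run(cmd, capture_output=True, text=True).stdout)
```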
4.2. Evaluation of explanations

We performed a user study to qualitatively evaluate the explanations. To compare ExDocS’s explanations with those of EXS, we integrated EXS’s explanation model into our interface. By keeping the look and feel of both systems alike, we tried to reduce users’ bias towards either system.

4.2.1. User study setup

A total of 32 users participated in a lab-controlled user study. 30 users were from a computer science background, and 26 users had fair knowledge of information retrieval systems. Each user was asked to test both systems, and the questionnaire was arranged in a Latin-block design. The names of the systems were masked as System-A (EXS) and System-B (ExDocS).

4.2.2. Metrics for evaluation

We use the existing definitions ([6] and [22]) of interpretability, completeness, and transparency in the community with respect to evaluation in XAI. The following factors are used for evaluating the quality and effectiveness of explanations:

• Interpretability: describing the internals of a system in human-understandable terms [6].
• Completeness: describing the operation of a system accurately and allowing the system’s behavior to be anticipated in the future [6].
• Transparency: an IR system should be able to demonstrate to its users and other interested parties why and how the proposed outcomes were achieved [22].

4.3. Results and Discussion

We discuss the results of our experiments and draw conclusions to answer the research questions.

RQ1. Why is a document X ranked at Y for a given query?
We answer this question by providing an individual textual explanation for every document (refer to Part D of Fig. 1) on the ExDocS interface. The “math behind the rank” (refer to Part E of Fig. 1) of a document is explained as a percentage of its evidence with respect to the best matching document.
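For illustration, the displayed percentage is simply the document’s evidence relative to the best match; the numbers below are invented, not taken from the paper:

```latex
\text{score}(X) \;=\; \frac{\text{evidence}(X)}{\text{evidence}(d_{\text{best}})} \times 100\%,
\qquad \text{e.g.}\quad \frac{12}{20} \times 100\% = 60\%.
```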
Figure 4: Global Explanation by comparison of evidence for multiple documents (increasing ranks from left to right). A title-
body image is provided, marked (A), to indicate whether the query term was found in title and/or body. The column marked
(B), represents the attributes for comparison.


Table 1
MAP, R-Precision, and NDCG values for ExDocS search systems against TREC-6 benchmark values*[21]

                          IR Systems                           MAP      R-Precision     NDCG
                          csiro97a3*                           0.126       0.1481          NA
                          DCU97vs*                             0.194       0.2282          NA
                          mds603*                              0.157       0.1877          NA
                          glair61*                             0.177       0.2094          NA
                          Baseline ExDocS                      0.186       0.2106         0.554
                          Keyword Search                       0.107       0.1081         0.462
                          Contextual Search                    0.080       0.0955         0.457
                          Synonym Search                       0.078       0.0791         0.411
                          Contextual and Synonym Search        0.046       0.0526         0.405



RQ2. How do we compare multiple documents to understand their relative rankings?
We provide an option to compare multiple documents through visual and textual paradigms (refer to Fig. 4). The evidence can be compared and contrasted, thereby helping the user understand why a document is ranked higher or lower than others.

RQ3. Are the generated explanations interpretable and complete?
We evaluate the quality of the explanations in terms of their interpretability and completeness. Empirical evidence from the user study on interpretability:

1. 96.88% of the users understood the textual explanations of ExDocS.
2. 71.88% of the users understood the relation between the query terms and the features (synonyms or contextual words) shown in the explanation.
3. Users gave ExDocS a mean rating of 4 out of 5 (standard deviation = 1.11) on the understandability of the percentage calculation for rankings, shown as part of the explanations.
When users were explicitly asked whether they could “gather an understanding of how the system functions based on the given explanations”, they gave a positive response with a mean rating of 3.84 out of 5 (standard deviation = 0.72). The above-mentioned empirical evidence indicates that the ranking explanations provided by ExDocS can be deemed interpretable.

Empirical evidence from the user study on completeness:

1. All users found the features shown in the explanation of ExDocS to be reasonable (i.e., sensible or fairly good).
2. 90.63% of the users understood from the comparative explanations of ExDocS why a particular document was ranked higher or lower than other documents.

Moreover, 78.13% of all users claimed that they could anticipate ExDocS’s behavior in the future based on the understanding gathered through the explanations (individual and comparative). Based on the above empirical evidence, we argue that the ranking explanations generated by ExDocS can be assumed to be complete.

Transparency: We investigate whether the explanations make ExDocS more transparent [22] to the user. Users gave ExDocS a mean rating of 3.97 out of 5 (standard deviation = 0.86) on ‘Transparency’ based on the individual (local) explanations. In addition, 90.63% of all users indicated that ExDocS became more transparent after reading the comparative (global) explanations. This indicates that the explanations make ExDocS more transparent to the user.

Figure 5: Comparison of explanations from EXS and ExDocS on different XAI metrics. All the values shown here are scaled between [0-1] for simplicity.

Comparison of explanations between ExDocS and EXS: Both systems performed similarly in terms of Transparency and Completeness. However, users found the ExDocS explanations to be more interpretable than those of EXS (refer to Fig. 5), and this comparison was statistically significant in a Wilcoxon signed-rank (WSR) test (|W| = 5.5 < W_critical(α = 0.05, N_r = 10) = 10).
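As an illustration of the test used here, with scipy and hypothetical paired interpretability ratings (not the actual study data):

```python
from scipy.stats import wilcoxon

# Hypothetical paired interpretability ratings (1-5) from the same users for both systems.
exdocs_ratings = [5, 4, 4, 5, 3, 4, 5, 4, 4, 5]
exs_ratings    = [3, 4, 3, 4, 3, 3, 4, 3, 4, 4]

w_statistic, p_value = wilcoxon(exdocs_ratings, exs_ratings)
print(w_statistic, p_value)  # reject H0 at alpha = 0.05 if W < W_critical (equivalently, p < 0.05)
```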
5. Conclusion and Future Work

In this work, we present an Explainable Document Search (ExDocS) system that attempts to explain document rankings to a non-expert user using a combination of textual and visual elements. We make use of word embeddings and the WordNet thesaurus to expand the user query. We use various interpretable facets such as term statistics, contextual words, and citation-based popularity. Re-ranking results from a simple vector space model with such interpretable facets helps us to explain the “math behind the rank” to an end user. We evaluate the explanations by comparing ExDocS with another explainable search baseline in a user study. We find statistically significant results showing that ExDocS provides interpretable and complete explanations, although it was difficult to find a clear winner between the two systems in all aspects. In line with the “no free lunch” theorem, the results show a drop in ranking quality on benchmark data sets as the price of obtaining comprehensible explanations. This paves the way for ongoing research on including user feedback to adapt the rankings and explanations. ExDocS is currently being evaluated in domain-specific search settings such as law search, where explainability is a key factor in gaining user trust.


References

[1] E. L. Mencia, J. Fürnkranz, Efficient multilabel classification algorithms for large-scale problems in the legal domain, in: Semantic Processing of Legal Texts, Springer, 2010, pp. 192–215.
[2] S. Bach, A. Binder, G. Montavon, F. Klauschen, K.-R. Müller, W. Samek, On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation, PLoS ONE 10 (2015) e0130140.
[3] M. T. Ribeiro, S. Singh, C. Guestrin, "Why Should I Trust You?": Explaining the Predictions of Any Classifier, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16, Association for Computing Machinery, New York, NY, USA, 2016, pp. 1135–1144.
[4] J. Pearl, et al., Causal inference in statistics: An overview, Statistics Surveys 3 (2009) 96–146.
[5] J. Singh, A. Anand, EXS: Explainable Search Using Local Model Agnostic Interpretability, in: Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, WSDM ’19, Association for Computing Machinery, New York, NY, USA, 2019, pp. 770–773.
[6] L. H. Gilpin, D. Bau, B. Z. Yuan, A. Bajwa, M. Specter, L. Kagal, Explaining explanations: An overview of interpretability of machine learning, in: 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA), IEEE, 2018, pp. 80–89.
[7] Z. T. Fernando, J. Singh, A. Anand, A Study on the Interpretability of Neural Retrieval Models Using DeepSHAP, in: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’19, Association for Computing Machinery, New York, NY, USA, 2019, pp. 1005–1008.
[8] M. A. Hearst, TileBars: Visualization of Term Distribution Information in Full Text Information Access, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’95, ACM Press/Addison-Wesley Publishing Co., USA, 1995, pp. 59–66.
[9] O. Hoeber, M. Brooks, D. Schroeder, X. D. Yang, TheHotMap.Com: Enabling Flexible Interaction in Next-Generation Web Search Interfaces, in: Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Volume 01, WI-IAT ’08, IEEE Computer Society, USA, 2008, pp. 730–734.
[10] M. A. Soliman, I. F. Ilyas, K. C.-C. Chang, URank: Formulation and Efficient Evaluation of Top-k Queries in Uncertain Databases, in: Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, SIGMOD ’07, Association for Computing Machinery, New York, NY, USA, 2007, pp. 1082–1084.
[11] S. Mi, J. Jiang, Understanding the Interpretability of Search Result Summaries, in: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’19, Association for Computing Machinery, New York, NY, USA, 2019, pp. 989–992.
[12] Q. Ai, Y. Zhang, K. Bi, W. B. Croft, Explainable Product Search with a Dynamic Relation Embedding Model, ACM Trans. Inf. Syst. 38 (2019).
[13] S. Verberne, Explainable IR for personalizing professional search, in: ProfS/KG4IR/Data:Search@SIGIR, 2018.
[14] M. Melucci, Can Structural Equation Models Interpret Search Systems?, in: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’19, Association for Computing Machinery, New York, NY, USA, 2019. URL: https://ears2019.github.io/Melucci-EARS2019.pdf.
[15] A. J. Biega, K. P. Gummadi, G. Weikum, Equity of attention: Amortizing individual fairness in rankings, in: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, 2018, pp. 405–414.
[16] S. C. Geyik, S. Ambler, K. Kenthapadi, Fairness-Aware Ranking in Search and Recommendation Systems with Application to LinkedIn Talent Search, in: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’19, Association for Computing Machinery, New York, NY, USA, 2019, pp. 2221–2231. URL: https://doi.org/10.1145/3292500.3330691. doi:10.1145/3292500.3330691.
[17] C. Castillo, Fairness and Transparency in Ranking, SIGIR Forum 52 (2019) 64–71.
[18] V. Chios, Helping results assessment by adding explainable elements to the deep relevance matching model, in: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Association for Computing Machinery, New York, NY, USA, 2020. URL: https://ears2020.github.io/accept_papers/2.pdf.
[19] D. Roy, S. Saha, M. Mitra, B. Sen, D. Ganguly, I-REX: A Lucene Plugin for EXplainable IR, in: Proceedings of the 28th ACM International Conference on Information and Knowledge Management, CIKM ’19, Association for Computing Machinery, New York, NY, USA, 2019, pp. 2949–2952.
[20] C. Buckley, et al., The trec_eval evaluation package, 2004.
[21] D. K. Harman, E. Voorhees, The Sixth Text REtrieval Conference (TREC-6), US Department of Commerce, Technology Administration, National Institute of Standards and Technology (NIST), 1998.
[22] A. Olteanu, J. Garcia-Gathright, M. de Rijke, M. D. Ekstrand, Workshop on Fairness, Accountability, Confidentiality, Transparency, and Safety in Information Retrieval (FACTS-IR), in: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2019, pp. 1423–1425.