Fair Exposure of Documents in Information Retrieval: a Community Detection Approach

Adrian-Gabriel Chifu, Aix Marseille Univ, Université de Toulon, CNRS, LIS, Marseille, France, adrian.chifu@univ-amu.fr
Josiane Mothe, INSPE, Univ. de Toulouse, IRIT UMR5505 CNRS, Toulouse, France, Josiane.Mothe@irit.fr
Md Zia Ullah, IRIT UMR5505 CNRS, Toulouse, France, mdzia.ullah@irit.fr

ABSTRACT
While (mainly) designed to answer users' needs, search engines and recommendation systems do not necessarily guarantee the exposure of the data they store and index, although it can be essential for information providers. A recent research direction, the so-called "fair" exposure of documents, tackles this problem in information retrieval. It has mainly been cast into a re-ranking problem with constraints and optimization functions. This paper presents the first steps toward a new framework for fair document exposure. This framework is based on document linking and document community detection; communities are used to rank the documents to be retrieved according to an information need. In addition to the first step of this new framework, we present its potential through both a toy example and a few illustrative examples from the 2019 TREC Fair Ranking Track data set.

KEYWORDS
Information systems, Information retrieval, Fair document exposure, Document network, Document communities, Document re-ranking

"Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0)."

1 INTRODUCTION
While (mainly) designed to answer users' needs, search engines and recommendation systems (RS) do not necessarily guarantee the exposure of the data they store and index. For example, many algorithms are founded on both content and collaborative selection of items. They are likely to retrieve or recommend the items that other users have liked or seen on similar need topics. In the recommendation domain, this problem is known as the cold start problem, where new items are unlikely to be recommended since no user clicked on them when they were just added [24, 26, 27]. In search engines, ranking algorithms also include information on past searches and users' preferences [15, 34], which favors the most popular documents to the detriment of others, new or less known, although potentially equally relevant. To solve this problem, algorithms both in Information Retrieval (IR) and RS have been developed that aim at diversifying the results. In IR, diversity is mainly considered in relation to ambiguity [9], where the problem is to provide the user documents that answer the various meanings of a word (e.g., Orange as a telecom company, a French city, a color, or a fruit). In RS, many algorithms also diversify the results by adding items that would not be recommended otherwise [22, 26].

Another recent research direction is the so-called "fair" exposure of documents. It has mainly been cast into a re-ranking problem with constraints and optimization functions [2, 29, 38]. In 2019, TREC also decided to tackle this problem in one of its tracks: the Fair Ranking Track (https://fair-trec.github.io/). The rationale is that not only users are concerned by which information is retrieved, but also information producers, who are certainly inclined to store their data on platforms that give them the best exposure to users. As defined by the TREC organizers, the Fair Ranking Track considers scientific documents, and the central goal is to provide fair exposure to different groups of authors. Fair exposure is certainly a much more complex problem than currently defined since, when platforms disclose the algorithms they use, producers try to fit them to be better exposed. However, this track as it is corresponds to a first interesting step and is likely to result in important milestones related to IR transparency.

This paper presents the first steps toward a new framework for fair document exposure in information retrieval. This framework is based on information clustering and document linking that is then used to select the documents to be retrieved according to an information need. The rest of the paper is organized as follows: Section 2 includes some related work. Section 3 presents the first step of our method along with a toy example. Section 4 describes the preliminary results on some examples from the TREC Fair Ranking Track. Section 5 concludes this paper and presents our future work.

2 RELATED WORK
Fair algorithms. Designing fair algorithms has recently emerged as a way to tackle the underlying bias in the training data or in the algorithm itself. Chierichetti et al. [7] study fair clustering algorithms (a protected class must have fair representation in the different clusters) under disparate impact [14]. The authors introduce fairlets that satisfy fair representation while maintaining the clustering objective. They then reduce the fair clustering problem to fairlet decomposition and use a classical clustering algorithm to obtain a balanced representation of the protected class in the different clusters. In a further study, Chierichetti et al. [8] focus on the large class of fair algorithmic problems for optimization subject to matroid constraints such as cardinality (sets), connectivity (graphs), or matching (subgraphs). They found that matroid constraints are general enough to encode many different types of problems while being relatively easy to optimize.

Document diversity. While not initially formulated as a fair exposure of data, diversity is nonetheless a way to provide document exposure.
Diversification is the process of re-ranking the documents initially retrieved for the query or selected for recommendation, taking into account the coverage, importance, and novelty of the documents in an implicit or explicit approach to reduce the search bias [37, 40]. Result diversification is generally formulated as an optimization problem, and diversification methods differ by how they implement the objective function.

Some methods consider the document content only, such as Maximal Marginal Relevance (MMR) [5], which selects the best document incrementally by balancing the relevance to the query and the topic novelty. Fusion diversification is another example of content-based methods, where topic modeling is used to infer latent subtopics; the results of sub-topic rankers are then fused to obtain the final ranking [23].

Other methods use external cues which represent users' intention, like Xia et al., who proposed a diverse ranking approach based on a continuous state Markov decision process to model the user perceived utility at different ranks [41]. Also considering the users' perspective, Santos et al. proposed xQuAD, a greedy diversification algorithm that maximizes the coverage of explicitly mined query subtopics [36]. Wasilewski et al. proposed an intent-aware collaborative filtering based recommendation system where the system diversifies the recommended items by integrating the relevance, user intentions, and item-aspect associations [40]. Abdollahpouri et al. introduced a method to diversify the recommended items by reducing the popularity bias and enhancing the novelty and coverage [1]. Küçüktunç et al. [21] proposed an enhanced random-walk based diversification for citation recommendation using the citation graph to allow users to reach the old, fundamental and recent papers.

Fair document exposure. None of the above mentioned methods provides fair item exposure in its re-ranking step. Recently, different frameworks have been proposed for the formulation of fairness constraints on rankings in terms of exposure allocation [2, 38]. In [38], the authors aim to find rankings that would maximize the fairness with respect to various facets, by proposing a general framework that employs probabilistic rankings and linear programming. In IR, the concept of fair exposure is seen as the average attention received by ranked items [13], as an optimization problem with fairness constraints [17], or as a diversity ranking problem [18]. Diaz et al. [13] argued that measuring and optimizing expected exposure metrics using randomization opens a new area for retrieval algorithms, while Gao and Shah [17] estimated the solution space and answered questions about the impact of fairness considerations in the system, or about the trade-off between various optimization strategies. Gao and Shah [18] also studied the top-k diversity fairness ranking in terms of statistical parity fairness (equal number of documents from different groups) and disparate impact fairness (unintentional discrimination against a group).

Some approaches consider fair exposure as a re-ranking problem [25, 30]. Liu and Burke [25] generated the ranking list based on a linear combination trade-off between accuracy and provider coverage, and argued that the algorithm can significantly promote fairness, however with a slight loss in terms of accuracy. Modani et al. [30] introduced the concept of representative diversity, which means the level of interest of the consumer in different categories. They noted that higher diversity and fairness would lead to increased user acceptance regarding the results.

From the evaluation point of view, the authors of [12] proposed an interpolation strategy to integrate relevance and fairness into a unique evaluation metric. They investigated how existing TREC corpora could be adapted for a fairness task. Castillo [6] reviewed the recent works on fairness and transparency in ranking.

In RS, Khenissi and Nasraoui modeled the user exposure in an iterative closed feedback loop and developed a de-biasing strategy to reduce the bias inherent in the recommendation system [20]. Mehrotra et al. proposed a framework to evaluate the trade-off between the relevance of the recommendation to the customers and the fairness of representation of the suppliers in a two-sided marketplace such as Airbnb or Spotify [29].

In the TREC 2019 Fair Ranking Track, McDonald et al. [28] cast fair ranking as a diversification problem and employed two diversification approaches based on xQuAD [36], optimizing the authors' influence in one run, and the authors' and venues' influence in another run; the results show that both runs are effective in minimizing unfairness. Wang et al. [39] used a convolutional kernel-based neural ranking model (Conv-KNRM) to obtain a relevance oriented ranking, and then applied document swapping for author exposure. Bonart, another participant, used a basic learning-to-rank framework for solving the track, stressing that the main challenge was that multiple objectives (searchers' relevance and authors' group fairness) have to be optimized under the constraint of an unknown author-group partition.

3 PROPOSED METHODOLOGY
While in this paper we are not providing a completed framework for fair information exposure, we are describing the first steps toward it. In this section, we present the main intuition that drove our proposal and the main principles, as well as a toy example to illustrate the model.

Rationale of our model. Rather than considering document ranking constraints, as in previous work [17, 38], our model relies on graph structure and community finding in order to solve the issue of fair document exposure.

Fair document exposure can be considered from different views, or considering different facets [38]. One of them is what we call fair item exposure. In that case, the retrieved documents should expose as many items as possible within as small a document set as possible. For example, if items are authors, then the objective is to expose as many authors as possible in the (relevant) retrieved documents. The second view is what we call fair community exposure. In that case, we want to provide a fair exposure of the different communities of documents. A document community is based on the links between documents as defined by their shared items. The main idea is that the top ranked documents should not belong to the same document community, so as to ensure a fair exposure of the variety of documents in terms of their items (e.g., their authors, sources, affiliations, opinions).

Document communities are extracted from the document network, which is composed of the initially retrieved documents as nodes. As such, the ground of the model is similar to the popular PageRank algorithm [4]. However, rather than considering the in and out hyperlinks or explicit references between documents, document links are extracted from implicit links. The implicit links are based on meta-data that documents share (e.g., common authors). Such links can be weighted and possibly directed, depending on what they represent. Indeed, there is a large variety of types of links that can be considered between documents. For example, considering scientific publications, documents can be linked according to the authors they have in common.
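To make this linking concrete, the author-based network can be sketched in a few lines of Python; this is a minimal illustration (all document and author names are invented), with the edge weight computed as the Jaccard overlap of Eq. (1):

```python
from itertools import combinations

# Toy corpus: each document carries the set of items of one type that
# it exposes (here, authors); all names are invented for illustration.
docs = {
    "d1": {"alice", "bob"},
    "d2": {"bob", "carol"},
    "d3": {"dave"},
}

def build_document_network(doc_items):
    """Return the weighted edges of the implicit document network.

    Following Eq. (1), the weight of the edge between two documents is
    the Jaccard overlap of their item sets; two documents sharing no
    item are simply not linked.
    """
    edges = {}
    for (di, items_i), (dj, items_j) in combinations(doc_items.items(), 2):
        shared = items_i & items_j
        if shared:
            edges[(di, dj)] = len(shared) / len(items_i | items_j)
    return edges

network = build_document_network(docs)
print(network)  # single edge d1-d2 (shared author "bob"), weight 1/3
```

Replacing the author sets by venue or publication-year sets yields the other link types discussed in the paper without changing the construction.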
The more common authors two documents have, the higher the weight of the edge between the two corresponding nodes in the network. Once the document network is built, the main document communities can easily be extracted using state of the art community detection algorithms such as Fast Greedy [10], Leading Eigenvector [32], Walktrap [33] or Infomap [35], for example.

The principle is then to define a document ranking that both merges the communities (top documents should belong to different communities) and ranks the documents within a given community while considering the topic-document relevance.

Using contextual networks & community detection. In general, the goal of community detection is to partition any network into communities in order to extract the subgroups of densely connected nodes [16, 19]. The elements from a given community are then considered as very close with regard to the cue the edges represent. The problem of fair exposure of documents is then cast into the problem of a fair exposure of each "community" of documents. Our model thus relies on a network where the nodes correspond to documents, while the edges between nodes are a means to contextualize the type of exposure the model should give to documents. The edges, and thus the document network, are contextual and depend on the targeted type of document exposure. Indeed, documents can be linked in different ways, depending on the types of documents and the metadata which are associated with them. The principle is also very close to the linked data principle: "The term Linked Data refers to a set of best practices for publishing and connecting structured data on the Web" [3]. As an example, linking documents according to the shared countries their authors belong to is a way to structure the document space and extract communities of documents that are authored by people from the same region of the world. As another example, linking the documents according to the venue where they have been published is another way to structure the document space. This document space representation has several advantages:
• First, it is very adaptive: the definition of the edges can depend on the type of documents or the type of exposure one wants to reveal;
• Second, different types of linking can be easily combined: it is possible to link documents both regarding the countries the authors belong to and, at the same time, the venue where they have published;
• Third, well-known community detection algorithms can be easily applied; some are known to result in smaller but denser communities, while others yield larger communities [11, 31];
• Finally, the communities can be adaptive and can be defined in a document type-based, user-based or query-based manner to better fit the objectives or the definition of the fairness of the exposure (e.g., changing the type(s) of links to consider between the retrieved documents or changing the community detection algorithm).

Problem formulation.
• Document communities:
D_k is the set of k documents {d_1, d_2, ..., d_k}. Each d_j becomes a node of the document network. An edge between two document nodes is typed and depends on the meta-data or items used. Let I^T_k be the set of items of type T extracted from D_k. The weight of the edge between d_i and d_j is defined as:

Weight(d_i, d_j) = |I^T_{d_i} ∩ I^T_{d_j}| / |I^T_{d_i} ∪ I^T_{d_j}|    (1)

• Fair exposure of items:
Let I be a set of items and d_j be a document. I_{d_j} is the set of items associated with a document d_j and |I_{d_j}| is the number of those unique items. In the same way, |I_{D_k}| is the number of unique items associated with D_k, where D_k is the set of k documents {d_1, d_2, ..., d_k}.
R is the initial (assumed non-empty) list of N retrieved documents {d_1, d_2, ..., d_N}, ordered by document score, expressed as the probability of the document d_k at rank k to be relevant to the query q (P(d_k | q)).
S_{k-1} is the set of documents already selected with fair exposure to items at the (k-1)-th iteration, initially empty (S_0 = ∅, i.e., for k = 1).
We want to optimize the document selection in an iterative way. d*_k is the best fair document to be selected at the k-th iteration and is calculated as follows:

d*_k = argmax_{d_i ∈ R \ S_{k-1}} [ γ P(d_i | q) + (1 − γ) E(d_i, S_{k-1}) ]    (2)

where γ ∈ [0, 1] is a combining parameter and R \ S_{k-1} is the list of documents from R that have not been selected yet in S_{k-1} at the (k-1)-th iteration. Then,

S_k = S_{k-1} ∪ {d*_k}    (3)

The function E(d_i, S_{k-1}) measures the exposure of the items of the document d_i given the document set S_{k-1} and is defined as follows:

E(d_i, S_{k-1}) = |I_{d_i} ∪ I_{S_{k-1}}| / |I_R|    (4)

• Fair exposure of communities:
Let C be a set of communities detected from the graph of documents. Each community c ∈ C contains a set of documents D_c. Like previously, I_{D_c} is the set of items in D_c and |I_{D_c}| is the number of those items.
For fair exposure of communities, we want to optimize the document selection over the communities C. We define this selection in an iterative way. d*_k is the document to be selected for fair exposure of communities at the k-th iteration and is calculated as a trade-off between document relevance and exposure as follows:

d*_k = argmax_{d_i ∈ R \ S_{k-1}} [ γ P(d_i | q) + (1 − γ) E(d_i, S_{k-1} | C) ]    (5)

Like in Equation 2, P(d_i | q) is the probability of the document d_i to be relevant for q. The function E(d_i, S_{k-1} | C) measures the exposure of the items of the document d_i in the document set S_{k-1} considering the set of communities C and is defined as follows:

E(d_i, S_{k-1} | C) = Σ_{c ∈ C} P(d_i, S_{k-1} | c) P(c)    (6)

where P(c) denotes the probability of the community c, which is considered uniform (1/|C|) for any community c ∈ C and can be ignored for the ranking objective.
In Equation 6, P(d_i, S_{k-1} | c) estimates the exposure of the items of document d_i given the exposure of the items of the already selected documents S_{k-1} that belong to the community c, and can be defined as follows:

P(d_i, S_{k-1} | c) = P(d_i | c) Π_{d_j ∈ S_{k-1}} (1 − P(d_j | c))    (7)

where P(d_i | c) = (|I_{d_i}| / |I_{D_c}|) × 1_{d_i} denotes the exposure of the items of document d_i (|I_{d_i}|) over the items of community c (|I_{D_c}|) if document d_i belongs to community c. The indicator 1_{d_i} makes the probability 0 if the document does not belong to the community and is defined as follows:

1_{d_i} = 1 if d_i ∈ c, 0 otherwise.
4 PRELIMINARY RESULTS
TREC Fair Ranking Track and data. In order to provide a preliminary evaluation and a proof of concept, we used the Fair Ranking data as released by the track organizers. In this shared task, participants are asked to re-rank query-based retrieved document sets in order to optimize both the relevance to the consumers and the fairness to the producers. The data consists of:
• the Semantic Scholar (S2) Open Corpus from the Allen Institute for Artificial Intelligence (http://api.semanticscholar.org/corpus/, the 2019-01-31 version), which consists of 47 data files of 1GB each. Most of the papers consist of the identifier, its DOI, title, abstract, and authors along with their IDs, inbound, and outbound citations;
• the Query log provided by Semantic Scholar. For each query, the organizers provide the query ID, the query string, the query normalized frequency, as well as a list of documents to rerank. There are 652 training and 635 evaluation queries, although not all contain a non-empty document list.

Table 1: Examples of Fair Ranking track queries along with some statistics. "#" stands for "number of different", "AVG" stands for "average number ... per document". Docs corresponds to documents, while JN is for journals.

Query  | Docs # | Authors # | Authors AVG | Entities # | Entities AVG | JN #
Q38944 | 12     | 64        | 5.58        | 122        | 14           | 7
Q5842  | 13     | 56        | 4.3         | 98         | 10.5         | 9

[Figure 1: Document communities extracted from the retrieved documents for two of the 2019 Fair Ranking Track queries. Panels: (a) Q46933 author-based, (b) Q46933 venue-based, (c) Q46933 year-based; (i) Query Q5842 venue-based, (ii) Query Q5842 year-based. Document IDs have been shortened for readability (using the first two and the last characters of the real IDs).]

Preliminary results. In this first experiment, we do not provide exhaustive results on the entire collection but rather some illustrative examples. To start with, we considered a few queries for which the set of associated documents contains at least 10 documents. We present two of them to illustrate the potential power of our model.

Query Q38944 is one of them (detailed in the first row of Table 1). For this query, the first two rows in Figure 1 display the document network and the document communities based on (a) co-authorship, (b) the venues where the papers were published, and (c) the year of publication. In (a), two documents are linked if they have at least one author in common, while in (b) (resp. (c)), two documents are linked if they have been published in the same venue (resp. in the same year). Other types of linking could have been made using the "In" or "Out" citation links, or the entities (key-words), for example. For a fair exposure of the documents associated with Query Q38944, considering venues, the group "dbf", "c1f", "e7a" and "dde", for example, should have only one of its documents top ranked, as the others are redundant regarding the venue. A fair exposure regarding the time line of the field could also be considered.

In a similar way, for Query Q5842 (second row in Table 1), we obtained the document communities as presented in Figure 1 when considering (i) the venues where the papers are published and (ii) the years of publication.

While this method does not yet include the final ordering of the documents, it opens new ways of considering diversity and/or fair exposure according to various criteria that can be either considered individually or combined. An interesting point of the model is that the criteria can evolve according to the needs.
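The community-exposure trade-off of Equations (5)-(7), which underlies the grouping illustrated in these examples, can be sketched as follows (the communities and item sets are invented; the uniform prior P(c) is dropped since it does not affect the ranking):

```python
from math import prod

def p_doc_given_community(doc, community, item_sets):
    # |I_d| / |I_Dc| when the document belongs to the community,
    # 0 otherwise (the indicator 1_d of Section 3).
    if doc not in community:
        return 0.0
    community_items = set().union(*(item_sets[d] for d in community))
    return len(item_sets[doc]) / len(community_items)

def community_exposure(doc, selected, communities, item_sets):
    """Eqs. (6)-(7), with the uniform community prior P(c) dropped
    since it does not change the ranking."""
    return sum(
        p_doc_given_community(doc, c, item_sets)
        * prod(1 - p_doc_given_community(d, c, item_sets) for d in selected)
        for c in communities
    )

# Two invented communities over four documents, one item each.
communities = [{"d1", "d2"}, {"d3", "d4"}]
items = {"d1": {"a"}, "d2": {"b"}, "d3": {"c"}, "d4": {"d"}}

# Once d1 is selected, a document from the other community is favoured:
print(community_exposure("d2", ["d1"], communities, items))  # → 0.25
print(community_exposure("d3", ["d1"], communities, items))  # → 0.5
```

Plugged into the argmax of Equation (5), this score pushes the next selected document toward a community that is not yet represented in the top ranks.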
5 CONCLUSION
In this paper, we paved the way for a new framework for fair document exposure. We presented the model. We also presented some illustrative examples when considering the TREC Fair Ranking Track 2019 data.

In future work, first, we will develop the model in a formal way and complete the framework so that it includes a total document ordering. We will also evaluate it in a more general way by considering the entire 2019 TREC Fair Ranking Track data the organizers provided, as well as the track evaluation measures. As mentioned by the organizers, in the 2019 collection a lot of queries have too few retrieved documents; a method such as ours makes more sense when the list of documents is large enough. As future work, we will also participate in the track in 2020 with our model.

REFERENCES
[1] Himan Abdollahpouri, Robin Burke, and Bamshad Mobasher. 2019. Managing popularity bias in recommender systems with personalized re-ranking. In The Thirty-Second International Flairs Conference.
[2] Asia J. Biega, Krishna P. Gummadi, and Gerhard Weikum. 2018. Equity of attention: Amortizing individual fairness in rankings. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval. 405–414.
[3] Christian Bizer, Tom Heath, and Tim Berners-Lee. 2011. Linked data: The story so far. In Semantic services, interoperability and web applications: emerging concepts. 205–227.
[4] Sergey Brin and Lawrence Page. 1998. The anatomy of a large-scale hypertextual web search engine. (1998).
[5] Jaime Carbonell and Jade Goldstein. 1998. The use of MMR, diversity-based reranking for reordering documents and producing summaries. In Proc. of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 335–336.
[6] Carlos Castillo. 2019. Fairness and transparency in ranking. In ACM SIGIR Forum, Vol. 52. 64–71.
[7] Flavio Chierichetti, Ravi Kumar, Silvio Lattanzi, and Sergei Vassilvitskii. 2017. Fair clustering through fairlets. In Advances in Neural Information Processing Systems. 5029–5037.
[8] Flavio Chierichetti, Ravi Kumar, Silvio Lattanzi, and Sergei Vassilvitskii. 2019. Matroids, matchings, and fairness. In The 22nd International Conference on Artificial Intelligence and Statistics. 2212–2220.
[9] Charles L. A. Clarke, Maheedhar Kolla, Gordon V. Cormack, Olga Vechtomova, Azin Ashkan, Stefan Büttcher, and Ian MacKinnon. 2008. Novelty and diversity in information retrieval evaluation. In Proc. of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 659–666.
[10] A. Clauset, M. Newman, and C. Moore. 2004. Finding community structure in very large networks. Phys. Rev. E 70, 066111 (December 2004). https://doi.org/10.1103/PhysRevE.70.066111
[11] Leon Danon, Albert Diaz-Guilera, Jordi Duch, and Alex Arenas. 2005. Comparing community structure identification. Journal of Statistical Mechanics: Theory and Experiment 2005, 09 (2005), P09008.
[12] Anubrata Das and Matthew Lease. 2019. A Conceptual Framework for Evaluating Fairness in Search. CoRR abs/1907.09328 (2019). arXiv:1907.09328 http://arxiv.org/abs/1907.09328
[13] Fernando Diaz, Bhaskar Mitra, Michael D. Ekstrand, Asia J. Biega, and Ben Carterette. 2020. Evaluating Stochastic Rankings with Expected Exposure. arXiv:cs.IR/2004.13157
[14] Michael Feldman, Sorelle A. Friedler, John Moeller, Carlos Scheidegger, and Suresh Venkatasubramanian. 2015. Certifying and removing disparate impact. In Proc. of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 259–268.
[15] Larry Fitzpatrick and Mei Dent. 1997. Automatic Feedback Using Past Queries: Social Searching?. In The 20th ACM SIGIR Conference. ACM, 306–313.
[16] S. Fortunato. 2010. Community detection in graphs. Physics Reports 486 (2010), 75–174.
[17] Ruoyuan Gao and Chirag Shah. 2019. How Fair Can We Go: Detecting the Boundaries of Fairness Optimization in Information Retrieval. In Proceedings of the 2019 ACM SIGIR International Conference on Theory of Information Retrieval (Santa Clara, CA, USA) (ICTIR '19). Association for Computing Machinery, New York, NY, USA, 229–236. https://doi.org/10.1145/3341981.3344215
[18] Ruoyuan Gao and Chirag Shah. 2020. Toward creating a fairer ranking in search engine results. Information Processing and Management 57, 1 (2020), 102138. https://doi.org/10.1016/j.ipm.2019.102138
[19] Mariam Haroutunian, Karen Mkhitaryan, and Josiane Mothe. 2018. f-Divergence measures for evaluation in community detection. In W. on Collaborative Technologies and Data Science in Smart City Applications (CODASSCA). 137–144.
[20] Sami Khenissi and Olfa Nasraoui. 2020. Modeling and Counteracting Exposure Bias in Recommender Systems.
[21] Onur Küçüktunç, Erik Saule, Kamer Kaya, and Ümit V. Çatalyürek. 2014. Diversifying citation recommendations. ACM Transactions on Intelligent Systems and Technology (TIST) 5, 4 (2014), 1–21.
[22] Matevž Kunaver and Tomaž Požrl. 2017. Diversity in recommender systems – A survey. Knowledge-Based Systems 123 (2017), 154–162.
[23] Shangsong Liang, Zhaochun Ren, and Maarten de Rijke. 2014. Fusion helps diversification. In Proc. of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval. 303–312.
[24] Blerina Lika, Kostas Kolomvatsos, and Stathes Hadjiefthymiades. 2014. Facing the cold start problem in recommender systems. Expert Systems with Applications 41, 4 (2014), 2065–2073.
[25] Weiwen Liu and Robin Burke. 2018. Personalizing Fairness-aware Re-ranking. CoRR abs/1809.02921 (2018). arXiv:1809.02921 http://arxiv.org/abs/1809.02921
[26] Jonathan Louëdec, Max Chevalier, Josiane Mothe, Aurélien Garivier, and Sébastien Gerchinovitz. 2015. A multiple-play bandit algorithm applied to recommender systems. In The Twenty-Eighth International Flairs Conference.
[27] Jie Lu, Dianshuang Wu, Mingsong Mao, Wei Wang, and Guangquan Zhang. 2015. Recommender system application developments: a survey. Decision Support Systems 74 (2015), 12–32.
[28] Graham McDonald, Thibaut Thonet, Iadh Ounis, Jean-Michel Renders, and Craig Macdonald. [n.d.]. University of Glasgow Terrier Team and Naver Labs Europe at TREC 2019 Fair Ranking Track. ([n.d.]).
[29] Rishabh Mehrotra, James McInerney, Hugues Bouchard, Mounia Lalmas, and Fernando Diaz. 2018. Towards a fair marketplace: Counterfactual evaluation of the trade-off between relevance, fairness & satisfaction in recommendation systems. In Proc. of the 27th ACM International Conference on Information and Knowledge Management. 2243–2251.
[30] Natwar Modani, Deepali Jain, Ujjawal Soni, Gaurav Kumar Gupta, and Palak Agarwal. 2017. Fairness Aware Recommendations on Behance. In Advances in Knowledge Discovery and Data Mining, Jinho Kim, Kyuseok Shim, Longbing Cao, Jae-Gil Lee, Xuemin Lin, and Yang-Sae Moon (Eds.). Springer International Publishing, Cham, 144–155.
[31] Josiane Mothe, Karen Mkhitaryan, and Mariam Haroutunian. 2017. Community detection: Comparison of state of the art algorithms. In 2017 Computer Science and Information Technologies (CSIT). 125–129.
[32] M. Newman. 2006. Finding community structure in networks using the eigenvectors of matrices. Phys. Rev. E 74, 036104 (September 2006). https://doi.org/10.1103/PhysRevE.74.036104
[33] P. Pons and M. Latapy. 2005. Computing Communities in Large Networks Using Random Walks. Lecture Notes in Computer Science, Springer 3733 (2005). https://doi.org/10.1007/11569596_31
[34] Antonio J. Roa-Valverde and Miguel-Angel Sicilia. 2014. A survey of approaches for ranking on the web of data. Information Retrieval 17, 4 (2014), 295–325.
[35] M. Rosvall and C. Bergstrom. 2008. Maps of random walks on complex networks reveal community structure. PNAS 105, 4 (2008), 1118–1123. https://doi.org/10.1073/pnas.0706851105
[36] Rodrygo L. T. Santos, Craig Macdonald, and Iadh Ounis. 2010. Exploiting query reformulations for web search result diversification. In Proc. of the 19th International Conference on World Wide Web. 881–890.
[37] Rodrygo L. T. Santos, Craig Macdonald, Iadh Ounis, et al. 2015. Search result diversification. Foundations and Trends in Information Retrieval 9, 1 (2015), 1–90.
[38] Ashudeep Singh and Thorsten Joachims. 2018. Fairness of exposure in rankings. In Proc. of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2219–2228.
[39] Meng Wang, Haopeng Zhang, Fuhuai Liang, Bin Feng, and Di Zhao. [n.d.]. ICT at TREC 2019: Fair Ranking Track. ([n.d.]).
[40] Jacek Wasilewski and Neil Hurley. 2018. Intent-Aware Item-Based Collaborative Filtering for Personalised Diversification. In Proc. of the 26th Conference on User Modeling, Adaptation and Personalization. Association for Computing Machinery, 81–89. https://doi.org/10.1145/3209219.3209234
[41] Long Xia, Jun Xu, Yanyan Lan, Jiafeng Guo, Wei Zeng, and Xueqi Cheng. 2017. Adapting Markov Decision Process for Search Result Diversification. In Proc. of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval. Association for Computing Machinery, 535–544. https://doi.org/10.1145/3077136.3080775