<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
<journal-title>Joint Workshops at the 49th International Conference on Very Large Data Bases (VLDBW’23) — Workshop on LLMs and Databases (LLMDB’23)</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1145/3514221.3517865</article-id>
      <title-group>
        <article-title>Diversification of Top-k LLM Results using Database Queries</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Thinh On</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Subhodeep Ghosh</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mengnan Du</string-name>
          <email>mengnan.du@njit.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Senjuti Basu Roy</string-name>
          <email>senjutib@njit.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff1">
          <label>1</label>
          <institution>New Jersey Institute of Technology</institution>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <volume>16</volume>
      <issue>2022</issue>
      <fpage>2005</fpage>
      <lpage>2013</lpage>
      <abstract>
        <p>Result diversification aims to return relevant results that cover a variety of perspectives. Attribute-based diversification groups results by shared attributes (e.g., genre for movies) and selects a proportional number of items from each group based on their distribution in the underlying data. However, large language models (LLMs) are not designed to produce proportionally diverse results. In this work, we propose leveraging external data sources to determine the distribution of groups related to a query and prompt LLMs to produce proportionally diverse results. This can improve result diversity by representing groups in proportion to their prevalence. Specifically, we first argue the benefits of making top-k results from LLMs proportionally diverse. We then show how to use external benchmark databases to enable proportional diversity. Finally, we outline a framework that prompts LLMs with proportionality information from external data and discuss challenges in automating this process. Our approach provides a path to overcoming LLMs' limitations in producing proportionally diverse responses.</p>
      </abstract>
      <kwd-group>
        <kwd>top-k Diversification</kwd>
        <kwd>large language models (LLMs)</kwd>
        <kwd>prompting LLMs</kwd>
        <kwd>querying database</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The goal of result diversification is to return results that are relevant as well as cover different intents. In the data management community, returning top-k diverse results of a query has been extensively studied, and there exist many seminal works that propose objective functions and efficient algorithms to return results that are diverse and representative [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1, 2, 3</xref>
        ]. In this work, we study proportional diversity in top-k results, which amounts to grouping items in the result set based on the query condition that share common attribute values (e.g., different genres for movies) and selecting only a limited number of items from each group that represent their proportional distribution. Large language models (LLMs), such as ChatGPT [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], Claude [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], Alpaca [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], Vicuna [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], etc., on the other hand, do not return results that are proportionally diverse. In this work, we first argue the benefits of making top-k results from LLMs proportionally diverse. We then present how to leverage external benchmark databases to enable proportional diversity. Finally, we outline a computational framework that prompts LLMs with proportionality information from external data sources and discuss research challenges to automate such an outcome.
      </p>
      <sec id="sec-1-1">
        <title>A concrete example</title>
        <p>Joint Workshops at the 49th International Conference on Very Large Data Bases (VLDBW’23) — Workshop on LLMs and Databases (LLMDB’23).</p>
        <p>As a concrete example, a leading open source LLM (ChatGPT) responds to a movie-recommendation request as follows. ChatGPT Response: ”Certainly! Here’s a list of top 10 ...”</p>
        <p>• Fantasy: ”The Lord of the Rings: The Fellowship of the Ring” (2001) - The first installment of the epic fantasy trilogy that follows a group of heroes on a quest to save Middle-earth from darkness.</p>
        <p>• Romance: ”Eternal Sunshine of the Spotless Mind”</p>
        <p>[Figure residue omitted: an illustration of querying an external movie database via SQL, with tables Actor (actor_id, name, gender, birth country), Production Company (company_id, name, owner, headquarters), Award (award_id, award name, category_id), and Award Details (category_id, category name: original screenplay, best supporting actor, best motion picture).]</p>
        <p>
          • Thriller: ”Psycho” (1960) - A classic suspenseful film by Alfred Hitchcock that explores the chilling story of a motel owner with a dark secret.
          • Animated: ”Toy Story” (1995) - The groundbreaking animated film that introduced audiences to a world where toys come to life, combining humor, heart, and adventure.
          • Mystery: ”Gone Girl” (2014) - A gripping thriller that keeps audiences on the edge of their seats as a man becomes the prime suspect in his wife’s disappearance.
          • Historical: ”Schindler’s List” (1993) - A poignant and harrowing portrayal of the Holocaust, based on the true story of Oskar Schindler, a German businessman who saved over a thousand Jewish lives.
        </p>
        <p>These represent a diverse range of genres and have achieved critical acclaim for their storytelling, performances, and impact on popular culture. It is easy to infer that the LLM as-is returns one movie per genre (i.e., the proportional distribution of movies per genre is uniform).</p>
        <p>However, if an external data source is consulted (such as the IMDB database), it can be seen that the top-rated movies (average rating above 6.5, for example) exhibit different proportional distributions per genre. As a toy example, as shown in Figure 1, 10,000 out of 20,000 Drama movies have average ratings higher than 6.5, whereas 5,000 out of 15,000 Action and 5,000 out of 25,000 Comedy movies have similarly high ratings, respectively. In proportion, therefore, it is desirable to see 5 Drama, 3 Action, and 2 Comedy movies in the top-10 results, which represents the preferences of the IMDB users more appropriately in the returned LLM answers. Such an idea of proportionate representation is explored in multiple recent works [
        <xref ref-type="bibr" rid="ref8">8, 9</xref>
        ] and bears a close connection to making the results diverse and fair.</p>
        <p>Problem Definition 1 (Proportionally diversified top-k results). Given a user query q, an integer k, and a user-specified attribute A of domain size ℓ (on which the results are to be diversified), a proportionality constraint defined over the single attribute A partitions the results into ℓ different groups G1, G2, ..., Gℓ, each with a required representation in the top-k results, such that the required representations sum to k. Generalizing this, if proportionality is defined over a set S of different attributes, with a required representation for each group of each attribute, a proportional top-k result must simultaneously satisfy proportionate representation for all attributes in S.</p>
        <p>2. Proposed Framework</p>
        <p>The proposed framework is presented in Figure 1 and is motivated by ChatDB [10]. The user writes a query to the LLM; the LLM connects with external databases to retrieve count information; the produced query results are converted into natural language texts to prompt the LLM; and the LLM produces the final results and summarizes them. The development of the framework thus requires solving the following four fundamental tasks.</p>
        <p>A. Convert the user query to a series of SQL queries. Given the user query (e.g., find top-10 movies based on genre), an external data source (e.g., IMDB) is consulted and a series of SQL queries is submitted. Using the running example, the first query (step 1) retrieves the different movie genres and their respective counts present in the database. The second, third, and fourth queries (corresponding to steps 2, 3, and 4, respectively) query each of the retrieved genres (in this example, drama, action, and comedy) and find the count of each kind with average rating higher than some threshold (in this case, higher than 6.5). In general, the goal is to make a sequence of SQL queries that produce the count information of the groups of interest based on the query. Existing Text-to-SQL solutions [11, 12, 13] could be leveraged for that, even though in this preliminary work we generate those queries manually.</p>
        <p>B. Compute proportions based on arbitrary query conditions. The next task is to produce the proportion of each group in the top-k based on the count information retrieved from the query engine. For that, we leverage our recent research results [
        <xref ref-type="bibr" rid="ref8">8, 14</xref>
        ] that study the computational challenges of computing proportional representation, for instance when the query is defined on a single attribute with ℓ different domain values (using the running example, ℓ = 3). As shown in Figure 1, 50% (10,000 out of 20,000) of Drama movies have average ratings higher than 6.5, whereas 33% (5,000 out of 15,000) of Action and 20% (5,000 out of 25,000) of Comedy movies have similarly high ratings, respectively. Therefore, k_Drama = (0.5 / (0.5 + 0.33 + 0.2)) × 10 ≈ 5, whereas k_Action ≈ 3 and k_Comedy ≈ 2. Indeed, the external data source has a high proportion of highly rated drama movies compared to comedy movies, even though there are more comedy movies than drama movies in the database. These numbers indicate that in the returned results drama should have higher representation than comedy. However, computing proportions for arbitrarily complex conditions is non-trivial: in [14], we prove that it is NP-hard just to decide whether there exists a feasible solution that satisfies a proportionality requirement defined over 3 or more attributes.</p>
        <p>The final task (D, discussed below) is to finetune LLMs so that they can summarize back to the user the reasoning behind the returned results. In the context of the running example, the summary may say ”5 out of 10 movies are drama, because 50% of drama movies are very highly rated, ...”</p>
        <p>3. Preliminary Results</p>
        <p>In this section, we evaluate the effectiveness of the proposed framework as a proof of concept, demonstrating its ability to prompt LLMs to generate answers with proportional diversity by linking to an external database. We first translate user intents into SQL statements to query information and calculate proportional statistics from the database. We then prompt the models with the proportional information to produce diversified answers. For instance, a request for ”10 diverse movies by genre” requires counting the movies of each genre in the database, computing the proportion for each genre, and normalizing the proportions to the number of movies requested (10 in this example). This process allows for proportional representation of genres in the generated movie list.</p>
        <p>3.1. IMDb Dataset</p>
        <p>We use the IMDb dataset from Kaggle 1 for a case study of our framework. This dataset is a subset of a larger IMDb dataset which spans a comprehensive collection of movies over several decades. The dataset is formatted as a relational database with 3 tables, namely movieList, ratings, and regions, which encompass movies in 25 different genres across 70 languages. We first convert the source data files to comma-separated value (CSV) format, which is readable by SQL. Next, the CSV files are imported into MySQL by removing delimiters between values/records (e.g., comma, \n) while maintaining the original table schema (see Table 1 for the database schema). The structure of the dataset allows users to query and filter the data based on specific research questions, which fits the goals of our case study.</p>
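<p>Tasks A and B can be sketched together in a few lines of Python. The sketch below is illustrative rather than the paper's actual implementation: the table and column names (movie, genre, avg_rating) are hypothetical placeholders, and largest-remainder rounding is one reasonable way to make the rounded allocation sum to k.</p>

```python
# Sketch of tasks A and B: issue count queries against an external
# database and turn the returned counts into a proportional top-k
# allocation. Table/column names (movie, genre, avg_rating) are
# hypothetical placeholders, not the actual IMDb schema.

def count_queries(genres, threshold=6.5):
    """Step 1 retrieves the genres; steps 2..4 count highly rated
    movies per genre. Returns the SQL strings we would submit."""
    sql = ["SELECT genre, COUNT(*) FROM movie GROUP BY genre;"]
    for g in genres:
        sql.append(
            f"SELECT COUNT(*) FROM movie "
            f"WHERE genre = '{g}' AND avg_rating > {threshold};"
        )
    return sql

def proportional_allocation(counts, totals, k):
    """Task B for a single attribute: each group's share is the
    fraction of its items passing the rating filter, normalized
    across groups and scaled to k (largest-remainder rounding)."""
    shares = {g: counts[g] / totals[g] for g in counts}
    norm = sum(shares.values())
    raw = {g: shares[g] / norm * k for g in shares}
    alloc = {g: int(raw[g]) for g in raw}
    # distribute the leftover slots by largest fractional remainder
    leftover = k - sum(alloc.values())
    for g in sorted(raw, key=lambda g: raw[g] - int(raw[g]), reverse=True)[:leftover]:
        alloc[g] += 1
    return alloc

# Running example: 50% of Drama, 33% of Action, and 20% of Comedy
# movies are highly rated.
counts = {"Drama": 10_000, "Action": 5_000, "Comedy": 5_000}
totals = {"Drama": 20_000, "Action": 15_000, "Comedy": 25_000}
print(proportional_allocation(counts, totals, 10))
# → {'Drama': 5, 'Action': 3, 'Comedy': 2}
```

<p>On the running example this yields 5 Drama, 3 Action, and 2 Comedy slots, matching the proportions discussed above.</p>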
      </sec>
      <sec id="sec-1-2">
        <title>3.2. Implementation Details</title>
        <p>To simplify the task, which requires various tools, we unify MySQL and the GPT-3.5-Turbo Model API developed by OpenAI 2 into a single Python interface. First, we link the IMDb database to MySQL and establish a connection between MySQL and Python using the pymysql library. As a result, SQL statements can be written directly in the Python interface and the proportions can be computed from the query outputs. Next, we summarize the proportion information into a prompt and feed it into the GPT Model API.</p>
        <p>C. SQL to Text Transform. The next step of the process is to consume the proportions generated by the SQL results and convert them into natural-language-like texts that LLMs understand. Using the running example, this is equivalent to prompting the LLM to return ”5 drama movies, 3 action movies, and 2 comedy movies”. As before, there exists the challenge of automatically translating computed proportions into natural language texts, for which existing SQL-to-Text solutions could be used [15, 16].</p>
        <p>D. Finetune LLMs to Summarize Results.</p>
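<p>The proportion-to-prompt step (task C) can be sketched as a small pure function. The prompt wording below is our own illustration, not the exact prompt used in the experiments; in the pipeline the resulting string would then be sent to the GPT-3.5-Turbo API.</p>

```python
# Illustrative sketch of task C: turn a computed allocation into a
# natural language prompt for the LLM. The wording is an assumption;
# in our setup the string would be sent to the GPT-3.5-Turbo API.

def allocation_to_prompt(query, alloc):
    """Summarize a {group: slot-count} allocation into prompt text."""
    k = sum(alloc.values())
    parts = ", ".join(f"{n} {g.lower()} movies" for g, n in alloc.items())
    return (
        f"{query} Return exactly {k} movies: {parts}. "
        "For each movie, give the title, year, and a one-sentence reason."
    )

prompt = allocation_to_prompt(
    "Suggest a diverse list of top-rated movies.",
    {"Drama": 5, "Action": 3, "Comedy": 2},
)
print(prompt)
```
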
      </sec>
      <sec id="sec-1-3">
        <title>Table 1: Schema of the IMDb database</title>
        <p>1: https://www.kaggle.com/datasets/ashirwadsangwan/imdb-dataset; 2: https://platform.openai.com/docs/models/gpt-3-5</p>
        <sec id="sec-1-3-1">
          <title>Major attributes and descriptions</title>
          <p>movieList: tconst (alphanumeric unique identifier of the title); titleType (type/format of the title, e.g., movie, tvseries, video, etc.); primaryTitle (the more popular title / the title used by the filmmakers on promotional materials at the point of release); originalTitle (the original title, in the original language); genres (includes up to three genres associated with the title).</p>
          <p>ratings: tconst (alphanumeric unique identifier of the title); averageRating (weighted average of all the individual user ratings); numVotes (number of votes the title has received).</p>
          <p>regions: titleId (a tconst, an alphanumeric unique identifier of the title); title (the localized title); region (the region for this version of the title); language (the language of the title).</p>
          <p>3.3. Result Analysis</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>4. Open Problems</title>
      <sec id="sec-2-1">
        <p>In this section, we discuss research challenges regarding achieving proportionally diverse top-k results from LLMs, including automatically transforming user queries into SQL queries, automatically transforming SQL results into LLM prompts, and finetuning LLMs.</p>
      </sec>
      <sec id="sec-2-2">
        <title>4.2. Transforming SQL Results into LLM Prompts</title>
        <p>There are several key challenges in automatically transforming SQL queries and results into natural language prompts for LLMs. First, it is challenging to map numerical values, proportions, and counts retrieved from a database into appropriate quantifiers and aggregates in natural language prompts.</p>
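<p>One possible, purely illustrative mapping from a numerical proportion to an English quantifier is sketched below; the cut-off values and phrases are our own assumptions, not values prescribed by the framework.</p>

```python
# Illustrative mapping from a proportion in [0, 1] to an English
# quantifier. The thresholds and phrases are assumptions chosen for
# demonstration, not part of the proposed framework.

def proportion_to_quantifier(p):
    if p >= 0.9:
        return "almost all"
    if p >= 0.6:
        return "most"
    if p >= 0.45:
        return "about half of"
    if p >= 0.2:
        return "some"
    if p > 0.0:
        return "a few"
    return "none of the"

print(proportion_to_quantifier(0.5))   # → about half of
print(proportion_to_quantifier(0.33))  # → some
```
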
        <sec id="sec-2-2-1">
          <title>Answers to summarized prompts</title>
          <p>The Dark Knight (2008, English)
Pulp Fiction (1994, English)
Amelie (2001, French)
La Haine (1995, French)
Blue is the Warmest Color (2013, French)
Life is Beautiful (1997, Italian)
Cinema Paradiso (1988, Italian)
La Dolce Vita (1960, Italian)
The Great Beauty (2013, Italian)
The Conformist (1970, Italian)</p>
        </sec>
        <sec id="sec-2-2-2">
          <title>2 English : 3 French : 5 Italian</title>
          <p>to be highly ranked, while omitting unnecessary details. Third, we must preserve context between the original user query, the intermediate SQL queries and results, and the final LLM prompts and summaries. The end-to-end system needs to retain the attributes, conditions, and constraints specified in the initial user query so that the final LLM prompts and instructions are grounded and relevant. Addressing these challenges will be the focus of our future research.</p>
          <p>4.3. Finetuning LLMs</p>
          <p>The last research challenge lies in finetuning LLMs to generate informative summaries. The LLMs should be finetuned to produce summaries that not only present the final results but also explain the reasoning behind those results. To accomplish this, the LLMs need to be trained on a large dataset of query-result pairs, where the results are accompanied by human-generated summaries or explanations. These summaries should capture the key insights and patterns in the results, highlighting the factors that influenced the distribution of groups. However, creating high-quality summaries requires extensive human expertise and effort. A potential approach is to use a combination of human-generated data and data automatically generated by other LLMs to balance quality and efficiency. During finetuning, the LLMs learn to generate summaries that are concise, informative, and relevant to the user’s query. The model should understand the statistical information from the SQL queries and use it to construct coherent explanations. For example, in the running example, the LLM might incorporate information about the high ratings of drama movies and the comparative proportions of different genres to generate an insightful summary. Lastly, to improve the quality of the generated summaries, various techniques can be employed, such as reinforcement learning. The model can be rewarded based on the informativeness and coherence of its generated summaries. By optimizing these rewards, the LLM can learn to produce personalized, high-quality summaries that explain the results to users.</p>
          <p>5. Related Work</p>
          <p>The development of LLMs has been driven by observations that scaling Pre-trained Language Models (PLMs), either in terms of model or data size, often leads to improved performance on downstream tasks [19] and enhances the model’s ability to solve various complex tasks. With their ever-increasing sizes, popular language models such as PaLM [20], LLaMA [21], Galactica [22], GPT-3 [23], and GPT-4 [24] have achieved state-of-the-art performance on many tasks. Consequently, they have motivated a profound shift in Natural Language Processing (NLP) research towards LLMs. For example, OpenAI released ChatGPT [25], which leverages the GPT-3.5 architecture and is capable of understanding language and engaging in meaningful conversations across various topics. ChatGPT represents the impact of LLMs throughout the community and revolutionizes our understanding of NLP [26, 18, 27]. In addition, these LLMs have made significant progress in natural language processing and have enabled various applications, such as coding assistants, search engines, and dialogue systems.</p>
        </sec>
      </sec>
      <sec id="sec-2-3">
      <title>Augmenting LLMs</title>
        <p>Despite the rapid progress of LLMs, they also suffer from some limitations, including generating implausible predictions (hallucinations), requiring massive scale and data to achieve good performance, and struggling with continual learning [28, 29, 30]. To address these issues, there is a growing research trend of ”augmenting” LLMs by providing them with additional context beyond just their parameters and input tokens. Two representative approaches are: 1) increasing context relevance by retrieving external information or employing reasoning, which gives useful context with fewer parameters; and 2) allowing LLMs to use external tools and knowledge to augment context, which adds missing information [28]. Recent research has proposed using databases for LLMs, which has led to enhanced performance in multi-hop reasoning. These works have demonstrated the possibility of improving the reasoning capabilities of LLMs by using external memory modules. For example, ChatDB [10] is a recent work that enables the use of real-time databases to enhance the multi-step reasoning capabilities of LLMs. The ChatDB framework uses LLMs to transform user inputs into a chain-of-memory (multi-step SQL instructions) that manipulates an external database. The intermediate results are summarized as a prompt which is fed to LLMs to achieve the final results.</p>
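<p>The chain-of-memory loop described above can be sketched as follows; this is a minimal sketch under our own simplifying assumptions, where llm and db are stand-ins for a real model API and database connection, and the model signals completion by returning None instead of another SQL instruction.</p>

```python
# Minimal sketch of the chain-of-memory pattern described for ChatDB.
# `llm` and `db` are hypothetical stand-in callables, not the actual
# ChatDB interfaces.

def chain_of_memory(user_input, llm, db, max_steps=4):
    """Ask the LLM for SQL steps one at a time, execute each against
    the database, and keep (sql, result) pairs as external memory."""
    memory = []
    for _ in range(max_steps):
        step_sql = llm(user_input, memory)  # propose the next SQL step
        if step_sql is None:                # no further steps needed
            break
        memory.append((step_sql, db(step_sql)))
    # summarize the intermediate results into a final prompt
    summary_prompt = f"Question: {user_input}\nIntermediate results: {memory}"
    return llm(summary_prompt, memory)
```
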
      </sec>
      <sec id="sec-2-4">
      <title>Text-to-SQL</title>
        <p>In this section, we review several lines of research that are most closely relevant to our work.</p>
        <p>Large Language Models (LLMs). LLMs have become a prominent area of research that has garnered significant attention in recent years. LLMs typically refer to Transformer-based models with multi-head attention [17] embedded in deep neural networks and trained on large-scale corpora [18].</p>
        <p>LLMs have also become a reliable source of generated code, from common programming languages such as Python/Java to SQL statements for querying databases [24]. However, parsing natural language into SQL statements faces semantic and syntactic challenges: LLMs as parsers must capture the semantics of the correct tables/columns from the database and generate syntactically valid SQL queries [31]. These requirements pose significant challenges in designing text-to-SQL models that generalize across databases and user intents. To overcome these challenges, researchers have proposed text-to-SQL frameworks using encoder-decoder neural architectures, categorized into two parsing approaches: single-turn and multi-turn [32]. In single-turn parsing, the encoder generates embeddings capturing natural language input and table schema semantics; the decoder then generates SQL statements from the encodings [33, 34, 35, 36, 37, 38, 39]. In multi-turn parsing, the encoder uses different encoding schemes to generate contextual and schema structure embeddings; the decoder, an LSTM model with attention mechanisms [40, 41], generates SQL queries using current and previous hidden states. This enables capturing long-term input dependencies and generating context-aware SQL queries [42, 43, 44, 45, 46].</p>
        <p>Finetuning LLMs. Finetuning involves modifying the parameters of a pre-trained model, an LLM in our context, using a smaller, task-specific dataset. The goal of finetuning LLMs is to enhance pre-trained LLMs using domain adaptation or human feedback, making LLMs more relevant for specific tasks. There are two main streams of finetuning methods for LLMs: instruction tuning and reinforcement learning. Instruction tuning involves supervised learning using instruction-formatted instances, where each instance includes a task description, an input-output pair, and optional demonstrations of the task [47, 48, 22, 49]. For example, if the task is text-to-SQL, the task description could be ”translate to SQL statements” and an input-output pair includes a natural language sentence as input and the equivalent SQL statements as output. The formatted instances can be constructed by formatting existing datasets [50, 51], by formatting human needs from real user queries [52], or by semi-automated augmentation approaches which feed existing instances into LLMs to generate new task descriptions and instances [53, 54, 55]. Reinforcement learning methods, on the other hand, propose using human feedback to make the outputs of LLMs align with human expectations [52, 55, 56]. Although alignment considers human preferences to mitigate unexpected behaviors of LLMs (e.g., hallucinations, misleading or biased answers), low-quality feedback data may pose negative effects on the general abilities of LLMs [57].</p>
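<p>As a concrete illustration, an instruction-formatted instance for the text-to-SQL task could look as follows; the field names and the toy schema are our own, not taken from any specific instruction-tuning dataset.</p>

```python
# A hypothetical instruction-formatted instance for the text-to-SQL
# task: a task description, an input-output pair, and an optional
# demonstration. Field names and the toy schema are illustrative.

instance = {
    "task_description": "Translate to SQL statements.",
    "input": "Count the drama movies rated above 6.5.",
    "output": (
        "SELECT COUNT(*) FROM movie "
        "WHERE genre = 'Drama' AND avg_rating > 6.5;"
    ),
    "demonstrations": [
        {
            "input": "List all genres.",
            "output": "SELECT DISTINCT genre FROM movie;",
        }
    ],
}
print(instance["output"])
```
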
      </sec>
    </sec>
    <sec id="sec-3">
      <title>6. Conclusion</title>
      <p>In this work, we present our initial directions on how to make query results coming from LLMs more representative of what external gold-standard data sources may provide. To that end, we present a framework that queries external benchmark databases to determine the proportional distribution of relevant attributes based on the given query. This proportion information is then used to prompt the large language model to return results that match that distribution and cover a variety of relevant perspectives. As a proof of concept, we implement this approach for improving the diversity of movie query results according to genres. The preliminary results show the potential of this framework to mitigate large language models’ tendency to return inadequately diverse responses. Moving forward, we plan to investigate practical solutions to automate the proposed framework and evaluate their effectiveness on diverse query types and domains.</p>
      <sec id="sec-3-1">
        <title>Acknowledgments</title>
        <p>The work of Senjuti Basu Roy and Thinh On is supported by the National Science Foundation (CAREER Award #1942913, IIS #2007935, IIS #1814595) and the Office of Naval Research (Grants Nos. N000141812838, N000142112966).</p>
        <p>[26] R. Bommasani, ..., Y. Wu, S. M. Xie, M. Yasunaga, J. You, M. Zaharia, M. Zhang, T. Zhang, X. Zhang, Y. Zhang, L. Zheng, K. Zhou, P. Liang, On the opportunities and risks of foundation models, 2022. arXiv:2108.07258.</p>
        <p>[27] C. Zhou, Q. Li, C. Li, J. Yu, Y. Liu, G. Wang, K. Zhang, C. Ji, Q. Yan, L. He, H. Peng, J. Li, J. Wu, Z. Liu, P. Xie, C. Xiong, J. Pei, P. S. Yu, L. Sun, A comprehensive survey on pretrained foundation models: A history from bert to chatgpt, 2023. arXiv:2302.09419.</p>
        <p>[28] G. Mialon, R. Dessì, M. Lomeli, C. Nalmpantis, R. Pasunuru, R. Raileanu, B. Rozière, T. Schick, J. Dwivedi-Yu, A. Celikyilmaz, et al., Augmented language models: a survey, arXiv preprint arXiv:2302.07842 (2023).</p>
        <p>[29] B. Peng, M. Galley, P. He, H. Cheng, Y. Xie, Y. Hu, Q. Huang, L. Liden, Z. Yu, W. Chen, et al., Check your facts and try again: Improving large language models with external knowledge and automated feedback, arXiv preprint arXiv:2302.12813 (2023).</p>
        <p>[30] B. Xu, Z. Peng, B. Lei, S. Mukherjee, Y. Liu, D. Xu, Rewoo: Decoupling reasoning from observations for efficient augmented language models, arXiv preprint arXiv:2305.18323 (2023).</p>
        <p>[31] P. Glenn, P. P. Dakle, P. Raghavan, Correcting semantic parses with natural language through dynamic schema encoding, 2023. arXiv:2305.19974.</p>
        <p>[32] B. Qin, B. Hui, L. Wang, M. Yang, J. Li, B. Li, R. Geng, R. Cao, J. Sun, L. Si, F. Huang, Y. Li, A survey on text-to-sql parsing: Concepts, methods, and future directions, 2022. arXiv:2208.13629.</p>
        <p>[33] V. Zhong, C. Xiong, R. Socher, Seq2SQL: Generating structured queries from natural language using reinforcement learning, 2018. URL: https://openreview.net/forum?id=Syx6bz-Ab.</p>
        <p>[34] T. Yu, Z. Li, Z. Zhang, R. Zhang, D. Radev, TypeSQL: Knowledge-based type-aware neural text-to-SQL generation, in: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), Association for Computational Linguistics, New Orleans, Louisiana, 2018, pp. 588–594. URL: https://aclanthology.org/N18-2093. doi:10.18653/v1/N18-2093.</p>
        <p>[35] T. Yu, M. Yasunaga, K. Yang, R. Zhang, D. Wang, Z. Li, D. Radev, SyntaxSQLNet: Syntax tree networks for complex and cross-domain text-to-SQL task, in: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Brussels, Belgium, 2018, pp. 1653–1663. URL: https://aclanthology.org/D18-1193. doi:10.18653/v1/D18-1193.</p>
        <p>[36] J. Guo, Z. Zhan, Y. Gao, Y. Xiao, J.-G. Lou, T. Liu, D. Zhang, Towards complex text-to-SQL in cross-domain database with intermediate representation, in: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Florence, Italy, 2019, pp. 4524–4535. URL: https://aclanthology.org/P19-1444. doi:10.18653/v1/P19-1444.</p>
        <p>[37] W. Hwang, J. Yim, S. Park, M. Seo, A comprehensive exploration on wikisql with table-aware word contextualization, 2019. arXiv:1902.01069.</p>
        <p>[38] W. Lei, W. Wang, Z. Ma, T. Gan, W. Lu, M.-Y. Kan, T.-S. Chua, Re-examining the role of schema linking in text-to-SQL, in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics, Online, 2020, pp. 6943–6954. URL: https://aclanthology.org/2020.emnlp-main.564. doi:10.18653/v1/2020.emnlp-main.564.</p>
        <p>[39] D. Choi, M. C. Shin, E. Kim, D. R. Shin, RYANSQL: Recursively applying sketch-based slot fillings for complex text-to-SQL in cross-domain databases, Computational Linguistics 47 (2021) 309–332. URL: https://aclanthology.org/2021.cl-2.12. doi:10.1162/coli_a_00403.</p>
        <p>[40] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, Bert: Pre-training of deep bidirectional transformers for language understanding, 2019. arXiv:1810.04805.</p>
        <p>[41] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V. Stoyanov, Roberta: A robustly optimized bert pretraining approach, 2019. arXiv:1907.11692.</p>
        <p>[42] Q. Liu, B. Chen, J. Guo, J.-G. Lou, B. Zhou, D. Zhang, How far are we from effective context modeling? an exploratory study on semantic parsing in context, 2020. arXiv:2002.00652.</p>
        <p>[43] R. Zhang, T. Yu, H. Y. Er, S. Shim, E. Xue, X. V. Lin, T. Shi, C. Xiong, R. Socher, D. Radev, Editing-based sql query generation for cross-domain context-dependent questions, 2019. arXiv:1909.00786.</p>
        <p>[44] P. Jain, M. Lapata, Memory-based semantic parsing, Transactions of the Association for Computational Linguistics 9 (2021) 1197–1212. URL: https://aclanthology.org/2021.tacl-1.71. doi:10.1162/tacl_a_00422.</p>
        <p>[45] B. Wang, R. Shin, X. Liu, O. Polozov, M. Richardson, Rat-sql: Relation-aware schema encoding and linking for text-to-sql parsers, 2021. arXiv:1911.04942.</p>
        <p>[46] Y. Zheng, H. Wang, B. Dong, X. Wang, C. Li, HIE-SQL: History information enhanced network for context-dependent text-to-SQL semantic parsing, in: Findings of the Association for Computational Linguistics: ACL 2022, Association for Computational Linguistics, Dublin, Ireland, 2022, pp. 2997–3007. URL: https://aclanthology.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gollapudi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Halverson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ieong</surname>
          </string-name>
          ,
          <article-title>Diversifying search results</article-title>
          ,
          <source>in: Proceedings of the second ACM international conference on web search and data mining</source>
          ,
          <year>2009</year>
          , pp.
          <fpage>5</fpage>
          -
          <lpage>14</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Guestrin</surname>
          </string-name>
          ,
          <article-title>Linear submodular bandits and their application to diversified retrieval</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>24</volume>
          (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Gollapudi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <article-title>An axiomatic approach for result diversification</article-title>
          ,
          <source>in: Proceedings of the 18th international conference on World wide web</source>
          ,
          <year>2009</year>
          , pp.
          <fpage>381</fpage>
          -
          <lpage>390</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Bubeck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Chandrasekaran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Eldan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gehrke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Horvitz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Kamar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. T.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lundberg</surname>
          </string-name>
          , et al.,
          <article-title>Sparks of artificial general intelligence: Early experiments with gpt-4</article-title>
          , arXiv preprint arXiv:2303.12712 (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5] AnthropicAI, Introducing claude,
          <year>2023</year>
          . URL: https://www.anthropic.com/index/introducing-claude.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R.</given-names>
            <surname>Taori</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Gulrajani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Dubois</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Guestrin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. B.</given-names>
            <surname>Hashimoto</surname>
          </string-name>
          ,
          <article-title>Alpaca: A strong, replicable instruction-following model</article-title>
          , Stanford Center for Research on Foundation Models. https://crfm.stanford.edu/2023/03/13/alpaca.html
          <volume>3</volume>
          (
          <year>2023</year>
          )
          <fpage>7</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>W.-L.</given-names>
            <surname>Chiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhuang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhuang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Gonzalez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Stoica</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. P.</given-names>
            <surname>Xing</surname>
          </string-name>
          ,
          <article-title>Vicuna: An open-source chatbot impressing gpt-4 with 90%* chatgpt quality</article-title>
          ,
          <year>2023</year>
          . URL: https://lmsys.org/blog/2023-03-30-vicuna/.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>D.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Islam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Schieber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Basu Roy</surname>
          </string-name>
          ,
          <article-title>Rank aggregation with proportionate fairness</article-title>
          ,
          <source>in: Proceedings of the 2022 International Conference on Management of Data, SIGMOD '22</source>
          , Association for Computing Machinery, New York, NY, USA,
          <year>2022</year>
          ,
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>