<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Team Skeletor at Touché 2021: Argument Retrieval and Visualization for Controversial Questions</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Kevin Ros</string-name>
          <email>kjros2@illinois.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carl Edwards</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Heng Ji</string-name>
          <email>hengji@illinois.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>ChengXiang Zhai</string-name>
          <email>czhai@illinois.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Illinois at Urbana-Champaign</institution>
          ,
          <addr-line>201 N Goodwin Ave, Urbana, Illinois 61801</addr-line>
          ,
          <country country="US">U.S.A</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Arguments are a critical part of education and political discourse in society, especially since more and more information is available online. In order to access this information, argument retrieval is a necessary task. In this work, we leverage the existing techniques of BM25 and BERT-based passage embedding similarity and introduce a new information retrieval technique based on manifold approximation. Evaluation results on the Touché @ CLEF 2021 topics and relevance scores show that the manifold-based approximation helps discover higher-quality arguments. Furthermore, we use these retrieval methods to visualize argument progression for users watching debates. The visualization results show promising directions for future exploration.</p>
      </abstract>
      <kwd-group>
<kwd>information retrieval</kwd>
        <kwd>argument</kwd>
        <kwd>manifold approximation</kwd>
        <kwd>visualization</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Arguments are an important part of education and political discourse in society. As the amount
of information and social media use grows on the internet, especially surrounding controversial
topics, it is critical to improve access to relevant debates, thereby improving public understanding
of divisive topics [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. Furthermore, traditional search engines are often limited in their ability
to effectively display and update relevant information during a live debate, especially when the
debate topics are constantly changing.
      </p>
      <p>
        This paper attempts to address these concerns by investigating both argument retrieval and
visualization. More specifically, we participate in Touché @ CLEF 2021 [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ], which presents two
distinct argument retrieval tasks: retrieving arguments for controversial questions and retrieving
arguments for comparative questions. We focus on the first task, with the goal of supporting
users by retrieving and visualizing relevant arguments and sentences for controversial questions.
This argument retrieval task goes beyond traditional information retrieval because the retrieval
methods need to capture both relevance and argument strength.
      </p>
      <p>
        As the basic retrieval models have performed well on this task [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], in addition to the standard
baseline BM25 and BERT embedding-based retrieval, we explore a new approach in which we
leverage the properties of manifold approximation, which is commonly used for
dimensionality reduction [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], as a pseudo-relevance-feedback reranking approach. The manifold-based
reranking approach assumes the highest-ranked initially retrieved arguments are relevant to the
controversial question, and computes a directed-edge existence probability from each argument
to all other arguments in the corpus.
      </p>
      <p>Our hypothesis is that strong, complete, and relevant arguments will have many other
arguments “pointing" to them. That is, these arguments should have many high-probability incoming
directed edges. Thus, we rerank the arguments based on the aggregation of their incoming edge
probabilities. Furthermore, we build on these retrieval approaches to visualize the topics and
trajectory of real-time debates as they progress per word with respect to a reference corpus.</p>
      <p>
        Experiments using the args.me corpus [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and the Touché @ CLEF 2021 [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] topics and
relevance scores show that our manifold-based ranking formula improves upon BM25 in argument
quality. Additionally, our exploration of visualization techniques using the args.me corpus and
a spoken debate shows promise in the direction of debate summarization and augmentation.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Our retrieval methods are inspired by passage-level evidence, as we treat each argument as
a collection of sentences [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. We follow the general methods described by SBERT’s retrieval
and re-ranking.1 Zhao et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] use manifold-based text representations of sentences in the
biomedical domain to capture the geometric relationships between sentences. Other work also
incorporates manifold learning into text representations [
        <xref ref-type="bibr" rid="ref10 ref11 ref12">10, 11, 12</xref>
        ]. To our knowledge, we are
the first to incorporate sentence-level manifold representations into information retrieval.
      </p>
      <p>
        Regarding conversation augmentation, Lyons et al. investigate leveraging dual-purpose
speech, which they define as speech socially appropriate to humans and meaningful to
computers [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Their software plays the role of an assistant (recording dates, scheduling events) rather
than introducing additional knowledge to the conversations, which is what we aim to do. Boyd
et al. propose to augment conversations with prosody information to help users with autism
detect atypical prosody [14]. We attempt to introduce similar metadata to the debates (however,
in the form of conversational topics) as well as introduce additional arguments directly related
to the topics being discussed. Popescu-Belis et al. introduce a speech-based just-in-time retrieval
system which uses semantic search [15]. That is, they record and transcribe conversations,
and provide relevant documents to the participants of the conversation in real-time. Their
search methods are based on keywords previously spoken during the conversation using ASR
(automatic speech recognition) [16]. A word is considered a keyword if it is in the ASR transcript
and is not a stopword, or if it is in a pre-constructed list. Thus, the search queries are limited
to what has already been spoken, and high-level dependencies between previously discussed
ideas cannot be leveraged. We believe our visualization approaches better address both of these
issues.
      </p>
      <p>There has also been much work in the general field of visualizing information retrieval [17],
but none of these approaches combine BERT and manifold-based dimensionality reduction to
allow for more fine-grained understanding of arguments over time.</p>
      <p>1https://www.sbert.net/examples/applications/retrieve_rerank/README.html</p>
    </sec>
    <sec id="sec-3">
      <title>3. Argument Retrieval</title>
      <p>
        In the following subsections, we describe our argument retrieval methods and results. Each
approach retrieves arguments from the args.me corpus (version 2020-04-01), which consists of
387,740 arguments scraped from various online debate portals [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. For each argument entry in
the corpus, we only consider the text in the “premise" field. Our methods are primarily evaluated
using the topics and relevance scores from Touché @ CLEF 2021, and we also include the scores
of our methods on last year’s iteration of the competition for completeness. The relevance scores
from last year consist of −2 (non-argument) or a range from 1 (low relevance, weak argument)
to 5 (high relevance, strong argument). This year’s relevance scores use the same range; however,
they consist of two separate dimensions: argument relevance and argument quality. There are
50 distinct topics, each consisting of a short “title" field and longer “description" and “narrative"
fields. For our queries, we only use the “title" field. Some examples of “title" fields include “Do
we need sex education in schools?" and “Should stem cell research be expanded?".
      </p>
      <sec id="sec-3-1">
        <title>3.1. Methods</title>
        <p>3.1.1. BM25</p>
        <p>For our baseline approach, we use BM25. BM25 is a bag-of-words ranking formula that relies
on keyword matching between a query and a collection of arguments, along with various
weighting heuristics. To process, index, and search arguments, we use Pyserini, which is a
Python-based information retrieval toolkit built over Anserini and Lucene [18]. All argument
premises are processed and indexed using the default Pyserini settings. This includes stopword
removal and stemming. All queries are processed similarly. We use Pyserini’s provided
BM25 implementation to search the corpus, only adjusting the k1 and b parameters. We tune
the parameters on last year’s topics and relevance scores.</p>
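<p>To make the ranking formula concrete, the following is a minimal, self-contained BM25 sketch. It uses one common formulation (Lucene-style IDF); Pyserini’s Lucene-backed implementation differs in details, and the k1 and b defaults here are simply the values we tuned.</p>

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=3.2, b=0.2):
    """Score each document against the query with classic BM25.

    `docs` is a list of token lists. Illustrative sketch only --
    the actual runs use Pyserini's Lucene-backed implementation.
    """
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter()                       # document frequency per term
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)                  # term frequency in this document
        score = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores
```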
        <sec id="sec-3-1-1">
          <title>3.1.2. Semantic Search</title>
          <p>Given that BM25 only matches exact terms, we explore the effectiveness of encoder-based 
nearest neighbor search to help bridge potential vocabulary gaps. To do this, we first split the
premises of each argument by sentence into smaller passages of approximately 200 words each.
Then, we encode each passage using the msmarco-distilbert-base-v3 encoder model provided by
Sentence Transformers [19]. At a high level, msmarco-distilbert-base-v3 is a BERT-based [20]
Siamese sentence encoder fine-tuned for question-answering on the MS MARCO data set [ 21].
The passage embeddings are stored and indexed using the hnswlib Python library [22], which
provides an approximate nearest-neighbor lookup index using hierarchical navigable small
world graphs. Each topic title is also encoded using msmarco-distilbert-base-v3, and given
the encoded topic, we search for the approximate top k nearest-neighbor passages. The top
arguments are ordered based on the maximum cosine similarity between the topic and any of
its passages. All parameters are again tuned using the previous iteration of the task.</p>
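<p>The max-cosine aggregation over passages can be sketched as follows. The vectors here stand in for the msmarco-distilbert-base-v3 embeddings, and exact search replaces hnswlib’s approximate lookup for clarity; names are illustrative.</p>

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def rank_arguments(topic_vec, passages):
    """Rank arguments by the maximum cosine similarity between the
    topic embedding and any of the argument's passage embeddings.

    `passages` maps argument id -> list of passage vectors.
    """
    scored = [(max(cosine(topic_vec, p) for p in vecs), arg_id)
              for arg_id, vecs in passages.items()]
    return [arg_id for _, arg_id in sorted(scored, reverse=True)]
```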
          <p>We also investigate combining the scores returned via semantic search with those returned
using BM25. To calculate this, we use the following interpolation:</p>
          <p>score = BM25 + α × semantic.</p>
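<p>In code, the interpolation of the BM25 and semantic (cosine) scores is a one-liner; note the two scores live on different scales, so α also absorbs the scale gap. The function name is illustrative.</p>

```python
def interpolate(bm25_score, semantic_score, alpha=0.7):
    """Combined score: BM25 plus alpha times the semantic score,
    matching the 'bm25-0.7semantic' run when alpha = 0.7."""
    return bm25_score + alpha * semantic_score
```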
        </sec>
        <sec id="sec-3-1-2">
          <title>3.1.3. Manifold Approximation</title>
          <p>
            Our third argument retrieval approach attempts to leverage the techniques utilized in UMAP
(Uniform Manifold Approximation and Projection) [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ]. UMAP is a dimensionality reduction
technique that first approximates a uniform manifold for each data point and patches together
their local fuzzy simplicial set representations, where a simplicial set is a higher-dimensional
generalization of a directed graph. Then, this topological representation is used to assess and
optimize lower-dimensional representations. A full theoretical description of UMAP is beyond
the scope of this paper, so we focus solely on the computational aspects of UMAP’s manifold
approximation which are relevant to our retrieval approach.
          </p>
          <p>To approximate a uniform manifold for each data point x_i, UMAP first finds the k nearest
neighbors x_i1, …, x_ik of x_i. Then, it defines ρ_i and σ_i, where</p>
          <p>ρ_i = min{ d(x_i, x_ij) | 1 ≤ j ≤ k, d(x_i, x_ij) &gt; 0 }, (1)</p>
          <p>∑_{j=1}^{k} exp( −max(0, d(x_i, x_ij) − ρ_i) / σ_i ) = log2(k), (2)</p>
          <p>and d(x_i, x_ij) is the distance between x_i and x_ij. Intuitively, ρ_i is the distance to x_i’s closest
neighbor (in our case, the most similar passage) and σ_i smooths and normalizes the distances
to the nearest neighbors. Next, UMAP calculates the following weights between data points:</p>
          <p>w((x_i, x_ij)) = exp( −max(0, d(x_i, x_ij) − ρ_i) / σ_i ). (3)</p>
          <p>Calculating this for every data point x_i results in a k-granularity weighted adjacency matrix
between all points in the data. The authors of UMAP note that w((x_i, x_ij)), or entry (i, ij) of the
weighted adjacency matrix, can be interpreted as the probability that a directed edge from x_i to
x_ij exists.</p>
          <p>For the purposes of argument retrieval, our hypothesis is that strong, complete, and relevant
arguments will have many other arguments “pointing" to them. That is, these arguments should
have many high-probability incoming directed edges. Thus, for a given topic title, we first search
using the aforementioned interpolated BM25 and semantic retrieval methods. We encode all of
the passages for the top n arguments. Next, for each encoded passage, we find the k nearest
neighbors and calculate (1), (2), and (3) as described above. Finally, we score each argument by
the sum of all directed edges pointing to the argument.</p>
          <p>Note that the sum of these calculated passage weights possesses different properties than just
the sum of the passage similarities. Most notably, Equation (2) constrains the scaled sum of
distances to log2(k), where k is the number of nearest neighbors. Our understanding is that this
calculation gives importance to points that have fewer highly-similar (closer) neighbors. For
example, if we have two points x and y, and the (point, distance) pairs of their three nearest
neighbors are</p>
          <p>x : [(a, 0.1), (b, 0.2), (c, 0.9)]</p>
          <p>y : [(d, 0.1), (e, 0.2), (f, 0.3)],</p>
          <p>then the weight between x and b will be higher than between y and e, even though they have
the same relative distances. Here are the resulting weights from the manifold calculation
(σ_x = 0.179741, σ_y = 0.113319):</p>
          <p>x : [(a, 1), (b, 0.5733), (c, 0.0117)]</p>
          <p>y : [(d, 1), (e, 0.4138), (f, 0.1712)]</p>
          <p>Intuitively, this may help reduce the importance of passages that are similar to many other
passages, as such a passage will contribute lower weights to other passages.</p>
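<p>The computation of Equations (1)–(3) can be sketched as below, solving Equation (2) for σ by bisection (the monotone search is our simplification; a production pipeline would use umap-learn’s implementation). The sketch reproduces the σ values and weights of the worked example above.</p>

```python
import math

def smooth_knn(dists, n_iter=64):
    """Given distances from a point to its k nearest neighbors,
    compute rho (Eq. 1), solve Eq. 2 for sigma by bisection, and
    return the edge weights of Eq. 3."""
    k = len(dists)
    rho = min(d for d in dists if d > 0)          # Eq. (1)
    target = math.log2(k)                         # right-hand side of Eq. (2)

    def weight_sum(sigma):
        return sum(math.exp(-max(0.0, d - rho) / sigma) for d in dists)

    # weight_sum grows monotonically with sigma, so bisect.
    lo, hi = 1e-12, 1e3
    for _ in range(n_iter):
        mid = (lo + hi) / 2
        if weight_sum(mid) > target:
            hi = mid                              # sum too large -> shrink sigma
        else:
            lo = mid
    sigma = (lo + hi) / 2
    weights = [math.exp(-max(0.0, d - rho) / sigma) for d in dists]  # Eq. (3)
    return rho, sigma, weights
```

<p>Running this on the example distances [0.1, 0.2, 0.9] and [0.1, 0.2, 0.3] recovers σ_x ≈ 0.179741 and σ_y ≈ 0.113319; an argument’s final score is then the sum of its incoming weights.</p>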
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Results</title>
        <p>We submitted five runs to Touché 2021, and the performance measures for these five runs
are listed in Table 1. The 2021 runs are judged in two dimensions: argument relevance and
argument quality, which correspond to the second and third columns of the table, respectively.
We also include the performance of our retrieval models on the topics and relevance scores from
Touché 2020 as a reference (final column). All measures are calculated using normalized
discounted cumulative gain at five (nDCG@5).</p>
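<p>For reference, nDCG@5 can be sketched as follows. This is one common formulation (linear gain, log2 discount); the reported numbers come from the organizers’ official evaluation tooling, not this sketch.</p>

```python
import math

def ndcg_at_5(ranked_rels, ideal_rels):
    """nDCG@5: DCG of the top-5 returned relevance grades,
    normalized by the DCG of the 5 best grades available."""
    def dcg(rels):
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:5]))
    return dcg(ranked_rels) / dcg(sorted(ideal_rels, reverse=True))
```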
        <p>The first run, “bm25", corresponds to the approach outlined in Section 3.1.1. We tuned the
parameters using grid search and arrived at k1 = 3.2 and b = 0.2 using the 2020 topics and
relevance scores. The next row, “semantic", corresponds to Section 3.1.2. We set the number of
nearest neighbors k = 1000 for each topic. Next, “bm25-0.7semantic" denotes the interpolation
of the two aforementioned approaches, with an α value of 0.7. The final two rows correspond
to the approach described in Section 3.1.3. For “manifold", we assume the top 3 arguments from
“bm25-0.7semantic" are relevant and search for k = 50 nearest neighbors for each argument
passage. The retrieved passages are completely reranked by aggregating the weights over each
argument. For “manifold-c10", we perform the exact same search, but only rerank the top 10
arguments of the “bm25-0.7semantic" run.</p>
        <p>For this year’s evaluations, our best-performing run with respect to relevance is
“bm25-0.7semantic". However, all of our other runs which utilize BM25 (i.e., excluding “semantic")
perform similarly. With respect to quality, our best-performing run is “manifold". Here, it
is promising that “manifold" outperformed “manifold-c10", as this implies that the manifold
technique is able to increase argument quality by retrieving arguments outside of the top 10
initially-ranked arguments.</p>
        <p>It is unclear whether or not our initial hypothesis is supported by the scores listed in Table 1.
The evaluation metrics from this year seem to support our hypothesis in the context of our
“manifold" run, but last year’s results show a decrease in performance. This may be because last
year’s relevance scores combine many diferent measures into a single dimension. Furthermore,
it is dificult to separate out the efects of BM25 on our manifold-based approaches, since it
appears that these approaches perform similarly. This, along with the high scores of our “bm25"
run, stresses the importance of well-tuned robust models. Overall, these results are a step in the
right direction for our hypothesis, but more analysis is needed to draw firm conclusions.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Visualization</title>
      <p>While a ranked list of document snippets is often sufficient for ordinary web search, such a
list is not necessarily optimal for showing results of argument retrieval to the users because it
is common to discuss many topics during a debate and the user may want to see the topical
structure. These topics may be discussed at length, briefly mentioned, or revisited as the debate
unfolds. Traditional search engines, which require explicit user querying, often display relevant
documents and arguments in a ranked list, which makes it difficult to effectively capture and
visualize these topic changes. For example, it may be too time consuming for a participant in a
debate to constantly search for and read all of the relevant documents. Or, someone may want
a high-level summary of the debate at various points. Thus, we explore various visualization
techniques to help mitigate these concerns. This is accomplished by minimizing the necessity
of constant user input as well as visualizing these structural topic changes. Visualization of
search results has been studied before [17, 23, 24, 25]; however, existing visualization methods
will not work well for our use case, so we explore new approaches.</p>
      <p>For our visualization exploration, we utilize the args.me corpus to help summarize and
augment debates in real time. We demonstrate our visualization methods on the publicly
available debate between Bill Nye and Ken Ham on Evolution vs. Creationism.2 We chose this
debate primarily because YouTube provides an accurate transcript of the debate with timestamps,
and because of the debate’s diverse topic coverage.</p>
      <p>The YouTube transcript timestamps occur approximately every 3 seconds and contain
approximately 1–8 words per timestamp. We maintain these groupings for our analysis. The
text for the transcript referenced in the analysis is in Table 4. The full text of each referenced
argument ID is available on GitHub.3</p>
      <sec id="sec-4-1">
        <title>4.1. Visualization Approach with BM25</title>
        <p>For any given timestamp t, we define a look-back window of size w and collect all the terms
that occurred between t − w and t. Then, we search the args.me corpus using our BM25 retrieval
approach outlined in Section 3.1.1, with the query being the collected transcript terms. We
record the ranks of the top n arguments returned. We choose BM25 because it is well known to
be robust and efficient. Repeating this over a given interval of timestamps results in a smoothed
argument-level summary for the interval.</p>
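<p>The look-back procedure can be sketched as follows. The transcript format and the search_fn hook are illustrative assumptions; in our setting, search_fn is the BM25 retrieval of Section 3.1.1.</p>

```python
from collections import Counter

def argument_frequencies(transcript, search_fn, w=5, n=20):
    """For each timestamp t, query with all terms spoken between
    t - w and t, record the top-n argument ids, and count how often
    each argument enters the top n over the interval.

    `transcript` is a list of (timestamp, text) pairs; `search_fn`
    returns argument ids, best first.
    """
    counts = Counter()
    for t in range(len(transcript)):
        window = transcript[max(0, t - w): t + 1]   # look-back window
        query = " ".join(text for _, text in window)
        counts.update(search_fn(query)[:n])
    return counts
```

<p>Plotting the per-timestamp ranks of the most frequent arguments in this counter yields figures such as Figure 1.</p>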
        <p>2https://www.youtube.com/watch?v=z6kgvhG3AkI
3https://github.com/kevinros/toucheRetrievalVisualization/tree/main/arguments</p>
        <p>[Table 2 argument descriptions: description of a bicycle incident; understanding scriptures, the gospel, God; creator of universe, infinite power, God; having unlimited power, omnipotence of God; the justness of God]</p>
        <p>As an example, consider the debate time interval 110:53-114:04. Each timestamp and
corresponding text is listed in Table 4. We define a look-back window of size w = 5 and retrieve the
top n = 20 arguments for each timestamp. Then, we collect the number of times an argument
is ranked in the top 20 arguments across all timestamps, and consider only the five most
frequent arguments. Figure 1 displays the ranks of these five arguments at each timestamp, and
Table 2 lists a high-level description of each argument. The parameters are manually tuned to
demonstrate the benefits and drawbacks of this visualization approach.</p>
        <p>Of the five arguments returned, S4fde9bb-Aef3913d8 seems to be topically irrelevant to the
transcript text. Interestingly, this argument appears to also be a transcript, and thus it contains
many filler words (such as “uh") also present in the debate transcript. It appears to be playing
the role of a background language model. The other four arguments seem to be relevant as
they discuss topics and themes present in the transcript at different timestamps. From 111:29 to
111:50, argument S9a8b0a09-A22358c86 is one of the highest-ranked, and it discusses “God",
“His kingdom", “scripture", and “His actions". From 112:22 to 112:47, we find that arguments
S690aacea-A986a10d7 and S23dda237-A69f9884f are ranked the highest. Both arguments discuss
the powers of the creator of the universe. From 112:53 to 113:10, we observe that argument
S5059e885-Abe1aa26d is the highest-ranked, which argues in favor of the justness of God.</p>
        <p>[Table 3 argument descriptions: showing the validity of theistic evolution; biblical creationism, unfalsifiable; heaven, hell, stars, God; physics, star formation, modern science; astronomy in the context of the Quran]</p>
        <p>One use case for this visualization technique is to help participants of the debate better
analyze and justify their stance. For example, the participants can draw on the additional
knowledge provided by the retrieved arguments to strengthen their own arguments in real-time.
On the other hand, it is also possible that rebuttals to participants’ arguments will be retrieved,
which could help increase the overall robustness of the debate by exposing counterpoints.</p>
        <p>In order to reduce noise and irrelevant arguments, we also explore the possibility of allowing
users to specify the search terms or arguments. More specifically, using pre-defined sets of
terms, we search the args.me corpus with BM25 to find the most relevant arguments to the
provided terms. Then, we display the frequencies of the returned arguments using the methods
outlined above, except we consider ranks through 100 rather than 20.</p>
        <p>Consider the same debate time interval and the keyword groups “bible god creationism"
and “heavens astronomy stars". Figure 2 displays the frequencies of the five most relevant
arguments to each keyword group. The first five argument IDs in the legend correspond to
the first keyword group, and the second five argument IDs in the legend correspond to the
second keyword group. Additionally, high-level descriptions of the arguments that appear in
Figure 2 are listed in Table 3. The first two arguments are from “bible god creationism" and
the last three arguments correspond to “heavens astronomy stars". From Figure 2, we see that
arguments relevant to both keyword groups are highly ranked between 112:10 and 112:36,
indicating that the keywords in the retrieved arguments strongly match the keywords from the
debate transcript in the time interval.</p>
        <p>An important benefit of this visualization technique is that it allows the user to specify
specific topics before, during, or after a debate in order to easily track various topic occurrences
for further analysis. For example, a user looking to get a high-level summary of a debate can
examine the ranking frequencies of known arguments in order to pinpoint the most relevant
points in the debate.</p>
        <p>As this visualization approach provides a high-level overview of a debate by referencing
relevant arguments using keywords, it abstracts away from the actual content of the debate and
relevant sentences within arguments. To help address this issue, we explore a more fine-grained
visualization approach in the following subsection.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Visualization Approach with UMAP</title>
        <p>
          The advent of new Transformer-based language models such as BERT [20] has led to
impressive improvements on a variety of NLP tasks. We seek to use BERT’s semantic representation
space to better visualize the dynamics of arguments. To do so, we take advantage of UMAP [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ].
The goal of UMAP is to visualize high-dimensional embeddings in a low-dimensional space
while preserving topological and structural properties. Using the same BERT-based encoder
discussed in Section 3.1.2, we combine the encodings of the sentences of relevant arguments
and the “caterpillar embeddings" of our debate transcript to visualize how the debate evolves
over time. This approach allows us to analyze fine-grained topic changes as they unfold in the
debate, as well as their relevance to a reference corpus.
        </p>
        <sec id="sec-4-2-1">
          <title>4.2.1. Caterpillar Embeddings</title>
          <p>Caterpillar embeddings are used to track the course of the debate over time. They consist of
a sequence of encoder representations taken from across the debate. A naïve approach is to
slide a window of size w over the sequence of words in the transcript with stride s. However,
this has the downside of both adding and removing information (words) at each step. Instead,
we split each step into two: a growth step and a contraction step. Given a window from word
i to i + w of the transcript for some i, the next window will grow to be from i to i + w + s.
The subsequent window will be a contraction: it will range from i + s to i + w + s. Hence, this
“caterpillar embedding" technique moves along the transcript of the debate like a caterpillar
inching along. At step n, the start and end of the window are ⌊n/2⌋ · s and w + ⌈n/2⌉ · s,
respectively.</p>
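<p>The caterpillar stepping can be sketched as a closed-form window mapping; the formula below is our reading of the growth/contraction description (w is the window size, s the stride), not code from our pipeline.</p>

```python
def caterpillar_window(n, w, s):
    """Window of caterpillar step n: odd steps grow (extend the end
    by s), even steps contract (advance the start by s), so no step
    both adds and removes words."""
    start = (n // 2) * s
    end = w + ((n + 1) // 2) * s
    return start, end
```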
        </sec>
        <sec id="sec-4-2-2">
          <title>4.2.2. Argument Retrieval-Based Semantic Visualizations</title>
          <p>In order to better define the topology of the semantic space, we extract the top m most frequent
arguments over the transcript interval 110:53 to 127:01 as described in Section 4.1 from the
args.me corpus, split them into sentences, and encode the sentences using the previously
mentioned BERT-based sentence encoder. We combine these argument embeddings with the
caterpillar embeddings of the debate transcript and project them into two dimensions using
UMAP. This creates a path of the debate as it visits different arguments in the semantic space.
We can then use the nearby neighbors of the caterpillar embeddings as relevant arguments to
show the user at a given timestamp. The full animation can be found on GitHub.4</p>
          <p>Regardless of which m value we use, we find that this UMAP projection does not preserve
the original space well regarding nearest neighbors. We believe this is because of the large
differences between the semantic structures of the conversational YouTube debate and the
written structures of the corpus debates. To mitigate this, we use a nearest neighbor search in
the original space, and we plot the debate embedding using its k nearest neighbors. Through
empirical exploration, we find that m = 100 and k = 100 yield the clearest results. Additionally,
we consider the same window as explored in Section 4.1, namely 110:53 to 114:04. Note that the
transcript of the debate in this window is available in Table 4. The resulting path at various
timestamps is shown in Figure 3.</p>
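<p>The original-space neighbor selection can be sketched as below; the toy vectors stand in for the BERT sentence embeddings, and the exhaustive cosine scan replaces an approximate index for clarity.</p>

```python
import math

def nearest_arguments(debate_vecs, arg_vecs, k=100):
    """Pick the k argument sentences closest (by cosine) to any
    caterpillar embedding, measured in the original embedding
    space; only these are then projected with UMAP for plotting."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv)
    # best similarity of each argument sentence to any debate window
    best = {i: max(cos(dv, av) for dv in debate_vecs)
            for i, av in enumerate(arg_vecs)}
    return sorted(best, key=best.get, reverse=True)[:k]
```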
          <p>The debate path quickly moves to the lower left quadrant, which we find to signify the creation
of the universe and heavens, particularly in relation to God. The path briefly moves to the right,
when the debate focuses more on the omnipotence and omniscience of God. Finally, the debate
moves upward, when the discussion changes to physics, life science, and astronomy. The full
video can also be found on GitHub.5</p>
          <p>In Figure 3, we clearly see groupings of arguments’ topics and how they change over time.
Interestingly, we can also examine the topic path through the corpus that the YouTube debate
took. This could be used to track debate topic progression in a visual manner, and augment live
debates with both relevant information at the current point as well as relevant information for
future, forecasted points. More work is needed, however, to investigate the effects of parameter
selection and the effectiveness in various domains.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion and Future Work</title>
      <p>In this work, we apply several techniques to the Touché Argument Retrieval task, such as BM25,
semantic search, and manifold-based reranking. Among them, we find that the manifold-based
reranking was sometimes more effective in returning high-quality arguments when compared
to BM25. In the future, we hope to compute the manifold weights for every argument in the
data set as a preprocessing step, and investigate efficient ways to combine these weights with
retrieval methods that perform well along the relevance dimension, in order to return the
strongest and the most relevant arguments.</p>
      <p>4https://github.com/kevinros/toucheRetrievalVisualization/blob/main/animations/full_anim.mp4
5https://github.com/kevinros/toucheRetrievalVisualization/blob/main/animations/100top_mean_anim.mp4</p>
      <p>[Figure 3 frame captions: (a) Initial frame. Time: 111:10; (c) Time: 113:27; (d) Final frame. Time: 114:04]</p>
      <p>To better display search results to users in argument retrieval, we also introduce several
visualization techniques based on BM25 keyword matching and UMAP dimensionality reduction,
which show promise for debate augmentation. Although the benefits of this augmentation are
difficult to quantify, we believe it can improve debate understanding and retention, as well as
open avenues for future work. We also hope to improve the visualization by further testing
different parameters, retrieval techniques, and background corpora.</p>
    </sec>
  </body>
  <back>
  </back>
</article>