<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Team Skeletor at Touché 2021: Argument Retrieval and Visualization for Controversial Questions</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Kevin Ros</string-name>
          <email>kjros2@illinois.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carl Edwards</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Heng Ji</string-name>
          <email>hengji@illinois.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>ChengXiang Zhai</string-name>
          <email>czhai@illinois.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Illinois at Urbana-Champaign</institution>
          ,
          <addr-line>201 N Goodwin Ave, Urbana, Illinois 61801</addr-line>
          ,
          <country country="US">U.S.A</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Arguments are a critical part of education and political discourse in society, especially since more and more information is available online. In order to access this information, argument retrieval is a necessary task. In this work, we leverage the existing techniques of BM25 and BERT-based passage embedding similarity and introduce a new information retrieval technique based on manifold approximation. Evaluation results on the Touché @ CLEF 2021 topics and relevance scores show that the manifold-based approximation helps discover higher-quality arguments. Furthermore, we use these retrieval methods to visualize argument progression for users watching debates. The visualization results show promising directions for future exploration.</p>
      </abstract>
      <kwd-group>
<kwd>information retrieval</kwd>
        <kwd>argument</kwd>
        <kwd>manifold approximation</kwd>
        <kwd>visualization</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Arguments are an important part of education and political discourse in society. As the amount
of information and social media use grows on the internet, especially surrounding controversial
topics, it is critical to improve access to relevant debates, thereby improving public understanding
of divisive topics [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. Furthermore, traditional search engines are often limited in their ability
to effectively display and update relevant information during a live debate, especially when the
debate topics are constantly changing.
      </p>
      <p>
        This paper attempts to address these concerns by investigating both argument retrieval and
visualization. More specifically, we participate in Touché @ CLEF 2021 [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ], which presents two
distinct argument retrieval tasks: retrieving arguments for controversial questions and retrieving
arguments for comparative questions. We focus on the first task, with the goal of supporting
users by retrieving and visualizing relevant arguments and sentences for controversial questions.
This argument retrieval task goes beyond traditional information retrieval because the retrieval
methods need to capture both relevance and argument strength.
      </p>
      <p>
        As the basic retrieval models have performed well on this task [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], in addition to the standard
baseline BM25 and BERT embedding-based retrieval, we explore a new approach in which we
leverage the properties of manifold approximation, which is commonly used for
dimensionality reduction [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], as a pseudo-relevance-feedback reranking approach. The manifold-based
reranking approach assumes the highest-ranked initially retrieved arguments are relevant to the
controversial question, and computes a directed-edge existence probability from each argument
to all other arguments in the corpus.
      </p>
      <p>Our hypothesis is that strong, complete, and relevant arguments will have many other
arguments “pointing" to them. That is, these arguments should have many high-probability incoming
directed edges. Thus, we rerank the arguments based on the aggregation of their incoming edge
probabilities. Furthermore, we build on these retrieval approaches to visualize the topics and
trajectory of real-time debates as they progress per word with respect to a reference corpus.</p>
      <p>
        Experiments using the args.me corpus [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and the Touché @ CLEF 2021 [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] topics and
relevance scores show that our manifold-based ranking formula improves upon BM25 in argument
quality. Additionally, our exploration of visualization techniques using the args.me corpus and
a spoken debate shows promise in the direction of debate summarization and augmentation.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Our retrieval methods are inspired by passage-level evidence, as we treat each argument as
a collection of sentences [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. We follow the general methods described by SBERT’s retrieval
and re-ranking.1 Zhao et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] use manifold-based text representations of sentences in the
biomedical domain to capture the geometric relationships between sentences. Other work also
incorporates manifold learning into text representations [
        <xref ref-type="bibr" rid="ref10 ref11 ref12">10, 11, 12</xref>
        ]. To our knowledge, we are
the first to incorporate sentence-level manifold representations into information retrieval.
      </p>
      <p>
        Regarding conversation augmentation, Lyons et al. investigate leveraging dual-purpose
speech, which they define as speech socially appropriate to humans and meaningful to
computers [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Their software plays the role of an assistant (recording dates, scheduling events) rather
than introducing additional knowledge to the conversations, which is what we aim to do. Boyd
et al. propose to augment conversations with prosody information to help users with autism
detect atypical prosody [14]. We attempt to introduce similar metadata to the debates (however,
in the form of conversational topics) as well as introduce additional arguments directly related
to the topics being discussed. Popescu-Belis et al. introduce a speech-based just-in-time retrieval
system which uses semantic search [15]. That is, they record and transcribe conversations,
and provide relevant documents to the participants of the conversation in real-time. Their
search methods are based on keywords previously spoken during the conversation using ASR
(automatic speech recognition) [16]. A word is considered a keyword if it is in the ASR transcript
and is not a stopword, or if it is in a pre-constructed list. Thus, the search queries are limited
to what has already been spoken, and high-level dependencies between previously discussed
ideas cannot be leveraged. We believe our visualization approaches better address both of these
issues.
      </p>
      <p>There has also been much work in the general field of visualizing information retrieval [17],
but none of these approaches combine BERT and manifold-based dimensionality reduction to
allow for more fine-grained understanding of arguments over time.</p>
      <p>1https://www.sbert.net/examples/applications/retrieve_rerank/README.html</p>
    </sec>
    <sec id="sec-3">
      <title>3. Argument Retrieval</title>
      <p>
        In the following subsections, we describe our argument retrieval methods and results. Each
approach retrieves arguments from the args.me corpus (version 2020-04-01), which consists of
387,740 arguments scraped from various online debate portals [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. For each argument entry in
the corpus, we only consider the text in the “premise" field. Our methods are primarily evaluated
using the topics and relevance scores from Touché @ CLEF 2021, and we also include the scores
of our methods on last year’s iteration of the competition for completeness. The relevance scores
from last year consist of −2 (non-argument) or a range from 1 (low relevance, weak argument)
to 5 (high relevance, strong argument). This year’s relevance scores use the same range; however,
they consist of two separate dimensions: argument relevance and argument quality. There are
50 distinct topics, each consisting of a short “title" field and longer “description" and “narrative"
fields. For our queries, we only use the “title" field. Some examples of “title" fields include “Do
we need sex education in schools?" and “Should stem cell research be expanded?".
      </p>
      <sec id="sec-3-1">
        <title>3.1. Methods</title>
        <p>3.1.1. BM25</p>
        <p>For our baseline approach, we use BM25. BM25 is a bag-of-words ranking formula that relies
on keyword matching between a query and a collection of arguments, along with various
weighting heuristics. To process, index, and search arguments, we use Pyserini, which is a
Python-based information retrieval toolkit built over Anserini and Lucene [18]. All argument
premises are processed and indexed using the default Pyserini settings. This includes stopword
removal and stemming. All queries are processed similarly. We use Pyserini’s provided
BM25 implementation to search the corpus, only adjusting the k1 and b parameters. We tune
the parameters on last year’s topics and relevance scores.</p>
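<p>To make the ranking formula concrete, the following is a minimal, self-contained BM25 sketch. It uses one common formulation (Lucene-style IDF); Pyserini’s Lucene-backed implementation differs in details, and the k1 and b defaults here are simply the values we tuned.</p>

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=3.2, b=0.2):
    """Score each document against the query with classic BM25.

    `docs` is a list of token lists. Illustrative sketch only --
    the actual runs use Pyserini's Lucene-backed implementation.
    """
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter()                       # document frequency per term
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)                  # term frequency in this document
        score = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores
```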
        <sec id="sec-3-1-1">
          <title>3.1.2. Semantic Search</title>
          <p>Given that BM25 only matches exact terms, we explore the effectiveness of encoder-based 
nearest neighbor search to help bridge potential vocabulary gaps. To do this, we first split the
premises of each argument by sentence into smaller passages of approximately 200 words each.
Then, we encode each passage using the msmarco-distilbert-base-v3 encoder model provided by
Sentence Transformers [19]. At a high level, msmarco-distilbert-base-v3 is a BERT-based [20]
Siamese sentence encoder fine-tuned for question-answering on the MS MARCO data set [ 21].
The passage embeddings are stored and indexed using the hnswlib Python library [22], which
provides an approximate nearest-neighbor lookup index using hierarchical navigable small
world graphs. Each topic title is also encoded using msmarco-distilbert-base-v3, and given
the encoded topic, we search for the approximate top k nearest-neighbor passages. The top
arguments are ordered based on the maximum cosine similarity between the topic and any of
its passages. All parameters are again tuned using the previous iteration of the task.</p>
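<p>The max-cosine aggregation over passages can be sketched as follows. The vectors here stand in for the msmarco-distilbert-base-v3 embeddings, and exact search replaces hnswlib’s approximate lookup for clarity; names are illustrative.</p>

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def rank_arguments(topic_vec, passages):
    """Rank arguments by the maximum cosine similarity between the
    topic embedding and any of the argument's passage embeddings.

    `passages` maps argument id -> list of passage vectors.
    """
    scored = [(max(cosine(topic_vec, p) for p in vecs), arg_id)
              for arg_id, vecs in passages.items()]
    return [arg_id for _, arg_id in sorted(scored, reverse=True)]
```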
          <p>We also investigate combining the scores returned via semantic search with those returned
using BM25. To calculate this, we use the following interpolation:</p>
          <p>score = BM25 + α × semantic.</p>
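<p>In code, the interpolation of the BM25 and semantic (cosine) scores is a one-liner; note the two scores live on different scales, so α also absorbs the scale gap. The function name is illustrative.</p>

```python
def interpolate(bm25_score, semantic_score, alpha=0.7):
    """Combined score: BM25 plus alpha times the semantic score,
    matching the 'bm25-0.7semantic' run when alpha = 0.7."""
    return bm25_score + alpha * semantic_score
```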
        </sec>
        <sec id="sec-3-1-2">
          <title>3.1.3. Manifold Approximation</title>
          <p>
            Our third argument retrieval approach attempts to leverage the techniques utilized in UMAP
(Uniform Manifold Approximation and Projection) [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ]. UMAP is a dimensionality reduction
technique that first approximates a uniform manifold for each data point and patches together
their local fuzzy simplicial set representations, where a simplicial set is a higher-dimensional
generalization of a directed graph. Then, this topological representation is used to assess and
optimize lower-dimensional representations. A full theoretical description of UMAP is beyond
the scope of this paper, so we focus solely on the computational aspects of UMAP’s manifold
approximation which are relevant to our retrieval approach.
          </p>
          <p>To approximate a uniform manifold for each data point x_i, UMAP first finds the k nearest
neighbors x_i1, …, x_ik of x_i. Then, it defines ρ_i and σ_i, where</p>
          <p>ρ_i = min{ d(x_i, x_ij) | 1 ≤ j ≤ k, d(x_i, x_ij) &gt; 0 }, (1)</p>
          <p>∑_{j=1}^{k} exp( −max(0, d(x_i, x_ij) − ρ_i) / σ_i ) = log2(k), (2)</p>
          <p>and d(x_i, x_ij) is the distance between x_i and x_ij. Intuitively, ρ_i is the distance to x_i’s closest
neighbor (in our case, the most similar passage) and σ_i smooths and normalizes the distances
to the nearest neighbors. Next, UMAP calculates the following weights between data points:</p>
          <p>w((x_i, x_ij)) = exp( −max(0, d(x_i, x_ij) − ρ_i) / σ_i ). (3)</p>
          <p>Calculating this for every data point x_i results in a k-granularity weighted adjacency matrix
between all points in the data. The authors of UMAP note that w((x_i, x_ij)), or entry (i, ij) of the
weighted adjacency matrix, can be interpreted as the probability that a directed edge from x_i to
x_ij exists.</p>
          <p>For the purposes of argument retrieval, our hypothesis is that strong, complete, and relevant
arguments will have many other arguments “pointing" to them. That is, these arguments should
have many high-probability incoming directed edges. Thus, for a given topic title, we first search
using the aforementioned interpolated BM25 and semantic retrieval methods. We encode all of
the passages for the top n arguments. Next, for each encoded passage, we find the k nearest
neighbors and calculate (1), (2), and (3) as described above. Finally, we score each argument by
the sum of all directed edges pointing to the argument.</p>
          <p>Note that the sum of these calculated passage weights possesses different properties than just
the sum of the passage similarities. Most notably, Equation (2) constrains the scaled sum of
distances to log2(k), where k is the number of nearest neighbors. Our understanding is that this
calculation gives importance to points that have fewer highly-similar (closer) neighbors. For
example, if we have two points x and y, and the (point, distance) pairs of their three nearest
neighbors are</p>
          <p>x : [(a, 0.1), (b, 0.2), (c, 0.9)]</p>
          <p>y : [(d, 0.1), (e, 0.2), (f, 0.3)],</p>
          <p>then the weight between x and b will be higher than between y and e, even though they have
the same relative distances. Here are the resulting weights from the manifold calculation
(σ_x = 0.179741, σ_y = 0.113319):</p>
          <p>x : [(a, 1), (b, 0.5733), (c, 0.0117)]</p>
          <p>y : [(d, 1), (e, 0.4138), (f, 0.1712)]</p>
          <p>Intuitively, this may help reduce the importance of passages that are similar to many other
passages, as such a passage will contribute lower weights to other passages.</p>
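<p>The computation of Equations (1)–(3) can be sketched as below, solving Equation (2) for σ by bisection (the monotone search is our simplification; a production pipeline would use umap-learn’s implementation). The sketch reproduces the σ values and weights of the worked example above.</p>

```python
import math

def smooth_knn(dists, n_iter=64):
    """Given distances from a point to its k nearest neighbors,
    compute rho (Eq. 1), solve Eq. 2 for sigma by bisection, and
    return the edge weights of Eq. 3."""
    k = len(dists)
    rho = min(d for d in dists if d > 0)          # Eq. (1)
    target = math.log2(k)                         # right-hand side of Eq. (2)

    def weight_sum(sigma):
        return sum(math.exp(-max(0.0, d - rho) / sigma) for d in dists)

    # weight_sum grows monotonically with sigma, so bisect.
    lo, hi = 1e-12, 1e3
    for _ in range(n_iter):
        mid = (lo + hi) / 2
        if weight_sum(mid) > target:
            hi = mid                              # sum too large -> shrink sigma
        else:
            lo = mid
    sigma = (lo + hi) / 2
    weights = [math.exp(-max(0.0, d - rho) / sigma) for d in dists]  # Eq. (3)
    return rho, sigma, weights
```

<p>Running this on the example distances [0.1, 0.2, 0.9] and [0.1, 0.2, 0.3] recovers σ_x ≈ 0.179741 and σ_y ≈ 0.113319; an argument’s final score is then the sum of its incoming weights.</p>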
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Results</title>
        <p>We submitted five runs to Touché 2021, and the performance measures for these five runs
are listed in Table 1. The 2021 runs are judged in two dimensions: argument relevance and
argument quality, which correspond to the second and third columns of the table, respectively.
We also include the performance of our retrieval models on the topics and relevance scores from
Touché 2020 as a reference (final column). All measures are calculated using normalized
discounted cumulative gain at five (nDCG@5).</p>
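<p>For reference, nDCG@5 can be sketched as follows. This is one common formulation (linear gain, log2 discount); the reported numbers come from the organizers’ official evaluation tooling, not this sketch.</p>

```python
import math

def ndcg_at_5(ranked_rels, ideal_rels):
    """nDCG@5: DCG of the top-5 returned relevance grades,
    normalized by the DCG of the 5 best grades available."""
    def dcg(rels):
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:5]))
    return dcg(ranked_rels) / dcg(sorted(ideal_rels, reverse=True))
```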
        <p>The first run, “bm25", corresponds to the approach outlined in Section 3.1.1. We tuned the
parameters using grid search and arrived at k1 = 3.2 and b = 0.2 using the 2020 topics and
relevance scores. The next row, “semantic", corresponds to Section 3.1.2. We set the number of
nearest neighbors k = 1000 for each topic. Next, “bm25-0.7semantic" denotes the interpolation
of the two aforementioned approaches, with an α value of 0.7. The final two rows correspond
to the approach described in Section 3.1.3. For “manifold", we assume the top 3 arguments from
“bm25-0.7semantic" are relevant and search for k = 50 nearest neighbors for each argument
passage. The retrieved passages are completely reranked by aggregating the weights over each
argument. For “manifold-c10", we perform the exact same search, but only rerank the top 10
arguments of the “bm25-0.7semantic" run.</p>
        <p>For this year’s evaluations, our best-performing run with respect to relevance is
“bm25-0.7semantic". However, all of our other runs which utilize BM25 (i.e., excluding “semantic")
perform similarly. With respect to quality, our best-performing run is “manifold". Here, it
is promising that “manifold" outperformed “manifold-c10", as this implies that the manifold
technique is able to increase argument quality by retrieving arguments outside of the top 10
initially-ranked arguments.</p>
        <p>It is unclear whether or not our initial hypothesis is supported by the scores listed in Table 1.
The evaluation metrics from this year seem to support our hypothesis in the context of our
“manifold" run, but last year’s results show a decrease in performance. This may be because last
year’s relevance scores combine many diferent measures into a single dimension. Furthermore,
it is dificult to separate out the efects of BM25 on our manifold-based approaches, since it
appears that these approaches perform similarly. This, along with the high scores of our “bm25"
run, stresses the importance of well-tuned robust models. Overall, these results are a step in the
right direction for our hypothesis, but more analysis is needed to draw firm conclusions.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Visualization</title>
      <p>While a ranked list of document snippets is often sufficient for ordinary web search, such a
list is not necessarily optimal for showing results of argument retrieval to the users because it
is common to discuss many topics during a debate and the user may want to see the topical
structure. These topics may be discussed at length, briefly mentioned, or revisited as the debate
unfolds. Traditional search engines, which require explicit user querying, often display relevant
documents and arguments in a ranked list, which makes it difficult to effectively capture and
visualize these topic changes. For example, it may be too time consuming for a participant in a
debate to constantly search for and read all of the relevant documents. Or, someone may want
a high-level summary of the debate at various points. Thus, we explore various visualization
techniques to help mitigate these concerns. This is accomplished by minimizing the necessity
of constant user input as well as visualizing these structural topic changes. Visualization of
search results has been studied before [17, 23, 24, 25]; however, existing visualization methods
will not work well for our use case, so we explore new approaches.</p>
      <p>For our visualization exploration, we utilize the args.me corpus to help summarize and
augment debates in real time. We demonstrate our visualization methods on the publicly
available debate between Bill Nye and Ken Ham on Evolution vs. Creationism.2 We chose this
debate primarily because YouTube provides an accurate transcript of the debate with timestamps,
and because of the debate’s diverse topic coverage.</p>
      <p>The YouTube transcript timestamps occur approximately every 3 seconds and contain
approximately 1–8 words per timestamp. We maintain these groupings for our analysis. The
text for the transcript referenced in the analysis is in Table 4. The full text of each referenced
argument ID is available on GitHub.3</p>
      <sec id="sec-4-1">
        <title>4.1. Visualization Approach with BM25</title>
        <p>For any given timestamp t, we define a look-back window of size w and collect all the terms
that occurred between t − w and t. Then, we search the args.me corpus using our BM25 retrieval
approach outlined in Section 3.1.1, with the query being the collected transcript terms. We
record the ranks of the top n arguments returned. We choose BM25 because it is well known to
be robust and efficient. Repeating this over a given interval of timestamps results in a smoothed
argument-level summary for the interval.</p>
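<p>The look-back procedure can be sketched as follows. The transcript format and the search_fn hook are illustrative assumptions; in our setting, search_fn is the BM25 retrieval of Section 3.1.1.</p>

```python
from collections import Counter

def argument_frequencies(transcript, search_fn, w=5, n=20):
    """For each timestamp t, query with all terms spoken between
    t - w and t, record the top-n argument ids, and count how often
    each argument enters the top n over the interval.

    `transcript` is a list of (timestamp, text) pairs; `search_fn`
    returns argument ids, best first.
    """
    counts = Counter()
    for t in range(len(transcript)):
        window = transcript[max(0, t - w): t + 1]   # look-back window
        query = " ".join(text for _, text in window)
        counts.update(search_fn(query)[:n])
    return counts
```

<p>Plotting the per-timestamp ranks of the most frequent arguments in this counter yields figures such as Figure 1.</p>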
        <p>2https://www.youtube.com/watch?v=z6kgvhG3AkI
3https://github.com/kevinros/toucheRetrievalVisualization/tree/main/arguments</p>
        <p>[Table 2 argument descriptions: description of a bicycle incident; understanding scriptures, the gospel, God; creator of universe, infinite power, God; having unlimited power, omnipotence of God; the justness of God]</p>
        <p>As an example, consider the debate time interval 110:53-114:04. Each timestamp and
corresponding text is listed in Table 4. We define a look-back window of size w = 5 and retrieve the
top n = 20 arguments for each timestamp. Then, we collect the number of times an argument
is ranked in the top 20 arguments across all timestamps, and consider only the five most
frequent arguments. Figure 1 displays the ranks of these five arguments at each timestamp, and
Table 2 lists a high-level description of each argument. The parameters are manually tuned to
demonstrate the benefits and drawbacks of this visualization approach.</p>
        <p>Of the five arguments returned, S4fde9bb-Aef3913d8 seems to be topically irrelevant to the
transcript text. Interestingly, this argument appears to also be a transcript, and thus it contains
many filler words (such as “uh") also present in the debate transcript. It appears to be playing
the role of a background language model. The other four arguments seem to be relevant as
they discuss topics and themes present in the transcript at different timestamps. From 111:29 to
111:50, argument S9a8b0a09-A22358c86 is one of the highest-ranked, and it discusses “God",
“His kingdom", “scripture", and “His actions". From 112:22 to 112:47, we find that arguments
S690aacea-A986a10d7 and S23dda237-A69f9884f are ranked the highest. Both arguments discuss
the powers of the creator of the universe. From 112:53 to 113:10, we observe that argument
S5059e885-Abe1aa26d is the highest-ranked, which argues in favor of the justness of God.</p>
        <p>[Table 3 argument descriptions: showing the validity of theistic evolution; biblical creationism, unfalsifiable; heaven, hell, stars, God; physics, star formation, modern science; astronomy in the context of the Quran]</p>
        <p>One use case for this visualization technique is to help participants of the debate better
analyze and justify their stance. For example, the participants can draw on the additional
knowledge provided by the retrieved arguments to strengthen their own arguments in real-time.
On the other hand, it is also possible that rebuttals to participants’ arguments will be retrieved,
which could help increase the overall robustness of the debate by exposing counterpoints.</p>
        <p>In order to reduce noise and irrelevant arguments, we also explore the possibility of allowing
users to specify the search terms or arguments. More specifically, using pre-defined sets of
terms, we search the args.me corpus with BM25 to find the most relevant arguments to the
provided terms. Then, we display the frequencies of the returned arguments using the methods
outlined above, except we consider ranks through 100 rather than 20.</p>
        <p>Consider the same debate time interval and the keyword groups “bible god creationism"
and “heavens astronomy stars". Figure 2 displays the frequencies of the five most relevant
arguments to each keyword group. The first five argument IDs in the legend correspond to
the first keyword group, and the second five argument IDs in the legend correspond to the
second keyword group. Additionally, high-level descriptions of the arguments that appear in
Figure 2 are listed in Table 3. The first two arguments are from “bible god creationism" and
the last three arguments correspond to “heavens astronomy stars". From Figure 2, we see that
arguments relevant to both keyword groups are highly ranked between 112:10 and 112:36,
indicating that the keywords in the retrieved arguments strongly match the keywords from the
debate transcript in the time interval.</p>
        <p>An important benefit of this visualization technique is that it allows the user to specify
specific topics before, during, or after a debate in order to easily track various topic occurrences
for further analysis. For example, a user looking to get a high-level summary of a debate can
examine the ranking frequencies of known arguments in order to pinpoint the most relevant
points in the debate.</p>
        <p>As this visualization approach provides a high-level overview of a debate by referencing
relevant arguments using keywords, it abstracts away from the actual content of the debate and
relevant sentences within arguments. To help address this issue, we explore a more fine-grained
visualization approach in the following subsection.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Visualization Approach with UMAP</title>
        <p>
          The advent of new Transformer-based language models such as BERT [20] has led to
impressive improvements on a variety of NLP tasks. We seek to use BERT’s semantic representation
space to better visualize the dynamics of arguments. To do so, we take advantage of UMAP [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ].
The goal of UMAP is to visualize high-dimensional embeddings in a low-dimensional space
while preserving topological and structural properties. Using the same BERT-based encoder
discussed in Section 3.1.2, we combine the encodings of the sentences of relevant arguments
and the “caterpillar embeddings" of our debate transcript to visualize how the debate evolves
over time. This approach allows us to analyze fine-grained topic changes as they unfold in the
debate, as well as their relevance to a reference corpus.
        </p>
        <sec id="sec-4-2-1">
          <title>4.2.1. Caterpillar Embeddings</title>
          <p>Caterpillar embeddings are used to track the course of the debate over time. They consist of
a sequence of encoder representations taken from across the debate. A naïve approach is to
slide a window of size w over the sequence of words in the transcript with stride s. However,
this has the downside of both adding and removing information (words) at each step. Instead,
we split each step into two: a growth step and a contraction step. Given a window from word
i to i + w of the transcript for some i, the next window will grow to be from i to i + w + s.
The subsequent window will be a contraction: it will range from i + s to i + w + s. Hence, this
“caterpillar embedding" technique moves along the transcript of the debate like a caterpillar
inching along. At step n, the start and end of the window are ⌊n/2⌋ · s and w + ⌈n/2⌉ · s,
respectively.</p>
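<p>The caterpillar stepping can be sketched as a closed-form window mapping; the formula below is our reading of the growth/contraction description (w is the window size, s the stride), not code from our pipeline.</p>

```python
def caterpillar_window(n, w, s):
    """Window of caterpillar step n: odd steps grow (extend the end
    by s), even steps contract (advance the start by s), so no step
    both adds and removes words."""
    start = (n // 2) * s
    end = w + ((n + 1) // 2) * s
    return start, end
```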
        </sec>
        <sec id="sec-4-2-2">
          <title>4.2.2. Argument Retrieval-Based Semantic Visualizations</title>
          <p>In order to better define the topology of the semantic space, we extract the top m most frequent
arguments over the transcript interval 110:53 to 127:01 as described in Section 4.1 from the
args.me corpus, split them into sentences, and encode the sentences using the previously
mentioned BERT-based sentence encoder. We combine these argument embeddings with the
caterpillar embeddings of the debate transcript and project them into two dimensions using
UMAP. This creates a path of the debate as it visits different arguments in the semantic space.
We can then use the nearby neighbors of the caterpillar embeddings as relevant arguments to
show the user at a given timestamp. The full animation can be found on GitHub.4</p>
          <p>Regardless of which m value we use, we find that this UMAP projection does not preserve
the original space well regarding nearest neighbors. We believe this is because of the large
differences between the semantic structures of the conversational YouTube debate and the
written structures of the corpus debates. To mitigate this, we use a nearest neighbor search in
the original space, and we plot the debate embedding using its k nearest neighbors. Through
empirical exploration, we find that m = 100 and k = 100 yield the clearest results. Additionally,
we consider the same window as explored in Section 4.1, namely 110:53 to 114:04. Note that the
transcript of the debate in this window is available in Table 4. The resulting path at various
timestamps is shown in Figure 3.</p>
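<p>The original-space neighbor selection can be sketched as below; the toy vectors stand in for the BERT sentence embeddings, and the exhaustive cosine scan replaces an approximate index for clarity.</p>

```python
import math

def nearest_arguments(debate_vecs, arg_vecs, k=100):
    """Pick the k argument sentences closest (by cosine) to any
    caterpillar embedding, measured in the original embedding
    space; only these are then projected with UMAP for plotting."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv)
    # best similarity of each argument sentence to any debate window
    best = {i: max(cos(dv, av) for dv in debate_vecs)
            for i, av in enumerate(arg_vecs)}
    return sorted(best, key=best.get, reverse=True)[:k]
```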
          <p>The debate path quickly moves to the lower left quadrant, which we find to signify the creation
of the universe and heavens, particularly in relation to God. The path briefly moves to the right,
when the debate focuses more on the omnipotence and omniscience of God. Finally, the debate
moves upward, when the discussion changes to physics, life science, and astronomy. The full
video can also be found on GitHub.5</p>
          <p>In Figure 3, we clearly see groupings of arguments’ topics and how they change over time.
Interestingly, we can also examine the topic path through the corpus that the YouTube debate
took. This could be used to track debate topic progression in a visual manner, and augment live
debates with both relevant information at the current point as well as relevant information for
future, forecasted points. More work is needed, however, to investigate the effects of parameter
selection and the effectiveness in various domains.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion and Future Work</title>
      <p>In this work, we apply several techniques to the Touché Argument Retrieval task, such as BM25,
semantic search, and manifold-based reranking. Among them, we find that the manifold-based
reranking was sometimes more effective in returning high-quality arguments when compared
to BM25. In the future, we hope to compute the manifold weights for every argument in the
data set as a preprocessing step, and investigate efficient ways to combine these weights with
retrieval methods that perform well along the relevance dimension, in order to return the
strongest and the most relevant arguments.</p>
      <p>4https://github.com/kevinros/toucheRetrievalVisualization/blob/main/animations/full_anim.mp4
5https://github.com/kevinros/toucheRetrievalVisualization/blob/main/animations/100top_mean_anim.mp4</p>
      <p>[Figure 3 frame captions: (a) Initial frame. Time: 111:10; (c) Time: 113:27; (d) Final frame. Time: 114:04]</p>
      <p>To better display search results to users in argument retrieval, we also introduce several
visualization techniques based on BM25 keyword matching and UMAP dimensionality reduction,
which show promise for debate augmentation. Although the benefits of this augmentation are
difficult to quantify, we believe it can improve debate understanding and retention, as well as
open avenues for future work. We also hope to improve the visualization by further testing
different parameters, retrieval techniques, and background corpora.</p>
    </sec>
  </body>
  <back>
  </back>
</article>