<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>University of Amsterdam at CLEF 2020</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mahsa S. Shahshahani</string-name>
          <email>m.shahshahani@uva.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jaap Kamps</string-name>
          <email>kamps@uva.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Amsterdam</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper documents the University of Amsterdam's participation in CLEF 2020 Touché Track. This is the first year this track has been introduced at CLEF, and we were attracted to participate in it due to its potentialities for Parliamentary debates we are currently working on. This track consists of two tasks: Conversational Argument Retrieval and Comparative Argument Retrieval. We submitted a run to both tasks. For the first task, we used a combination of the traditional BM25 model and learning to rank models. BM25 model helps to retrieve relevant arguments, and learning to rank model helps to re-rank the list and put stronger arguments on top of the list. For the second task, Comparative Argument Retrieval, we proposed a pipeline to re-rank documents retrieved from Clueweb using three features: PageRank scores, web domains, and argumentativeness. Preliminary results on 5 queries have shown that this heuristic pipeline may help to achieve a balance among three important dimensions: relevance, trustworthiness, and argumentativeness.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
      <p>
We believe the era in which search engines were expected only to return a ranked list of documents or answers has passed: they have the potential to also support us in the decision-making process. The argument retrieval task has been defined to formalize this problem. The Touché track at CLEF [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] offers an opportunity to work on this interesting problem with access to a debate corpus, from two different points of view: looking for different views in debates between opponents and supporters of a controversial issue, and looking for comparative opinions about different alternatives. The track consists of one task for each point of view (two tasks in total), and we submitted a run to both tasks.
      </p>
      <p>In this paper, we cover both tasks: first, we give a high-level summary of the first task, followed by our detailed approach; then we do the same for the second task. Finally, we conclude the paper with our main contributions and findings.</p>
    </sec>
    <sec id="sec-2">
      <title>Conversational Argument Retrieval</title>
      <p>
        For detailed information about CLEF track’s experimental setup, we refer to the overview
paper [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and to the track homepage.1 However, we provide a high-level summary to
make this paper self-contained.
2.1
The goal of this task is to retrieve relevant arguments from online debate portals, given
a query on a controversial topic.
      </p>
      <p>
        Corpus Args.me [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] was created by crawling arguments from 7 debate websites. It includes 387,606 arguments taken from 59,637 debates. A search engine based on Elasticsearch has been set up to make it easier to work with this corpus [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. This search engine ranks arguments using the BM25 ranking algorithm; different approaches can then be used to re-rank the retrieved arguments.
      </p>
      <p>Queries 50 controversial topics have been picked for this task. Each topic has both pro
and con arguments in the corpus.</p>
      <p>
        Quality Assessment Proposed approaches are supposed to retrieve “strong” arguments.
An argument is consiagstuhl ered strong if it is topically relevant, logically cogent,
rhetorically well-written, and useful to help in stance-building process. Here, we define
these assessment dimensions taken from [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and the annotation guidelines for
Dagstuhl15512 ArgQuality Corpus.2
Topical relevance: As every other ranking task, retrieved arguments should provide the
user with relevant information about the query.
      </p>
      <p>Besides relevance, in general, there are three main dimensions for assessing the
quality of arguments: logic, rhetoric, and dialectic.</p>
      <p>
        Logical cogency: An argument with acceptable premises that are relevant and sufficient
to the argument’s conclusion is considered “cogent” [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>
        Rhetorical well-writtenness: An argument is called “rhetorically well-written” if it is effective and successful in persuading a target audience of a conclusion [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
        Dialectic: An argument is considered reasonable if it contributes to the user’s stance-building process regarding a given issue in a way that is acceptable to everyone.
      </p>
      <sec id="sec-2-1">
        <title>Our Approach</title>
        <p>We treated this task as a re-ranking problem. In the absence of training data, we have to
use unsupervised approaches. However, we used an existing debate dataset created for
studying argument quality assessment to train a classifier. First, we describe this corpus.
Later, we explain our approach in three consecutive steps.</p>
        <sec id="sec-2-1-1">
          <p>
            Corpus Dagstuhl-15512 ArgQuality [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ] includes 20 arguments for each of 16 queries. Three annotators have annotated these 320 arguments on 15 dimensions with three labels. However, we only use four dimensions: cogency, effectiveness, reasonableness, and overall quality. The set of labels consists of ordinal scores from 1 (low) to 3 (high) for all dimensions. We used majority voting to obtain the label for each dimension, and substituted labels of 3 with 2, as there are very few samples with label “3” in the corpus.
          </p>
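          <p>The label aggregation described above can be sketched as follows; this is a minimal illustration of the majority-vote-then-collapse scheme, not the authors' code:</p>

```python
from collections import Counter

def aggregate_label(annotator_scores):
    """Majority vote over the three annotators' ordinal scores (1-3);
    a majority of 3 is collapsed to 2, since label "3" is rare in the
    corpus. Ties are broken arbitrarily in this sketch."""
    majority, _ = Counter(annotator_scores).most_common(1)[0]
    return min(majority, 2)
```

          <p>For instance, annotator scores of (3, 3, 1) yield the aggregated label 2.</p>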
          <p>Approach We created the final ranked list in three steps. Before explaining each step,
we explain the way we represented arguments.</p>
          <p>
            Argument Representation: We used the pre-trained BERT-base [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ] model from the HuggingFace Transformers framework in Python to represent arguments. As BERT imposes a limit on input length after tokenization, we used only the first 512 tokens of an argument whenever its length exceeded this limit.
          </p>
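          <p>A minimal sketch of this representation step, assuming the HuggingFace AutoTokenizer/AutoModel interface; pooling via the [CLS] vector is our assumption, as the paper only specifies the 512-token truncation:</p>

```python
def represent(argument_text, model_name="bert-base-uncased"):
    """Embed one argument with BERT, truncating to the 512-token limit.
    Pooling with the [CLS] vector is an assumption; the paper does not
    specify how token vectors are pooled into one representation."""
    import torch
    from transformers import AutoModel, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name).eval()
    # truncation=True enforces BERT's 512-token input limit.
    inputs = tokenizer(argument_text, truncation=True, max_length=512,
                       return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    return out.last_hidden_state[:, 0, :].squeeze(0)  # a 768-d vector
```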
          <p>First step - Ranking: BM25 is a traditional unsupervised ranking model that scores the relevance of documents (here, arguments) with respect to a query based on the frequency of terms shared between the query and the document. We ranked the arguments for each topic by BM25 score using the args.me search engine.</p>
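          <p>For reference, the BM25 scoring scheme can be sketched as follows; this is a generic textbook formulation with common defaults (k1 = 1.2, b = 0.75), not the exact parameterization of the args.me engine:</p>

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, doc_freqs, n_docs, avg_len,
               k1=1.2, b=0.75):
    """Classic BM25: rewards query terms that are frequent in the
    document but rare in the collection, with length normalization."""
    tf = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        df = doc_freqs.get(term, 0)   # number of documents containing term
        if df == 0:
            continue
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
        norm = tf[term] * (k1 + 1) / (
            tf[term] + k1 * (1 - b + b * len(doc_terms) / avg_len))
        score += idf * norm
    return score
```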
          <p>Second step - Classification: We trained a classifier on the Dagstuhl-15512 ArgQuality corpus to recognize and label the cogency, well-writtenness, reasonableness, and overall quality of each argument. We then applied this classifier to all retrieved arguments in the ranked list from the first step.</p>
          <p>
            We took the majority vote for each dimension; for all of the arguments, the majority label is the same across all four dimensions. This is in line with the conclusion of the original paper [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ], which indicates that cogency, effectiveness, and reasonableness correlate strongly with overall quality, and also with each other.
          </p>
          <p>We trained two classifiers on 90% of the data: a decision tree classifier and an SVM classifier. The accuracy on the remaining 10% of the data is shown in Table 1. We used the scikit-learn framework for Python to train both classifiers. For the decision tree, we used the “gini” criterion and set the minimum number of samples required to split to 2. For the SVM, we used the “rbf” kernel and set the regularization parameter to 1. We selected the SVM classifier for the next step.</p>
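          <p>A sketch of this training setup with the stated hyperparameters; the random features below stand in for the 768-dimensional BERT argument vectors, which we do not reproduce here:</p>

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Stand-in data: in the paper, X holds 768-d BERT argument vectors and
# y holds the aggregated quality labels (1 = weak, 2 = strong).
X, y = make_classification(n_samples=320, n_features=768, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.1, random_state=0)

# Decision tree: "gini" criterion, min_samples_split=2 (sklearn defaults).
tree = DecisionTreeClassifier(criterion="gini", min_samples_split=2).fit(X_tr, y_tr)
# SVM: "rbf" kernel, regularization parameter C=1 (also sklearn defaults).
svm = SVC(kernel="rbf", C=1.0).fit(X_tr, y_tr)

tree_acc = tree.score(X_te, y_te)
svm_acc = svm.score(X_te, y_te)
```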
          <p>Third step - Re-ranking: In the third step, we re-ranked the arguments retrieved in the first step using learning-to-rank models, with the output of step 2 as one of the features. We trained three different learning-to-rank models: RankNet, Random Forests, and LambdaRank, using the RankLib library (https://github.com/codelibs/ranklib) with its default parameter settings. As input features, we used the BERT-based argument representations, the output of the second step, and two additional features based on named entities: binary indicators for the presence of numerical entities (percent, quantity, money) and of other entities (person, location, organization) in the argument. Figures 1 and 2 show the number of arguments with and without these entities, illustrating the difference in the distribution of each feature between strong (label = 2) and weak (label = 1) arguments. Arguments using these kinds of entities are more likely to provide users with persuasive and effective information for forming their stance, and are therefore more likely to be labeled as “strong”.</p>
          <p>We trained the learning-to-rank models on the Dagstuhl dataset and applied the best model (RankNet) to re-rank the retrieved arguments from the args.me dataset for all 50 topics in the shared task. We trained the models on 90% of the data and report accuracy on the remaining 10% in Table 2. In the final relevance judgments, 30 documents per topic have been annotated with 6 labels: -2, 1, 2, 3, 4, and 5.</p>
          <p>Classifier We ran our classifier, trained on the Dagstuhl-15512 ArgQuality corpus, on the judged arguments to see whether its output correlates with the final judgments. Unfortunately, we observed that the classifier does not work well and returns label ’1’ for more than 90% of the judged arguments. This suggests that the classifier plays no role in the final results of our learning-to-rank model. This is not surprising, as the dataset we trained our SVM classifier on is very different from the test set: the arguments in the training set are very short, consisting of only one to three sentences, while in the test set we label complete documents. Moreover, as the BERT representation limits the size of the input text, for longer texts we only used the first 512 tokens of each argument. More advanced approaches, such as averaging over sliding windows of 512 tokens, might make the classifier useful.</p>
          <p>Entities Similar to Figures 1 and 2, we looked into the distribution of numerical and other types of entities in relevant and non-relevant documents. We considered documents with label ’-2’ as non-relevant and all other documents in the judgment file as relevant. The results are shown in Figures 3 and 4. As is obvious from the figures, the distribution of entities is the opposite of what we observed in the dataset we used when developing our model. However, it can still be used as a feature, as it shows a small difference between relevant and non-relevant documents.</p>
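          <p>The sliding-window idea mentioned above could look like this: embed overlapping 512-token windows and average the resulting vectors. The window size, stride, and encoder interface are all our assumptions:</p>

```python
def window_embedding(token_ids, encode, size=512, stride=256):
    """Average embeddings of overlapping windows instead of truncating.
    `encode` maps one window of token ids to a fixed-size vector
    (e.g. a BERT [CLS] vector); any such encoder works here."""
    last_start = max(len(token_ids) - size, 0)
    windows = [token_ids[i:i + size] for i in range(0, last_start + 1, stride)]
    vectors = [encode(w) for w in windows]
    dim = len(vectors[0])
    # Element-wise mean over all window vectors.
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
```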
          <p>Query Length We looked into per-query results to get an insight into the way our model works. We used the NDCG@5 metric, as it is the main metric for the shared task.</p>
          <p>Results of our run: UvATask1LTR — NDCG@1: 0.5214, NDCG@5: 0.5548, NDCG@10: 0.3709, MAP: 0.1129.</p>
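          <p>For completeness, NDCG@k over graded labels can be computed as below; the exponential gain is one common formulation and may differ from the track's official evaluation script:</p>

```python
import math

def ndcg_at_k(graded_rels, k=5):
    """NDCG@k for a ranked list of graded relevance labels."""
    def dcg(scores):
        # Exponential gain, logarithmic position discount.
        return sum((2 ** r - 1) / math.log2(i + 2)
                   for i, r in enumerate(scores[:k]))
    ideal = dcg(sorted(graded_rels, reverse=True))
    return dcg(graded_rels) / ideal if ideal > 0 else 0.0
```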
          <p>
            As we used the whole topic title (without removing stop words), the model works better for shorter queries (Figure 5).
          </p>
          <p>
            Similar to the previous section, for detailed information about this task’s experimental setup we refer to the overview paper [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ] and to the track homepage (https://events.webis.de/touche-20/shared-task-2.html). However, we provide a high-level summary. The goal of this task is to retrieve and rank documents from the web that help to answer a comparative question from “everyday life”.
          </p>
          <p>
            Corpus Clueweb12 is a dataset created by crawling 733,019,372 web documents, seeded with 2,820,500 URLs from Clueweb09 [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ]. We used a publicly available search engine [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ] based on Elasticsearch to retrieve documents from Clueweb12 using the BM25 ranking model.
          </p>
          <p>Queries 50 comparative topics from everyday life have been picked for this task.</p>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>Our Approach</title>
        <p>We treated this task as a re-ranking problem. In the absence of training data, we have to use unsupervised approaches. We labeled the retrieved documents for 5 topics to gain insight into how our heuristic approach works.</p>
        <p>In the first step, we used the ChatNoir search engine to rank the documents retrieved for each topic. In the second step, we used three different features to re-rank them. Here, we introduce these features, followed by our heuristic approach to combining them into the final ranked list.</p>
        <sec id="sec-2-2-1">
          <p>Argumentativeness We trained a simple SVM classifier on data from the args.me corpus and Clueweb to distinguish between argumentative and non-argumentative documents. As in the first task, we used the pre-trained BERT-base model from the HuggingFace Transformers library for Python to represent documents, using only the first 512 tokens of a document if its length exceeds BERT’s input limit.</p>
          <p>To train the classifier, we used a small sample from each corpus. These samples were created by submitting all 50 controversial queries from the first task to both corpora and retrieving up to 100 documents per query. We then manually removed argumentative documents from the Clueweb sample and considered the remaining documents as negative examples; all documents retrieved from the args.me corpus were considered positive examples. The final training set consists of 3000 positive and 3000 negative examples. We then trained a simple SVM classifier on 80% of the data and evaluated it on the remaining 20%, achieving 87% accuracy. All parameters of the argumentativeness classifier were left at their default values in the scikit-learn library for Python (https://scikit-learn.org/stable/supervised_learning.html#supervised-learning).</p>
          <p>Web domains Clueweb was formed by crawling web documents with some post-filtering. However, the goal of this task is to retrieve documents containing personal opinions or suggestions; thus, documents from particular domains such as Wikipedia are not desirable. In contrast, documents from discussion forums, debate websites, and blogs can be very helpful. With this intuition in mind, we defined a binary feature indicating whether the source URL of a document contains the terms ’forum’ or ’blog’, to give a bonus to web pages from discussion forums or blogs.</p>
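          <p>This feature amounts to a simple substring check on the document's source URL; the exact matching rule below is our sketch:</p>

```python
def domain_bonus(url):
    """1 if the source URL suggests a discussion forum or blog, else 0."""
    u = url.lower()
    return int("forum" in u or "blog" in u)
```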
          <p>PageRank Although the desired documents are those from discussion forums and personal blogs, they should also be trustworthy. To take trustworthiness into account, we used PageRank scores to prioritize documents from more reliable sources.</p>
          <p>In the ChatNoir search engine, every returned document is associated with a PageRank score; we used these scores directly.</p>
          <p>Re-ranking We introduced three features, and our goal is to re-rank documents based on a combination of them (argumentativeness, domain addresses, and PageRank scores).</p>
          <p>To generate the final ranked list, we built a heuristic ranking pipeline in four steps:
            – The 1st step: The initial ranked list is taken from the ChatNoir search engine, which retrieves documents from Clueweb and ranks them using the traditional BM25 ranking model. We use the whole topic title as the query. We also experimented with submitting the query after removing stop words, or with using only the entities in the topic title, but using the whole topic title worked better.
            – The 2nd step: PageRank scores are used to re-rank the list from the first step in descending order. This could put a document that was initially ranked very low on top of the list; to avoid this, moving a document in the ranked list is limited to a maximum of 10 positions.</p>
          <p>[Table 3: scores of the initial, pagerank, domain, argumentative classifier, and mixed models on the three dimensions (relevance, argumentativeness, trustworthiness); only partial values are recoverable: initial 0.87, pagerank 0.89, domain 0.84, argumentative classifier 0.80.]</p>
          <p>– The 3rd step: Web domains are used to re-rank the list from the second step. Documents with a positive domain feature (meaning the document comes from a blog or discussion forum) are put on top of the list. This is performed within every 10 documents: we split the list into chunks of 10 documents and keep the relative positions within each chunk.
            – The 4th step: All documents in the ranked list from the third step are classified using the argumentativeness classifier we trained. Documents classified as positive are put on top of the list, again keeping their relative positions, and again performed within every chunk of 10 documents.</p>
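          <p>Steps 2 to 4 above can be sketched as follows. The document fields (pagerank, from_forum_or_blog, argumentative) are a hypothetical schema, and the greedy window in the PageRank step is one possible reading of the 10-position cap, not necessarily our exact implementation:</p>

```python
def rerank_pagerank(docs, max_up=10):
    """Greedy bounded re-rank: at each output position, promote the
    highest-PageRank document among the next max_up+1 candidates, so no
    document rises more than max_up positions (one reading of the cap)."""
    remaining, out = list(docs), []
    while remaining:
        window = remaining[:max_up + 1]
        best_i = max(range(len(window)), key=lambda i: window[i]["pagerank"])
        out.append(remaining.pop(best_i))
    return out

def promote_in_chunks(docs, flag, chunk=10):
    """Within each chunk of `chunk` documents, stably move flagged
    documents to the top (used for the domain and argumentativeness steps)."""
    out = []
    for i in range(0, len(docs), chunk):
        c = docs[i:i + chunk]
        out += [d for d in c if d[flag]] + [d for d in c if not d[flag]]
    return out

def pipeline(docs):
    docs = rerank_pagerank(docs)                          # step 2
    docs = promote_in_chunks(docs, "from_forum_or_blog")  # step 3
    return promote_in_chunks(docs, "argumentative")       # step 4
```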
          <p>We placed these limits on moving documents in the list with the intuition that relevance should be prioritized over trustworthiness and argumentativeness.</p>
        </sec>
      </sec>
      <sec id="sec-2-3">
        <title>Results</title>
        <p>Since the final judgment file does not contain separate judgment lists for the three different dimensions (relevance, trustworthiness, and argumentativeness), we also include our preliminary results.</p>
        <p>Preliminary Results: To gain insight into the effectiveness of our heuristic model, we manually labeled the top 10 retrieved documents for 5 queries, using three labels: 0 for non-relevant, non-argumentative, or untrustworthy; 1 for relevant, argumentative, or trustworthy; and 2 for highly relevant, highly argumentative, or highly trustworthy.</p>
        <p>We evaluated the top 10 documents of each ranked list: BM25, re-ranked by PageRank scores, re-ranked by web domains, re-ranked by argumentativeness, and the mixed model.</p>
        <p>Evaluation results are reported in Table 3. The heuristic mixed model does not match BM25 in terms of relevance, the argumentative-classifier model in terms of argumentativeness, or the PageRank model in terms of trustworthiness. However, it seems to strike a balance among all three dimensions.</p>
        <p>[Per-model scores, partially recoverable: initial 0.5068, pagerank 0.4723, domain 0.4100, argumentative classifier 0.4281, UvATask2SVM.]</p>
        <p>Final Results: The final results for every step of our model are reported in Figure 5. Since three different aspects had been defined for the evaluation of this task, we expected to receive three judgment sets. However, the judgment file covered only overall relevance, which our pipeline had deliberately traded off against the other two dimensions.</p>
        <p>Initially, we processed the data and implemented our models based on the three aspects that had been defined: relevance, trustworthiness, and argumentativeness. Given that we received only the overall-relevance judgments, our results for this task are evaluated purely on relevance. This implies that the obtained results could have been better if we had tuned our preliminary assessment to that single aspect.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Conclusion</title>
      <p>This paper documents our first participation in the Touché 2020 Track. We explained
our approaches for both tasks in the track: Conversational Argument Retrieval, and
Comparative Argument Retrieval.</p>
      <p>For the Conversational Argument Retrieval task, we used an existing argument quality assessment dataset to train a classifier and re-rank arguments based on the output of this classifier. On the preliminary data, we showed that named entities are useful features for distinguishing between strong and weak arguments. However, the final results show that neither the classifier nor the entity features help. It is worth mentioning that the classifier was trained on a completely different set of arguments; if we trained it on data from args.me itself, it might be useful for the final results. This requires further investigation.</p>
      <p>For the Comparative Argument Retrieval task, we introduced three features to re-rank arguments taken from Clueweb. We proposed a pipeline that combines different aspects (relevance, trustworthiness, and argumentativeness) to create the final ranked list. Preliminary results have shown that this heuristic pipeline may successfully strike a balance among all three dimensions. However, the final evaluation was done only on relevance, which made our method sub-optimal.</p>
      <p>We hope and expect that the valuable benchmarking data created at the Touché track will be of great value to motivate, and greatly facilitate, further research into argument retrieval.</p>
      <p>This research was supported in part by the Netherlands Organization for Scientific Research (NWO, grant # CISC.CC.016, ACCESS project). Views expressed in this paper are not necessarily shared or endorsed by those funding the research.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Ajjour</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wachsmuth</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kiesel</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hagen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Data acquisition for argument search: The args.me corpus</article-title>
          . In: Benzmüller,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Stuckenschmidt</surname>
          </string-name>
          , H. (eds.)
          <source>KI 2019: Advances in Artificial Intelligence - 42nd German Conference on AI</source>
          , Kassel, Germany,
          <source>September 23-26</source>
          ,
          <year>2019</year>
          ,
          <source>Proceedings. Lecture Notes in Computer Science</source>
          , vol.
          <volume>11793</volume>
          , pp.
          <fpage>48</fpage>
          -
          <lpage>59</lpage>
          . Springer (
          <year>2019</year>
          ). https://doi.org/10.1007/978-3-
          <fpage>030</fpage>
          -30179-
          <issue>8</issue>
          _4, https://doi.org/10.1007/978-3-
          <fpage>030</fpage>
          -30179-
          <issue>8</issue>
          _
          <fpage>4</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Blair</surname>
            ,
            <given-names>J.A.</given-names>
          </string-name>
          :
          <article-title>Groundwork in the theory of argumentation: Selected papers of J</article-title>
          .
          <source>Anthony Blair</source>
          , vol.
          <volume>21</volume>
          . Springer Science &amp; Business
          <string-name>
            <surname>Media</surname>
          </string-name>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bondarenko</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fröbe</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Beloucif</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gienapp</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ajjour</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Panchenko</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Biemann</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wachsmuth</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hagen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          : Overview of Touché 2020:
          <article-title>Argument Retrieval</article-title>
          .
          <source>In: Working Notes Papers of the CLEF 2020 Evaluation Labs (Sep</source>
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bondarenko</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hagen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wachsmuth</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Beloucif</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Biemann</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Panchenko</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Touché: First shared task on argument retrieval</article-title>
          . In: Jose,
          <string-name>
            <given-names>J.M.</given-names>
            ,
            <surname>Yilmaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            ,
            <surname>Magalhães</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Castells</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            ,
            <surname>Silva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.J.</given-names>
            ,
            <surname>Martins</surname>
          </string-name>
          ,
          <string-name>
            <surname>F</surname>
          </string-name>
          . (eds.)
          <source>Advances in Information Retrieval - 42nd European Conference on IR Research</source>
          , ECIR
          <year>2020</year>
          , Lisbon, Portugal,
          <source>April 14-17</source>
          ,
          <year>2020</year>
          , Proceedings,
          <source>Part II. Lecture Notes in Computer Science</source>
          , vol.
          <volume>12036</volume>
          , pp.
          <fpage>517</fpage>
          -
          <lpage>523</lpage>
          . Springer (
          <year>2020</year>
          ). https://doi.org/10.1007/978-3-
          <fpage>030</fpage>
          -45442-5_
          <fpage>67</fpage>
          , https://doi.org/10.1007/978-3-
          <fpage>030</fpage>
          -45442-5_
          <fpage>67</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Clarke</surname>
            ,
            <given-names>C.L.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Craswell</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soboroff</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Overview of the TREC 2009 web track</article-title>
          . In:
          <string-name><surname>Voorhees</surname>, <given-names>E.M.</given-names></string-name>
          ,
          <string-name><surname>Buckland</surname>, <given-names>L.P.</given-names></string-name>
          (eds.)
          <source>Proceedings of The Eighteenth Text REtrieval Conference, TREC 2009</source>
          , Gaithersburg, Maryland, USA, November 17-20,
          <year>2009</year>
          . NIST Special Publication, vol. 500-278. National Institute of Standards and Technology (NIST) (2009), http://trec.nist.gov/pubs/trec18/papers/WEB09.OVERVIEW.pdf
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Devlin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toutanova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>BERT: pre-training of deep bidirectional transformers for language understanding</article-title>
          . In:
          <string-name><surname>Burstein</surname>, <given-names>J.</given-names></string-name>
          ,
          <string-name><surname>Doran</surname>, <given-names>C.</given-names></string-name>
          ,
          <string-name><surname>Solorio</surname>, <given-names>T.</given-names></string-name>
          (eds.)
          <source>Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019</source>
          , Minneapolis, MN, USA, June 2-7,
          <year>2019</year>
          , Volume
          <volume>1</volume>
          (Long and Short Papers), pp.
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          . Association for Computational Linguistics (2019). https://doi.org/10.18653/v1/n19-1423
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name><surname>Johnson</surname>, <given-names>R.H.</given-names></string-name>
          ,
          <string-name><surname>Blair</surname>, <given-names>J.A.</given-names></string-name>
          :
          <source>Logical self-defense</source>
          . International Debate Education Association (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hagen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Graßegger</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Michel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tippmann</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Welsch</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>ChatNoir: a search engine for the ClueWeb09 corpus</article-title>
          . In:
          <string-name><surname>Hersh</surname>, <given-names>W.R.</given-names></string-name>
          ,
          <string-name><surname>Callan</surname>, <given-names>J.</given-names></string-name>
          ,
          <string-name><surname>Maarek</surname>, <given-names>Y.</given-names></string-name>
          ,
          <string-name><surname>Sanderson</surname>, <given-names>M.</given-names></string-name>
          (eds.)
          <source>The 35th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '12</source>
          , Portland, OR, USA, August 12-16,
          <year>2012</year>
          , p.
          <fpage>1004</fpage>
          . ACM (2012). https://doi.org/10.1145/2348283.2348429
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Wachsmuth</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Naderi</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hou</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bilu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prabhakaran</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thijm</surname>
            ,
            <given-names>T.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hirst</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Computational argumentation quality assessment in natural language</article-title>
          . In:
          <source>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2017</source>
          , Valencia, Spain, April 3-7,
          <year>2017</year>
          , Volume
          <volume>1</volume>
          : Long Papers, pp.
          <fpage>176</fpage>
          -
          <lpage>187</lpage>
          (2017). https://doi.org/10.18653/v1/e17-1017
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Wachsmuth</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khatib</surname>
            ,
            <given-names>K.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ajjour</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Puschmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dorsch</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Morari</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bevendorff</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Building an argument search engine for the web</article-title>
          . In:
          <string-name><surname>Habernal</surname>, <given-names>I.</given-names></string-name>
          ,
          <string-name><surname>Gurevych</surname>, <given-names>I.</given-names></string-name>
          ,
          <string-name><surname>Ashley</surname>, <given-names>K.D.</given-names></string-name>
          ,
          <string-name><surname>Cardie</surname>, <given-names>C.</given-names></string-name>
          ,
          <string-name><surname>Green</surname>, <given-names>N.</given-names></string-name>
          ,
          <string-name><surname>Litman</surname>, <given-names>D.J.</given-names></string-name>
          ,
          <string-name><surname>Petasis</surname>, <given-names>G.</given-names></string-name>
          ,
          <string-name><surname>Reed</surname>, <given-names>C.</given-names></string-name>
          ,
          <string-name><surname>Slonim</surname>, <given-names>N.</given-names></string-name>
          ,
          <string-name><surname>Walker</surname>, <given-names>V.R.</given-names></string-name>
          (eds.)
          <source>Proceedings of the 4th Workshop on Argument Mining, ArgMining@EMNLP 2017</source>
          , Copenhagen, Denmark, September 8,
          <year>2017</year>
          , pp.
          <fpage>49</fpage>
          -
          <lpage>59</lpage>
          . Association for Computational Linguistics (2017). https://doi.org/10.18653/v1/w17-5106
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>