<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An Open-Domain Web Search Engine for Answering Comparative Questions</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tinsaye Abye</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tilmann Sager</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anna Juliane Triebel</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Leipzig University</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
<p>We present an open-domain web search engine that can help answer comparative questions like "Is X better than Y for Z?" by providing argumentative documents. Building such a system requires multiple steps, each of which includes non-trivial challenges. State-of-the-art search engines do not perform very well on these tasks, and approaches to solve them are part of current research. We present a system that addresses the following tasks: detection of comparative relations in a comparative question, finding claims and arguments relevant to answering comparative questions, and scoring the relevance, support and credibility of a website. We follow a rule-based syntactic NLP approach for the comparative relation extraction. To measure the relevance of a document, we combine results from the existing models BERT and CAM. Those results are reused to determine the support through an evidence-based approach, while the credibility consists of a multitude of scores. With this approach, we achieved the best NDCG@5 of all systems participating in Task 2 of the Touché Lab on Argument Retrieval at CLEF 2020.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        When searching the web for the answer to a comparative question, popular search engines like Google or DuckDuckGo provide results by referring to question-and-answer or debate websites, where mostly subjective opinions are displayed [26]. Domain-specific comparison systems rely on structured data, which makes them inappropriate for answering open-domain comparative questions, since open web data is not structured. Although modern search engines are advanced, answering comparative questions is still challenging [26] and therefore subject to current research in the field of information retrieval. We participate in CLEF 2020 for Task 2 [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], which sets the challenge to retrieve and re-rank documents of the ClueWeb12 data set, aiming to answer comparative questions that are not restricted to a specific domain, with argumentative results [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The results are assessed on the dimensions relevance, support and credibility. Our prototype is tested on TIRA [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>To create an open-domain web search engine for comparative question answering, we build upon results from numerous fields connected to information retrieval, like comparison mining, argument mining, comparative opinion mining and evidence mining.</p>
      <p>
        Comparative Relation Extraction We receive user input as a natural language comparative question. Therefore, just like in comparative opinion mining [28], the problem arises of detecting the entities and features, i.e. the comparative relation (CR). Comparative relation extraction is used successfully by Xu et al. [30], who use a dependency graph to detect the CR. Comparative opinion mining from online reviews uses Part-Of-Speech (POS) tags as well as domain-specific aspects [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Two techniques based on syntactic analysis were compared by Jindal et al. [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]: label sequential rules, which use POS tags, outperform class sequential rules, which use keywords. The use of syntactic dependency trees was proven helpful by Gao et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and Xu et al. [30]. We conduct a syntactic analysis of the queries using POS tags and dependency trees, following a syntactic natural language processing (NLP) approach to provide a domain-agnostic, rule-based model for extracting the CR from the comparative user query.
      </p>
      <p>
        Comparison and Argument Mining The CR then serves as input to comparison and argument mining models that rely on structured data. Hence, with comparative relation extraction we close the gap between user queries and the structured input needed by comparison and argument mining models. The Comparative Argumentative Machine (CAM) by Schildwachter et al. [26] is an open-domain information retrieval system capable of retrieving comparative argumentative sentences for two given entities and several features. Argument mining systems detect argumentative sentences including premises, claims or evidence sentences [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. Fromm et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] demonstrate that taking the context of an argument into account significantly boosts the performance of an argument detection system, whereas most traditional argumentative unit detection systems [
        <xref ref-type="bibr" rid="ref2 ref7">2, 7</xref>
        ] are topic agnostic. The Bidirectional Encoder Representations from Transformers (BERT) model [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] proposed by Reimers et al. [24] finds arguments and is also able to detect whether they support a certain topic. We use a combination of these models to find argumentative documents that are relevant to the user query and therefore help answer comparative questions.
      </p>
      <sec id="sec-2-1">
        <title>3 http://lemurproject.org/clueweb12</title>
        <p>
          Support and Evidence Mining We further increase the quality of candidates presented by CAM and BERT with support and evidence mining. As we aim to find documents that provide arguments for decision-making, the mining of context-dependent, evidence-based arguments is an important task. Braunstain et al. [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] rank support sentences in community-based question answering forums about movies. Evidence mining has produced many publications on different subtasks, like extracting evidence sentences from documents [25] and detecting claims and retrieving evidence [
          <xref ref-type="bibr" rid="ref1 ref13">1, 13</xref>
          ]. Since we are interested in a document's support for a query, we extract evidence sentences and analyze their relatedness to claims using methods presented by Rinott et al. [25]. Ranking documents with good support and evidence for the claims made more highly should further increase the usefulness of the search results for answering the comparative question asked by the user.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Comparison Retrieval Model</title>
      <p>
        In this section, the comparison retrieval model we designed to build an open-domain web search engine for answering comparative questions, as sketched in Figure 1, is described in detail. The retrieval model consists of four phases to retrieve and rank web search results that answer comparative questions. In phase one (blue), the question is analyzed for its comparative relation and expanded queries are sent to the ChatNoir [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] search engine. The retrieved documents then go through NLP processing. During the second phase (red), comparison and argument mining are conducted on the pre-processed documents. Through evidence mining, link analysis and diverse other sources, scores that quantify the quality of the documents are collected. In the third phase (yellow), the collected scores are summed up to build the meta-scores of relevance, support and credibility. The final phase four (green) delivers weighted scores and re-ranked documents.
      </p>
      <p>This section is structured according to the phases depicted in Figure 1: subsection 3.1 describes the pre-processing, subsection 3.2 the analysis of the documents, and re-ranking is covered in subsection 3.3.</p>
      <sec id="sec-3-1">
        <title>Pre-processing</title>
        <p>
          In the pre-processing phase, the user query is analyzed and expanded, and several queries are sent to the ChatNoir [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] search engine. A linguistic analysis is performed on the content of the websites returned by ChatNoir.
        </p>
        <p>
          Comparative Relation Extraction A comparative relation consists of the entities to compare and the features by which the entities are compared. Although a CR is easy to detect for a human, it is not trivial to extract it computationally [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. Due to the given task, we know that the user query is a comparative question, but we must detect the comparative relation within that question. Following a syntactic NLP approach, we use spaCy (model en_core_web_sm, trained on the OntoNotes Web Corpus [29]) to extract the CR from the user query. It provides us with tokenization, chunking, POS tagging and dependency parsing. Two main types of comparative relations occur in the user query: comparative and superlative questions. As the syntactic structure of a comparison in a question does not differ from the CR in a statement, we use the term "superlative" in accordance with Jindal et al.; the term "comparative" matches their "non-equal-grabbable" [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. As the direction of the CR and the distinction between "feature" and "relation word" are not relevant in this case, our term "feature" covers both.
        </p>
        <p>The CR of superlative questions is detected by the following characteristics, which result in high accuracy for the given topics: Superlative questions contain no grammatical or-conjunction but a superlative adjective ("highest"), which is the feature. The child of the superlative is the entity ("mountain"), and the child of a prepositional modifier is another feature ("earth"). For queries with a syntactic pattern like "What is the highest mountain on earth?", the presented method works perfectly.</p>
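The superlative rule above can be sketched with hand-annotated tokens standing in for spaCy's parse; the field names mirror spaCy's Token attributes, and both the annotations and the helper are illustrative, not the paper's code:

```python
from collections import namedtuple

# Minimal stand-in for a spaCy Token: text, fine-grained POS tag,
# dependency label, and the index of the token's head.
Token = namedtuple("Token", "text tag dep head")

def extract_superlative_cr(tokens):
    """Superlative rule: the superlative adjective is a feature, the noun
    it modifies is the entity (the paper phrases this as the superlative's
    child), and the object of a prepositional modifier is another feature."""
    entities, features = [], []
    for i, tok in enumerate(tokens):
        if tok.tag == "JJS":                        # superlative adjective ("highest")
            features.append(tok.text)
            entities.append(tokens[tok.head].text)  # modified noun ("mountain")
        if tok.dep == "prep":                       # prepositional modifier ("on")
            features += [t.text for t in tokens if t.head == i and t.dep == "pobj"]
    return entities, features

# Hand-annotated parse of "What is the highest mountain on earth?"
question = [
    Token("What", "WP", "attr", 1), Token("is", "VBZ", "ROOT", 1),
    Token("the", "DT", "det", 4), Token("highest", "JJS", "amod", 4),
    Token("mountain", "NN", "attr", 1), Token("on", "IN", "prep", 4),
    Token("earth", "NN", "pobj", 5),
]
```

Running the helper on the annotated question yields "mountain" as the entity and "highest" and "earth" as features.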
        <p>To determine the entities in a comparative question, we source the syntactic information from the question's dependency graph. This strategy allows more than two entities to be detected in one sentence. One entity is the parent of both the conjunction and the other entities. First we look for this pattern in the chunks, i.e. nominal phrases, of the question. Finding a chunk provides the advantage that it contains descriptive adjectives or compounds. If no chunks could be found, the same rule is applied to the tokens of the question. For queries without a conjunction there is no simple rule to detect the entities. A feasible strategy was to assume that if there are up to two nouns in the query that are not attributes, they are the entities to compare. Entities can also be verbs if there are no non-stop-word nouns in the question.</p>
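The conjunction pattern can be sketched the same way; again the token annotations mimic a spaCy dependency parse and are illustrative only:

```python
from collections import namedtuple

# Minimal stand-in for a spaCy Token (see spaCy's Token API).
Token = namedtuple("Token", "text tag dep head")

def extract_conjoined_entities(tokens):
    """Conjunction rule: the dependency parent of the coordinating
    conjunction ("or") is one entity; its conj-children are the
    remaining entities."""
    for tok in tokens:
        if tok.dep == "cc":
            head = tok.head
            entities = [tokens[head].text]
            entities += [t.text for t in tokens
                         if t.head == head and t.dep == "conj"]
            return entities
    return []

# Hand-annotated parse of "Which is better, Python or Java?"
question = [
    Token("Which", "WDT", "nsubj", 1), Token("is", "VBZ", "ROOT", 1),
    Token("better", "JJR", "acomp", 1), Token(",", ",", "punct", 1),
    Token("Python", "NNP", "npadvmod", 2), Token("or", "CC", "cc", 4),
    Token("Java", "NNP", "conj", 4), Token("?", ".", "punct", 1),
]
```

Because every further entity hangs off the same parent, the rule extends naturally to questions comparing more than two entities.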
        <p>Features turned out to be more diverse than entities, but most of them are comparative adjectives, superlative adjectives, verbs in base form or children of adjectival complements. If there are direct objects or nominal subjects in the question that were not detected as entities, they are assumed to be features. Finally, adjectives, compounds and numeric modifiers are added to all entity and feature tokens, e.g. to be able to compare "green tea" to "black tea".</p>
        <p>
          The two main reasons for failing CR detection are errors of the POS tagger [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] and features detected as entities and vice versa. Since the system is customized for the topics of the task, it will not scale to comparative questions with different syntactic structures, especially more complex ones. In general, the achieved results for detecting comparative relations from the user queries can be seen as satisfying.
        </p>
        <p>
          Query Expansion To expand the queries that will be sent to ChatNoir [
          <xref ref-type="bibr" rid="ref21 ref3">3, 21</xref>
          ], we collect synonyms and antonyms of the comparative relation's features. Antonyms are fetched from the WordNet lexical database.4 Synonyms are retrieved through a Gensim continuous skip-gram model [23] that was trained on a dump of the English Wikipedia from 2017 [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. In some cases the Gensim model also returned antonyms, but as we do not care about the direction of the CR, that is not a problem. From the comparative relation and the expanded features we send four queries to ChatNoir using the index generated from the ClueWeb12 data set: first, the original comparative question raised by the user; second, the entities combined with 'AND'; third, the entities and the features; and fourth, the entities, features and their synonyms and antonyms combined with 'OR'. Multiple expanded queries increase the number of results and therefore the recall reachable through further re-ranking.
      </p>
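The four queries can be sketched as follows; how the entities and features are joined within the third query is not spelled out in the text, so the operators used there are an assumption:

```python
def build_queries(question, entities, features, synonyms, antonyms):
    """Build the four ChatNoir queries described above. The AND/OR
    keyword operators are assumed to follow ChatNoir's query syntax."""
    return [
        question,                                               # 1: original question
        " AND ".join(entities),                                 # 2: entities
        " AND ".join(entities + features),                      # 3: entities + features
        " OR ".join(entities + features + synonyms + antonyms), # 4: fully expanded
    ]
```

For example, for the entities "tea" and "coffee" with the feature "healthier", the second query becomes "tea AND coffee" and the fourth disjoins all terms including the expansions.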
        <p>
          From ChatNoir we receive a list of results consisting of snippets, page titles, URIs, page ranks and spam ranks. We query the API again to get the full HTML document for every result. From the HTML documents the external links and the body content are extracted. We remove any CSS and JavaScript code from the body, as well as header-, footer- and nav-tags, using the Python package BeautifulSoup.5 The body is then segmented into sentences. Then tokenization, POS tagging, dependency parsing and named entity recognition are performed on the sentences [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. Sentences with a minimum length of four tokens are selected for further analysis and re-ranking of the results.
        </p>
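The cleaning step can be approximated with the standard library alone; the paper uses BeautifulSoup, and this HTMLParser sketch mirrors the same tag filtering:

```python
from html.parser import HTMLParser

# Elements whose content is dropped: CSS/JavaScript plus the
# header-, footer- and nav-tags named in the text.
SKIP = {"script", "style", "header", "footer", "nav"}

class BodyTextExtractor(HTMLParser):
    """Collect visible body text while skipping unwanted elements."""
    def __init__(self):
        super().__init__()
        self.depth = 0      # nesting depth inside skipped elements
        self.chunks = []
    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.depth += 1
    def handle_endtag(self, tag):
        if tag in SKIP and self.depth:
            self.depth -= 1
    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())

def extract_body_text(html):
    parser = BodyTextExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)
```

The extracted text would then be segmented into sentences and passed to the NLP pipeline as described above.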
      </sec>
      <sec id="sec-3-2">
        <title>Analysis</title>
        <p>Since our interest is finding documents that help answer comparative questions, we aim to detect comparative, argumentative and supportive sentences in the retrieved documents. We analyze them for sentences which compare the entities extracted from the user query, for sentences which contain arguments regarding the decision to be made, and for sentences which support such claims. The documents should discuss both entities, possibly favoring one of them, and ideally justify the decision.</p>
        <sec id="sec-3-2-1">
          <title>4 https://wordnet.princeton.edu/</title>
        </sec>
        <sec id="sec-3-2-2">
          <title>5 https://www.crummy.com/software/BeautifulSoup/</title>
          <p>
            Comparative Sentences In order to find sentences that compare the entities from the user query, we choose two of the best performing classification models according to Panchenko et al. [
            <xref ref-type="bibr" rid="ref19">19</xref>
            ]: BOW and InferSent [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ]. Both models are based on the gradient boosting library XGBoost [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ]. BOW uses bag-of-words word embeddings and InferSent uses the sentence embeddings method for feature representation [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ]. To assess the models, we crafted a small evaluation data set with 100 sentences, 60 of them comparative, taken from the ClueWeb12 corpus and covering 11 different topics. Both classifiers are able to distinguish between three cases: the sentence is comparative in favor of the first entity, comparative in favor of the second entity, or contains no comparison. We collect all sentences detected as comparisons and discard the non-comparative ones. Both BOW and InferSent have a high precision, while BOW performed slightly better. Although both models reach the same recall of .48, we observed that the true positives they return are partially distinct. The strategy of running both models in combination leads to a significantly higher recall of .66. To achieve that improvement, we first run BOW. On the sentences that were not recognized as comparative in the first step, we run the detection with InferSent.
          </p>
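The two-stage combination can be sketched as a simple cascade; the classifier interfaces and label names below are assumptions for illustration:

```python
def detect_comparative(sentences, bow_classify, infersent_classify):
    """Run BOW first, then InferSent only on the sentences BOW did not
    recognize, and pool all sentences flagged as comparative. Each
    classifier maps a sentence to "first", "second" or "none"."""
    comparative = [s for s in sentences if bow_classify(s) != "none"]
    remaining = [s for s in sentences if s not in comparative]
    comparative += [s for s in remaining if infersent_classify(s) != "none"]
    return comparative
```

Because the two models' true positives are partially distinct, the union of their detections is what lifts the recall from .48 to .66.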
          <p>
            Argumentative Sentences We exploit the importance of topic awareness for detecting argumentative sentences by using the fine-tuned BERT [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ] model proposed by Reimers et al. [24]. For a sentence and a topic, which in our case is one of the entities, the BERT classifier can detect whether the sentence is an argument for, an argument against, or no argument regarding the topic. This enables us to collect arguments that aid the decision-making, because the arguments detected are relevant to the question to be answered. Despite the good performance compared to other models, we detected systematic errors with BERT as well: comparative sentences were not classified properly. According to BERT, these are for or against both entities at the same time, leading us to exclude comparative sentences.
          </p>
          <p>
            Support Sentences Next to the number of arguments, a well-balanced argumentation structure is also crucial for satisfying the user's information need. Neither a document with a high number of claims that are not supported by any argument, nor a document with a high number of arguments that are not connected to any relevant claims, helps to find well-founded statements. Therefore we want to extract the arguments included in the document that directly support one or several claims. Defining support sentences turned out to be challenging, see section 2. Therefore we used the definition of a Context-Dependent Evidence (CDE) by Rinott et al. [25]. Their definition of a CDE sentence is very similar to the definition of a support sentence by Braunstain et al. [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ]: "[A Context Dependent Evidence is] a text segment that directly supports a claim in the context of the topic." Nevertheless, we continue using the term support sentence. Rinott et al. also provide important characteristics of a support sentence: semantic relatedness, relative location between claim and support sentence, and sentiment agreement between them. Following the steps of Rinott et al. as a guideline, we implemented a support sentence classifier. Since support sentences are arguments as well, we take the BERT result (see section 3.2) as input for the candidate extraction. We rank the BERT-classified arguments by their context-independent features, e.g. named entity labels like PER or ORG and certain terms like nevertheless, therefore or but, and filter the first 70%, unless fewer than 10 sentences would remain after thresholding. We used this lower bound because BERT often returns only a few sentences. Taking the CAM-classified sentences as claims, in line with our task to provide arguments for comparisons, we determine semantic and sentiment relatedness between every claim and every candidate in the context-dependent stage. The semantic relatedness is measured by BERT, the sentiment similarity by TextBlob.6
          </p>
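The candidate filtering described above (first 70%, but never fewer than 10 sentences) reduces to a few lines:

```python
def filter_support_candidates(ranked_arguments, keep_fraction=0.7, min_keep=10):
    """Keep the first 70% of arguments ranked by their context-independent
    features, but never cut below 10 sentences, since BERT often returns
    only a few arguments per document."""
    n = max(min_keep, int(len(ranked_arguments) * keep_fraction))
    return ranked_arguments[:n]
```

If fewer than 10 arguments are available in total, all of them pass through unchanged.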
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>Reranking</title>
        <p>In order to compare and finally re-rank the retrieved documents, we define several measures for each document that are assigned to the scores relevance, support and credibility.</p>
        <p>
          Relevance As defined by Manning et al. [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]: "A document is relevant if it is one that the user perceives as containing information of value with respect to their personal information need." Therefore our measures for the relevance are mainly derived from the comparative sentences and argumentative sentences described in subsection 3.2. We establish a ComparisonCount that results from counting the comparative sentences detected according to section 3.2. CAM is able to classify which entity is described as better or worse than its competitor in the sentence. At first we considered introducing a score that provides a well-balanced measure between the two compared entities. But there is not always an equal amount of arguments for both entities: if one of the entities is not as good as the other, one cannot expect to find many arguments for it. Comparative sentences in general take both entities into account, giving sufficient reflection on both of them. But only counting the argumentative sentences returned by BERT (ArgumentCount) could favor documents dealing with only one of the entities. To prevent such unbalanced results, we put together a formula that considers both the amount of arguments and their distribution between the entities:

ArgumentRatio = total arguments - |arguments 1 - arguments 2| / (total arguments + 1)   (1)

In Equation 1, arguments n means the arguments related to entity n, while total arguments represents the total amount of arguments in the document. The fraction takes their distribution over all entities into account. Further, we divided the term by a threshold, took the hyperbolic tangent to flatten the function, and generalized it for more than two entities. But the repetition of the same,
        </p>
        <sec id="sec-3-3-1">
          <title>6 https://textblob.readthedocs.io/en/dev</title>
          <p>
            possibly rephrased, argument can spoil the measure. To overcome this issue, we measured the similarity between the sentences, using BERT for detecting argument similarity. This method was also presented by Reimers et al. [24].
Support A support sentence is defined as "a text segment that directly supports a claim in the context of the topic" [
            <xref ref-type="bibr" rid="ref1">25, 1</xref>
            ]. To convert the output of the support analysis into a measure, we defined a good document with respect to the given task: a good document has a high number of support sentences that directly support claims included in the document. The connection between claim and support sentence is described by their semantic and sentiment similarity. To score the argumentation structure of a document, two measures were defined: SemanticRatio and SentimentRatio. SemanticRatio describes the number of support sentences per claim that are semantically similar. Since Braunstain et al. [
            <xref ref-type="bibr" rid="ref5">5, 138</xref>
            ] and Rinott et al. [
            <xref ref-type="bibr" rid="ref6">25, 6</xref>
            ] point out that especially the sentiment similarity between a claim and a support sentence is an indicator of coherence, SentimentRatio was added as well. To counterbalance SemanticRatio and SentimentRatio, SupportCount, the number of distinct support sentences, was added.
          </p>
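Equation 1 together with the thresholding and tanh flattening mentioned for the relevance score can be sketched as follows; the threshold value and the generalization to more than two entities (here max minus min) are assumptions, since the text does not fix them:

```python
import math

def argument_ratio(args_per_entity, threshold=10.0):
    """ArgumentRatio (Equation 1): the total number of arguments minus
    the imbalance term |arguments_1 - arguments_2| / (total + 1),
    divided by a threshold and flattened with tanh as described."""
    total = sum(args_per_entity)
    # For two entities this is |arguments_1 - arguments_2|; taking
    # max - min is one possible generalization to more entities.
    imbalance = max(args_per_entity) - min(args_per_entity)
    raw = total - imbalance / (total + 1)
    return math.tanh(raw / threshold)
```

A document with arguments evenly split between the entities scores higher than one with the same total but all arguments on one side.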
          <p>
            Credibility Jensen et al. define credibility as the "believability of a source due to message recipients' perceptions of the source's trustworthiness and expertise" [
            <xref ref-type="bibr" rid="ref1 ref14">14, 1</xref>
            ]. Since Rafalak et al. [
            <xref ref-type="bibr" rid="ref22">22</xref>
            ] describe credibility as very subjective, we added multiple different measures to balance the score. Web Of Trust (WOT)7 provides user ratings for websites. This measure describes the Bayesian averaged opinion of at least 10 users of a website's host. Additionally, the SpamRank, the likelihood of spam, which is delivered by ChatNoir, was added. We assume that the richer the language used by the author of a document, the more credible the information is. In other words, the more complex a text is written, the more effort the author put into writing it. Therefore we calculate three independent readability scores: Automated Readability Index (ARI), Dale-Chall Readability (DaleChall) and Flesch Reading Ease (Flesch). ARI [27] describes the understandability of a text. Since ARI, DaleChall and Flesch inspect different aspects of a document, e.g. the usage of different words or the number of syllables per word, all three measures were included to cover a wide range of the understandability and readability of a document. However, the actual scores calculated for the received documents were out of the ranges proposed by the respective authors. This is partly due to the difficulty of extracting clean texts out of HTML documents. To prevent the top results from containing a lot of advertisements or links that lead to block-listed hosts, the external links of a document are checked against a list of block-listed domains.8 The number of "bad" links is added as the negative measure BlocklistedCount.
          </p>
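Of the three readability scores, ARI can be computed directly from Smith and Senter's published formula; the naive sentence and word splitting below is a stand-in for the pipeline's proper segmentation:

```python
import re

def automated_readability_index(text):
    """ARI [27]: 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43.
    Higher values indicate harder, i.e. more complex, text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = text.split()
    chars = sum(len(w.strip(".,!?;:")) for w in words)
    if not words or not sentences:
        return 0.0
    return 4.71 * chars / len(words) + 0.5 * len(words) / len(sentences) - 21.43
```

On messy text extracted from HTML the scores can leave the authors' intended ranges, as noted above, which is why the measure is only one ingredient of the credibility score.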
        </sec>
        <sec id="sec-3-3-2">
          <title>7 https://www.mywot.com</title>
        </sec>
        <sec id="sec-3-3-3">
          <title>8 https://github.com/hemiipatu/Blocklists.git</title>
          <p>Reranking The measures (as shown in Table 1) are weighted, normalized between 0 and 100, and then combined into the scores relevance, support and credibility. Finally, the three resulting scores are also weighted and then combined into the final score by which the documents are re-ranked as the final result of our search engine.</p>
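A minimal sketch of the weighting and normalization scheme; the min-max normalization and the example weights are assumptions, as the text only states that measures are weighted and normalized to [0, 100]:

```python
def combine_measures(measures, weights):
    """Min-max normalize each measure to [0, 100] across all documents,
    weight it, and sum into one score per document. `measures` maps a
    measure name to a list of raw values, one per document."""
    n_docs = len(next(iter(measures.values())))
    scores = [0.0] * n_docs
    for name, values in measures.items():
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0          # avoid division by zero
        for i, v in enumerate(values):
            scores[i] += weights[name] * 100.0 * (v - lo) / span
    return scores
```

The same pattern is applied twice: once per measure within each of the three scores, and once across the three scores to obtain the final ranking key.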
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Evaluation</title>
      <p>We now present the results of the evaluation conducted by the CLEF committee to rate all participating systems. The ranked list of retrieved documents was judged by human assessors on three dimensions: document relevance, argumentative support, and trustworthiness and credibility of the web documents and arguments. With the introduced search engine, we achieved the best submitted run according to NDCG@5, with a score of 0.580. The combination of different techniques and approaches has proven promising: as they have different strengths and weaknesses, there is potential for them to balance each other out. Nevertheless, a processing pipeline consisting of so many steps suggests a detailed evaluation and examination of the propagation of errors through the phases of the model.</p>
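For reference, NDCG@5, the metric by which the submitted runs were ranked, can be computed as follows; the graded relevance judgments in the usage example are illustrative:

```python
import math

def ndcg_at_k(ranking, judged_pool, k=5):
    """NDCG@k: discounted cumulative gain of the returned ranking's
    relevance grades, normalized by the DCG of the ideal ordering
    of the judged pool."""
    def dcg(grades):
        return sum(g / math.log2(i + 2) for i, g in enumerate(grades[:k]))
    ideal = dcg(sorted(judged_pool, reverse=True))
    return dcg(ranking) / ideal if ideal else 0.0
```

A run that returns the judged documents in perfect order scores 1.0; any misordering within the top k lowers the score.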
    </sec>
    <sec id="sec-5">
      <title>Discussion</title>
      <p>We participated in the Touché Lab on Argument Retrieval at CLEF 2020 with a web-scale search engine capable of answering comparative questions, resulting in the best submitted run according to NDCG@5. However, each step in the comparison retrieval model could be explored further, refined, or tackled with other methods. Whereas the task at hand requires building a complete search engine, the extensive study of each part could have been the subject of a research project alone. Future work can tie in at various points. From comparative relation extraction, through identifying comparative, argumentative and support sentences, to a learning-to-rank algorithm, the question of how a machine learning approach would perform almost imposes itself upon the research community. Widening the capabilities of the system to cope not only with the given set of user queries but with any comparative question in natural language can be seen as a further challenge.
</p>
      <p>23. Rehurek, R., Sojka, P.: Software Framework for Topic Modelling with Large Corpora. In: Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks. pp. 45–50. ELRA, Valletta, Malta (May 2010), http://is.muni.cz/publication/884893/en</p>
      <p>24. Reimers, N., Schiller, B., Beck, T., Daxenberger, J., Stab, C., Gurevych, I.: Classification and Clustering of Arguments with Contextualized Word Embeddings. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pp. 567–578. Florence, Italy (07 2019), https://arxiv.org/abs/1906.09821</p>
      <p>25. Rinott, R., Dankin, L., Perez, C.A., Khapra, M.M., Aharoni, E., Slonim, N.: Show me your evidence - an automatic method for context dependent evidence detection. In: EMNLP (2015)</p>
      <p>26. Schildwachter, M., Bondarenko, A., Zenker, J., Hagen, M., Biemann, C., Panchenko, A.: Answering Comparative Questions: Better than Ten-Blue-Links? In: Halvey, M., Ruthven, I., Azzopardi, L., Murdock, V., Qvarfordt, P., Joho, H. (eds.) 2019 Conference on Human Information Interaction and Retrieval (CHIIR 2019). ACM (Mar 2019). https://doi.org/10.1145/3295750.3298916</p>
      <p>27. Smith, E., Senter, R.: Automated readability index. AMRL-TR. Aerospace Medical Research Laboratories (6570th) iii, 1–14 (06 1967)</p>
      <p>28. Varathan, K.D., Giachanou, A., Crestani, F.: Comparative opinion mining: a review. Journal of the Association for Information Science and Technology 68(4), 811–829 (2017)</p>
      <p>29. Weischedel, R.: OntoNotes Release 5.0 LDC2013T19. Web Download. Linguistic Data Consortium, Philadelphia (2013)</p>
      <p>30. Xu, K., Liao, S.S., Li, J., Song, Y.: Mining comparative opinions from customer reviews for competitive intelligence. Decision Support Systems 50(4), 743–754 (2011)</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Adler</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bosciani-Gilroy</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Real-time claim detection from news articles and retrieval of semantically-similar factchecks</article-title>
          .
          <source>In: Proceedings of the NewsIR'19 Workshop</source>
          at SIGIR (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Aker</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sliwa</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ma</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lui</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Borad</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ziyaei</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ghobadi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>What works and what does not: Classifier and feature analysis for argument mining</article-title>
          .
          <source>In: Proceedings of the 4th Workshop on Argument Mining</source>
          . pp.
          <volume>91</volume>
          -
          <issue>96</issue>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bevendorff</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hagen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Elastic ChatNoir: Search Engine for the ClueWeb and the Common Crawl</article-title>
          . In:
          <string-name>
            <surname>Azzopardi</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hanbury</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pasi</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Piwowarski</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          (eds.)
          <source>Advances in Information Retrieval. 40th European Conference on IR Research (ECIR 2018). Lecture Notes in Computer Science</source>
          , Springer, Berlin Heidelberg New York (
          <year>Mar 2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bondarenko</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , Frobe, M.,
          <string-name>
            <surname>Beloucif</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gienapp</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ajjour</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Panchenko</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Biemann</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wachsmuth</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hagen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Overview of Touché 2020: Argument Retrieval</article-title>
          .
          <source>In: Working Notes Papers of the CLEF 2020 Evaluation Labs</source>
          (Sep
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Braunstain</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kurland</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carmel</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Szpektor</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shtok</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Supporting human answers for advice-seeking questions in CQA sites</article-title>
          . In:
          <string-name>
            <surname>Ferro</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Crestani</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moens</surname>
            ,
            <given-names>M.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mothe</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Silvestri</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Di Nunzio</surname>
            ,
            <given-names>G.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hauff</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Silvello</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          (eds.) Advances in Information Retrieval. pp.
          <volume>129</volume>
          -
          <fpage>141</fpage>
          . Springer International Publishing, Cham (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guestrin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>XGBoost: A scalable tree boosting system</article-title>
          .
          <source>In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>
          . pp.
          <volume>785</volume>
          -
          <issue>794</issue>
          (Aug
          <year>2016</year>
          ). https://doi.org/10.1145/2939672.2939785
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Chernodub</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Oliynyk</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heidenreich</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bondarenko</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hagen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Biemann</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Panchenko</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>TARGER: Neural argument mining at your fingertips</article-title>
          .
          <source>In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations</source>
          . pp.
          <volume>195</volume>
          -
          <fpage>200</fpage>
          . Association for Computational Linguistics, Florence, Italy (Jul
          <year>2019</year>
          ). https://doi.org/10.18653/v1/P19-3031, https://www.aclweb.org/anthology/P19-3031
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Conneau</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kiela</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schwenk</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barrault</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bordes</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Supervised learning of universal sentence representations from natural language inference data</article-title>
          .
          <source>In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing</source>
          . pp.
          <volume>670</volume>
          -
          <fpage>680</fpage>
          . Association for Computational Linguistics, Copenhagen, Denmark (
          <year>September 2017</year>
          ), https://www.aclweb.org/anthology/D17-1070
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Devlin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toutanova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>BERT: pre-training of deep bidirectional transformers for language understanding</article-title>
          . In:
          <string-name>
            <surname>Burstein</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Doran</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Solorio</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          (eds.)
          <article-title>Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis</article-title>
          , MN, USA, June 2-7,
          <year>2019</year>
          , Volume
          <volume>1</volume>
          (Long and Short Papers). pp.
          <volume>4171</volume>
          -
          <fpage>4186</fpage>
          .
          Association for Computational Linguistics (
          <year>2019</year>
          ). https://doi.org/10.18653/v1/n19-1423
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Fromm</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Faerman</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Seidl</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>TACAM: Topic and context aware argument mining</article-title>
          .
          <source>IEEE/WIC/ACM International Conference on Web Intelligence on - WI '19</source>
          (
          <year>2019</year>
          ). https://doi.org/10.1145/3350546.3352506
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Gao</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tang</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yin</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Identifying competitors through comparative relation mining of online reviews in the restaurant industry</article-title>
          .
          <source>International Journal of Hospitality Management</source>
          <volume>71</volume>
          ,
          <issue>19</issue>
          -
          <fpage>32</fpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Honnibal</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montani</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing</article-title>
          7(1) (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Reddy</surname>
            ,
            <given-names>A.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rocha</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Esteves</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>DeFactoNLP: Fact verification using entity recognition, TFIDF vector comparison and decomposable attention</article-title>
          . arXiv preprint
          <source>arXiv:1809</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Jensen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lowry</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jenkins</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Effects of automated and participative decision support in computer-aided credibility assessment</article-title>
          .
          <source>Journal of Management Information Systems</source>
          <volume>28</volume>
          ,
          <fpage>201</fpage>
          -
          <volume>233</volume>
          (Jul
          <year>2011</year>
          ). https://doi.org/10.2307/41304610
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Jindal</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Mining comparative sentences and relations</article-title>
          .
          <source>In: AAAI</source>
          . vol.
          <volume>22</volume>
          , p.
          <volume>9</volume>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Kutuzov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fares</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Oepen</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Velldal</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Word vectors, reuse, and replicability: Towards a community repository of large-text resources</article-title>
          .
          <source>In: Proceedings of the 58th Conference on Simulation and Modelling</source>
          . pp.
          <volume>271</volume>
          -
          <fpage>276</fpage>
          . Linkoping University Electronic Press (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Lippi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Torroni</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Argumentation mining</article-title>
          .
          <source>ACM Transactions on Internet Technology</source>
          16, 1-25 (Mar
          <year>2016</year>
          ). https://doi.org/10.1145/2850417
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Raghavan</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , Schutze, H.: Introduction to Information Retrieval. Cambridge University Press (
          <year>2008</year>
          ), http://nlp.stanford.edu/IR-book/
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Panchenko</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bondarenko</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Franzek</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hagen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Biemann</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Categorizing Comparative Sentences</article-title>
          . In:
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wachsmuth</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          (eds.)
          <source>6th Workshop on Argument Mining (ArgMining 2019) at ACL</source>
          . Association for Computational Linguistics
          (
          <year>Aug 2019</year>
          ), https://www.aclweb.org/anthology/W19-4516
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gollub</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wiegmann</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>TIRA Integrated Research Architecture</article-title>
          . In:
          <string-name>
            <surname>Ferro</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peters</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (eds.)
          <article-title>Information Retrieval Evaluation in a Changing World</article-title>
          .
          <source>The Information Retrieval Series</source>
          , Springer (Sep
          <year>2019</year>
          ). https://doi.org/10.1007/978-3-030-22948-1_5
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hagen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Graßegger</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Michel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tippmann</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Welsch</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>ChatNoir: A Search Engine for the ClueWeb09 Corpus</article-title>
          . In:
          <string-name>
            <surname>Hersh</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Callan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maarek</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sanderson</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (eds.) 35th
          <source>International ACM Conference on Research and Development in Information Retrieval (SIGIR</source>
          <year>2012</year>
          ). p.
          <fpage>1004</fpage>
          .
          ACM
          (Aug
          <year>2012</year>
          ). https://doi.org/10.1145/2348283.2348429
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Rafalak</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Abramczuk</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wierzbicki</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Incredible: is (almost) all web content trustworthy? Analysis of psychological factors related to website credibility evaluation</article-title>
          .
          <source>Proceedings of the companion publication of the 23rd international conference on World Wide Web</source>
          pp.
          <volume>1117</volume>
          -
          <issue>1122</issue>
          (Apr
          <year>2014</year>
          ). https://doi.org/10.1145/2567948.2578997
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>