<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Powering COVID-19 community Q&amp;A with Curated Side Information</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Manisha Verma</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kapil Thadani</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Shaunak Mishra</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Yahoo! Research NYC</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Community question answering and discussion platforms such as Reddit, Yahoo! answers or Quora give users the flexibility to ask open ended questions of a large audience, and replies to such questions may be useful both to the user and the community on certain topics such as health, sports or finance. Given the recent events around COVID-19, some of these platforms have attracted 2000+ questions from users about several aspects associated with the disease. Given the impact of this disease on the general public, in this work we investigate ways to improve the ranking of user generated answers on COVID-19. We specifically explore the utility of external technical sources of side information (such as CDC guidelines or WHO FAQs) in improving answer ranking on such platforms. We found that ranking user answers based on question-answer similarity alone is not sufficient, and that existing models cannot effectively exploit external (side) information. In this work, we demonstrate the effectiveness of different attention based neural models that can directly exploit side information available in technical documents or verified forums (e.g., research publications on COVID-19 or the WHO website). Augmented with a temperature mechanism, the attention based neural models can selectively determine the relevance of side information for a given user question while ranking answers.</p>
      </abstract>
      <kwd-group>
        <kwd>question answering</kwd>
        <kwd>deep learning</kwd>
        <kwd>knowledge injection</kwd>
        <kwd>NLP</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Question answering systems are key to finding relevant and timely information about several issues. Community question answering (cQ&amp;A) platforms such as Reddit, Yahoo! answers or Quora have been used to ask questions about wide ranging topics. Most of these platforms let users ask, answer, vote or comment on questions present on the platform. However, question answering platforms are useful not only for getting public opinions or votes about areas such as entertainment or sports, but can also serve as information hot-spots for more sensitive topics such as health, injuries or legal issues. Thus, it is imperative that when the user visits sensitive content, answer ranking also takes into account curated side information from reliable (external) sources. Most prior work on cQ&amp;A has focused on incorporating question-answer similarity [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ], user reputation [
        <xref ref-type="bibr" rid="ref3 ref4 ref5">3, 4, 5</xref>
        ], integration of multi-modal content [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], community interaction features [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] associated with answers, or just the question answering network [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] on the platform. However, there is very limited work on incorporating curated content from external sources. Existing work only exploits knowledge bases [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] that consist of different entities and the relationships between these entities to score answers. However, there are some limitations of knowledge bases that would make it difficult to use them for community Q&amp;A on rapidly evolving topics such as disease outbreaks (e.g. ebola, COVID-19), wild-fires or earthquakes. Firstly, knowledge bases contain information about established entities and do not rapidly evolve to incorporate new information, which makes them unreliable for novel disease outbreaks such as COVID-19, where information changes rapidly and its verification is time sensitive. Secondly, it may be hard to determine what even constitutes an entity as new information arrives about the topic. To overcome these limitations, in this work we posit that external curated free-text or semi-structured informational sources can also be used effectively for cQ&amp;A tasks.
      </p>
      <p>
        In this work, we demonstrate that free text or semi-structured external information sources such as CDC1, WHO2 or NHS3 can be very useful for ranking answers on community Q&amp;A platforms, since they contain frequently updated information about several topics such as ongoing disease outbreaks, vaccines, or resources about other topics such as surgeries, birth control or historical numerical data about diseases across the world. We argue that for sensitive topics such as COVID-19, it is useful to use publicly available vetted information for improving our ranking systems.
      </p>
      <p>
        Community Question and Answering (cQ&amp;A) systems is a well researched sub-field in both the information retrieval and NLP communities. Several systems have been proposed to rank user submitted answers to questions on community platforms such as Yahoo! answers, Reddit and Quora.
      </p>
      <p>
        [Figure 1: Illustrative example of COVID-19 community answer ranking powered by side information in the form of research papers, and information from verified sources (such as CDC, WHO, and NHS).]
      </p>
      <p>
        Ranking user submitted answers on community question-answering platforms has been addressed with several approaches. The primary method is to determine the relevance of an answer given an input question, and text based matching is one of the most common approaches to rank answers. Researchers have used several methods to compute similarity between a question and user generated answers to determine relevance. For instance, the authors in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] rank answers with 17 features extracted from unigrams, bigrams and web correlation features using unstructured user search logs, with questions extracted from Yahoo! answers for their experiments. Different representations such as doc2vec [14], tree-kernels [15], attention [17, 6, 11, 18] and deep belief networks [19] have also been used to score question and answer pairs, and other studies explore community, user and answer features [3, 4, 5, 7] to determine answer relevance. It is worth noting that user features and community features, when incorporated, may still yield further improvements in the performance of these models, but this is not the focus of our work.
      </p>
      <p>
        In this work, we explore the utility of publicly available side information for ranking answers, and specifically focus on questions and answers about COVID-19. We present experiments on two publicly available primary Q&amp;A datasets: a) Yahoo! Answers4 and b) a recently released annotated Q&amp;A dataset [9], together containing 2000+ user questions with 10K+ answers, in the presence of two external semi-structured curated sources: a) TREC-COVID [10] and b) WHO queries5. With temperature regulated attention over side information, recall for correct answer retrieval improves by ∼ 17% and ∼ 9% for both source datasets respectively over several other cQ&amp;A models.
      </p>
      <p>
        A recent work [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] incorporates a medical KB for ranking answers on medical Q&amp;A platforms. The authors propose to learn path based representations of entities (from the KB) present in questions and answers posted by users. This approach relies on reliable detection of entities first, which may be absent for emerging topics such as the COVID-19 pandemic. Another limitation of this work is that external knowledge may not always be present in a structured format. For example, CDC guidelines are usually simple question-answer pairs posted on the website. This makes it difficult to apply their approach to our problem.
      </p>
      <p>
        4https://answers.yahoo.com/ 5https://www.who.int/emergencies/diseases/novelcoronavirus-2019/question-and-answers-hub
      </p>
      <p>The proposed approach in this work incorporates semi-structured information directly with the help of temperature regulated attention.</p>
      <p>
        Finally, with the rise of COVID-19, researchers across disciplines are actively publishing information and datasets to share understanding of the virus and its impact on people. Researchers routinely organize dedicated challenges such as SemEval [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] with tasks such as ranking answers on QA forums. One such initiative is the TREC-COVID track [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], which released queries, documents and manual relevance judgements to power search for COVID related information. Authors in [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] also released a COVID-19 related QA dataset with 100+ question and answer pairs extracted from the TREC COVID6 initiative.
      </p>
      <p>
        [Figure 2: External source augmentation model.]
      </p>
      <p>
        These question/answer pairs are not user generated content and hence do not reflect real user questions. We also rely on the recently released Q&amp;A dataset from [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] for our task. We also compile a dataset of 2000+ COVID-19 questions with 10K+ answers, all submitted by users on Yahoo! answers, for this work.
      </p>
      <sec id="sec-1-1">
        <title>3. Method</title>
        <sec id="sec-1-1-1">
          <title>3.1. Problem formulation</title>
          <p>
            In this work, we focus on ranking answers for N questions q_1, . . . , q_N related to an emerging topic such as COVID. Each question q is associated with a set of two or more answers A = {a_j : |A| ≥ 2} and corresponding labels Y = {y_j : |Y| ≥ 2} representing answer relevance. We use a binary indicator for relevance, i.e. y_j ∈ {0, 1}, where relevance judgments (e.g., favorite, upvoted) are provided by the user.
          </p>
          <p>
            We attempt to model the relevance of each answer a_j to its corresponding question using an external source which may contain free text or semi-structured information. For example, the external source could consist of information-seeking queries or questions x_1, . . . , x_M related to a topic, with each x_m linked to a set of relevant scientific articles or answers d_1, . . . , d_K, where each answer/document may be judged for relevance by human judges [
            <xref ref-type="bibr" rid="ref10">10</xref>
            ] or some experts.
          </p>
          <p>
            We hypothesize that this semi-structured or free-text information may be valuable in identifying user answer quality for certain kinds of questions, although not all. We investigate this with our model to recover the true label y_j for each user answer a_j ∈ A given its question q, category information, and information from the external source pairs ⟨x_m, d_m⟩.
          </p>
        </sec>
        <sec id="sec-1-1-2">
          <title>3.2. Proposed Model</title>
          <p>
            In this work, we explore a token-level matching mechanism to determine the relevance of information in the external source that may inform the label prediction task. Our model (T-att) aims to match a given user question with all the submitted answers in the presence of external information about the same domain. First, the question q, an answer a and additional metadata are encoded into d-dimensional vectors using a text encoder. We use an LSTM based encoder for both question and answer in the primary source, which can handle input sequences of variable length.
          </p>
          <p>
            Question Encoding: Each word w_t in a question is represented as a d-dimensional vector with pre-trained word embeddings. The LSTM takes each token embedding as input and updates hidden state h_t based on the previous state h_{t−1}. Finally, the hidden state is input to a feed-forward layer with smaller dimension k &lt; d to compress the question encoding as follows:

            h_t = LSTM(h_{t−1}, w_t),   e_q = f(W h_t + b)   (1)
          </p>
          <p>
            Answer Encoding: Each word w_t in the answer is also represented as a d-dimensional vector with pre-trained word embeddings. The LSTM takes each token embedding as input and updates hidden state h_t. We also reduce the dimension of the answer encoding with a feed-forward layer of dimension k &lt; d:

            h_t = LSTM(h_{t−1}, w_t),   e_a = f(W h_t + b)   (2)
          </p>
          <p>
            We concatenate the question and answer representations for further processing:

            e_qa = [e_q, e_a]   (3)
          </p>
          <p>
            External source encoding: External sources of information can vary from task to task. We encode each segment of the data individually. For instance, if there are two segments in the source (e.g. question/answer or query/document), our system encodes both segments individually, using the same encoding architecture used for the primary source question/answer encoding above. The encoding for a two segment external source is:

            h_t = LSTM(h_{t−1}, x_t),   e_x = f(W h_t + b)
            h_t = LSTM(h_{t−1}, d_t),   e_d = f(W h_t + b)   (4)
          </p>
          <p>
            We incorporate the external source encoding with a temperature (T) based variant of scaled dot-product attention, which provides a straightforward conditioning approach over a set of query-document pairs. The question encoding vector e_q serves as a query over keys e_x. If two segments are present in the external source, such as query/document, the model uses the attention weights over the first segment (e.g. query) to determine the importance of the second segment (e.g. document). It is easy to extend this framework to external sources with multiple segments. The two segment attention is:

            s_m = e_q⊤ e_x_m / √k,   α_m = exp(s_m / T) / ∑_m′ exp(s_m′ / T),   e′ = ∑_m α_m e_d_m   (5)
          </p>
          <p>
            To summarize, temperature (T) based attention helps determine the relevance of each external source entry and its corresponding document encoding with respect to the question encoding. The temperature (T) parameter helps us control the uniformity of the attention weights. Finally, the label is predicted using the input vector and the learned weighted average of side information e′:

            ŷ = output([e_qa ; e′])

            where output uses a sigmoid activation function. We use binary cross entropy loss to train the proposed model. Since community questions may often be entirely unrelated to external sources, a key aspect of this approach is determining whether the external source is useful at all, not merely attending to its entries that are most relevant. Temperature based attention is useful in controlling which external source entries are useful for user questions. It is worth noting that one will have to experiment with and tune the value of the temperature T such that ranking performance improves.
          </p>
          <p>6https://ir.nist.gov/covidSubmit/data.html</p>
        </sec>
      </sec>
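      <p>
        To make the temperature mechanism concrete, the attention of Eqs. (4)-(5) can be sketched as follows. This is a minimal numpy sketch with illustrative array names and dimensions, not the authors' implementation:
      </p>

```python
import numpy as np

def temperature_attention(q, keys, values, T=0.5):
    """Temperature-scaled dot-product attention (cf. Eq. 5).

    q      : (k,)   question encoding, used as the attention query
    keys   : (M, k) encodings of the first external segment (e.g. TREC queries)
    values : (M, k) encodings of the second segment (e.g. documents)
    T      : temperature; T = 1 recovers plain softmax attention,
             smaller T sharpens the weights onto a few entries.
    """
    k = q.shape[0]
    scores = keys @ q / np.sqrt(k)               # s_m = e_q . e_x_m / sqrt(k)
    z = scores / T
    w = np.exp(z - np.max(z))                    # numerically stable softmax
    alpha = w / w.sum()                          # attention weights alpha_m
    side = alpha @ values                        # e' = sum_m alpha_m * e_d_m
    return side, alpha

# toy example: 3 external (query, document) pairs with 4-dim encodings
rng = np.random.default_rng(0)
q = rng.normal(size=4)
keys, values = rng.normal(size=(3, 4)), rng.normal(size=(3, 4))

_, sharp = temperature_attention(q, keys, values, T=0.1)
_, flat = temperature_attention(q, keys, values, T=1.0)
# lower temperature concentrates the attention mass on fewer entries
assert sharp.max() > flat.max()
```

      <p>
        Lowering T below 1 sharpens the weights onto the few external entries most similar to the question, while T = 1 reduces to the plain linear attention used as the att baseline.
      </p>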
    </sec>
    <sec id="sec-2">
      <title>4. Experimental Setup</title>
      <p>Given the model architecture, in this section we provide a detailed overview of the different datasets, metrics and baselines used in our experiments.</p>
      <sec id="sec-2-1">
        <title>4.1. Data</title>
        <p>
          We compiled two question answering datasets. The first was collected from Yahoo! answers and the second was recently released in [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]; both datasets have questions raised by real users. In this work we focus specifically on questions associated with COVID-19. Different statistics about the train and test splits of both Q&amp;A datasets are given in Table 2. A pair of relevant and non-relevant answers for a question in both datasets is also shown in Table 1 for reference. More details are given below. [Figure 3: distributions of (a) Yahoo! ques length, (b) Yahoo! ans length, (c) Infobot ques length, (d) Infobot ans length.]
        </p>
        <p>
          Yahoo! Dataset: We crawled COVID-19 related questions from Yahoo! answers7 using several keywords such as 'coronavirus', 'covid-19', 'covid', 'sars-cov2' and 'corona virus' between Jan 2020 and July 2020, to ensure we gathered all possible questions for our experiments. We keep only those questions that have two or more answers. In total, we obtained 1880 questions with 11500 answers. We used favorite answers as positive labels (similar to previous work [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]), assuming that users, over time, rate answers (with upvotes/downvotes) that are most relevant to the submitted question. We normalized the question and answer text by removing a small list of stop words, numbers, links and symbols. Figures 3a and 3b show the distributions of question and answer lengths respectively. Questions contain 12.7 ± 5.8 words (qwords) and answers contain 36.3 ± 93.5 (mean ± std) words (awords) respectively, which indicates that user submitted answers can vary widely on Yahoo! answers. On average, a question has about 6 answers (ans/q) in the Yahoo! ans dataset. We split the data into three sets: train (64%, 1196 questions, 7435 answers), validation (16%, 298 questions, 1858 answers) and test (20%, 374 questions, 2310 answers), where the questions for each set were uniformly sampled.
        </p>
        <p>
          Infobot Dataset [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]: Researchers at JHU [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] have recently compiled a list of user submitted questions on different platforms and manually labeled 22K+ question-answer pairs. We cleaned this set by removing questions with fewer than two answers or no relevant answers. In total, our dataset contains 8000+ question answer pairs, where each question may have multiple relevant answers, unlike the Yahoo! answers dataset. Figures 3c and 3d show the distributions of question and answer lengths respectively.
        </p>
        <p>7https://answers.search.yahoo.com/search?p=coronavirus</p>
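      <p>
        The question-level split described above (64/16/20, sampling whole questions so that all answers to a question land in the same split) can be sketched as follows. The helper below is hypothetical, not the authors' code:
      </p>

```python
import random

def split_by_question(qa_pairs, seed=13):
    """Split (question_id, answer, label) rows into train/val/test at the
    *question* level, so answers to one question never straddle splits.
    Proportions follow the paper: 64% train, 16% validation, 20% test."""
    qids = sorted({q for q, _, _ in qa_pairs})
    random.Random(seed).shuffle(qids)          # uniform sampling of questions
    n = len(qids)
    n_train, n_val = int(0.64 * n), int(0.16 * n)
    train_q = set(qids[:n_train])
    val_q = set(qids[n_train:n_train + n_val])
    buckets = {"train": [], "val": [], "test": []}
    for row in qa_pairs:
        qid = row[0]
        key = "train" if qid in train_q else "val" if qid in val_q else "test"
        buckets[key].append(row)
    return buckets

# toy data: 10 questions with 2 answers each
data = [(q, f"a{j}", j % 2) for q in range(10) for j in range(2)]
parts = split_by_question(data)
assert sum(len(v) for v in parts.values()) == len(data)
```

      <p>
        Splitting by question rather than by row keeps evaluation honest: a model never sees any answer to a test question at training time.
      </p>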
        <sec id="sec-2-1-1">
          <title>4.1.1. External sources</title>
          <p>We use two external datasets to rank answers. Details of each dataset are given below.</p>
        </sec>
      </sec>
      <sec id="sec-2-3">
        <title>TREC COVID [10]</title>
        <p>
          We use the recently released TREC COVID-19 track data with 50 queries, which also contains manually drafted query descriptions and narratives. Expert judges have labeled over 5000 scientific documents for these 50 queries from the CORD-19 dataset8 (https://www.semanticscholar.org/cord19). These documents contain coronavirus related research. Since the documents are scientific literature, we initialize document embeddings using SPECTER [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-4">
        <title>WHO</title>
        <p>We use data released on the question and answer hub of the WHO9 website to create a list of question-answer pairs. There are 147 question and answer pairs in this dataset, where questions contain 13.28 ± 5.36 words and answers contain 133.2 ± 100.9 words respectively.</p>
      </sec>
      <sec id="sec-2-5">
        <title>4.2. Baselines</title>
        <p>We evaluated our model against embedding similarity baselines. We computed four baselines as follows.</p>
      </sec>
      <sec id="sec-2-6">
        <title>Random</title>
        <p>An answer is chosen at random as relevant for a user question. This is expected to provide a lower bound on retrieval performance.</p>
        <p>Linear Attention (att): When T = 1.0, our model defaults to simple linear attention over all the information present in the external sources. This gives an indication of how well the model performs when it is forced to look at all the information in the external source.</p>
      </sec>
      <sec id="sec-2-7">
        <title>Linear combination (T-sim)</title>
        <p>
          We linearly combine similarities between the Yahoo! question-answer pair and the TREC query-answer pair as shown below:

          T-sim = λ · sim(q, a) + (1 − λ) · max_t sim(a, t)   (6)

          where a, q and t are the Yahoo! answer, the question and the concatenated TREC query, narrative and description embeddings respectively. This is a cruder version of temperature attention, where λ controls the contribution of each component directly. We vary λ to determine the optimal combination. Question-Answer similarity (qasim) is the similarity between the question and answer embeddings, i.e. λ = 1. Both question and answer embeddings are obtained by averaging over their individual token embeddings.
        </p>
        <p>
          • Precision (P@k): Precision at position k evaluates the fraction of relevant answers retrieved until position k. For both datasets, Yahoo! ans and Infobot, we evaluate whether the top answer (k = 1) in the ranked list is indeed correct:

          P@k = (1 / |Q|) ∑_i (1 / k) ∑_{j=1..k} I{y_ij = 1}   (7)

          where I{y_ij = 1} indicates whether the answer at position j is relevant to the i-th question.
        </p>
        <p>
          • Recall (R@k): Recall at position k evaluates the fraction of relevant answers retrieved from all the answers marked relevant for a question, averaged over all the queries in the test set. For recall, we take a cutoff of k = 3, which evaluates whether the model is able to retrieve the correct answers in the top 3 positions:

          R@k = (1 / |Q|) ∑_i (1 / |R_i|) ∑_{j=1..k} I{y_ij = 1}   (8)

          where |R_i| is the number of relevant answers for the i-th question.
        </p>
        <p>
          • MRR: evaluates the average of the reciprocal ranks corresponding to the most relevant answer for the questions in the test set:

          MRR = (1 / |Q|) ∑_i 1 / rank_i   (9)

          where |Q| is the number of queries in the test set and rank_i is the rank of the first relevant answer for the i-th query.
        </p>
        <p>9https://www.who.int/emergencies/diseases/novelcoronavirus-2019/question-and-answers-hub</p>
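        <p>
          The linear combination of Eq. (6) can be sketched as follows, assuming averaged token embeddings for the question, the answer and each TREC entry, with cosine similarity as sim. This is an illustrative sketch, not the authors' implementation:
        </p>

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def t_sim_score(q_emb, a_emb, trec_embs, lam):
    """lam * sim(q, a) + (1 - lam) * max_t sim(a, t)  (Eq. 6).
    lam = 1.0 reduces to the plain question-answer similarity (qasim)."""
    qa = cosine(q_emb, a_emb)
    ext = max(cosine(a_emb, t) for t in trec_embs)
    return lam * qa + (1.0 - lam) * ext

# toy embeddings: a question, an answer and 5 TREC entries
rng = np.random.default_rng(1)
q, a = rng.normal(size=8), rng.normal(size=8)
trec = [rng.normal(size=8) for _ in range(5)]

# lam = 1 ignores the external source entirely (the qasim baseline)
assert abs(t_sim_score(q, a, trec, 1.0) - cosine(q, a)) < 1e-12
```

        <p>
          Sweeping lam over {0, 0.1, ..., 1.0} and ranking each question's answers by this score reproduces the T-sim baseline family described above.
        </p>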
      </sec>
      <sec id="sec-2-8">
        <title>4.3. Evaluation Metrics</title>
        <p>
          We evaluate the performance of our model using three popular ranking metrics, namely Precision (P@1), Mean Reciprocal Rank (MRR), and Recall (R@3).
        </p>
        <p>
          BERT Q&amp;A (bert): Large scale pre-trained transformers [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ] are widely popular for NLP tasks. BERT-like models have shown effectiveness on Q&amp;A datasets such as SQUAD10. We fine-tune the BERT base model with two different answer lengths: a) 128 (bsl-128) and b) 256 tokens (bsl-256) respectively. The intuition is that large scale pre-trained models are adept at language understanding and can be fine-tuned for new tasks with a small number of samples. We fine-tune BERT for both datasets, Yahoo! ans and Infobot, respectively. It is non-trivial to include external information in BERT and we leave this for future work.
        </p>
        <p>
          4.4. Parameter Settings: Both primary datasets, Yahoo! ans and Infobot, were divided into three parts: train (∼ 60%), validation and test (20%) respectively. The baseline models T-sim and T-att are initialized with GloVe embeddings11 of 100 dimensions. We performed a parameter sweep over λ and T for the T-sim and T-att models with a step size of 0.1 between {0, 1.0} respectively. We used the base uncased model for the BERT implementation. We fine-tuned the model between 1-10 epochs and found that 3 epochs gave the best result on the validation set. We used an LSTM with 64 hidden units to represent the question, answer and all the information in external datasets. We experimented with larger embedding sizes and hidden units, but performance degraded significantly as the model tends to overfit on the training data. Lastly, we used a batch size of 64 and trained the model for 30 epochs with early stopping.
        </p>
        <p>10https://rajpurkar.github.io/SQuAD-explorer/ 11https://nlp.stanford.edu/projects/glove/</p>
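        <p>
          The three metrics above can be sketched per query as follows, where ranked_labels is the list of relevance labels in model-ranked order (a sketch; the per-query values are then averaged over the test set):
        </p>

```python
def precision_at_k(ranked_labels, k=1):
    """Fraction of relevant answers in the top k positions (Eq. 7, per query)."""
    return sum(ranked_labels[:k]) / k

def recall_at_k(ranked_labels, k=3):
    """Fraction of all relevant answers retrieved in the top k (Eq. 8, per query)."""
    total = sum(ranked_labels)
    return sum(ranked_labels[:k]) / total if total else 0.0

def mrr(ranked_labels):
    """Reciprocal rank of the first relevant answer (Eq. 9, per query)."""
    for i, y in enumerate(ranked_labels, start=1):
        if y:
            return 1.0 / i
    return 0.0

# one query with 5 answers, relevant ones at ranks 2 and 4
labels = [0, 1, 0, 1, 0]
assert precision_at_k(labels, 1) == 0.0
assert recall_at_k(labels, 3) == 0.5
assert mrr(labels) == 0.5
```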
      </sec>
    </sec>
    <sec id="sec-3">
      <title>5. Results</title>
      <p>
        Since attention is dependent on the input query and key embedding lengths, it would be interesting to scale the computation in our model to incorporate several open external datasets and overcome this limitation in the future.
      </p>
      <p>
        Yahoo! ans questions are also assigned categories by users. A category based breakdown of performance on the test set is given in Table 6 and Table 5 respectively, where the categories with the largest number of questions in the test set are listed: Entertainment (47), Health (62), Politics (143), Society (38) and Family (20), with qasim scores of 0.59, 0.645, 0.587, 0.42 and 0.65 respectively. In all the categories, our model outperforms the best T-sim and qasim models respectively. The largest improvement is for questions in the Family category, where our model achieves an improvement of 71% over the T-sim model. It seems that ranking answers for questions from Society and Politics is harder than for other categories. All the models, however, are able to rank the top answer in the first three positions effectively, as Recall@3 is high for all the categories.
      </p>
      <sec id="sec-3-1">
        <title>Research questions</title>
        <p>In this work, our focus is to evaluate the utility of external information in improving answer ranking for the cQ&amp;A task. Thus, we performed experiments to answer three main research questions, listed below.</p>
        <p>RQ1: Does external information improve answer ranking?
RQ2: How does temperature (T) compare with the λ parameter?
RQ3: What kind of queries/questions does the model attend to when ranking relevant/non-relevant answers?</p>
        <sec id="sec-3-1-1">
          <title>RQ1: Does external information improve answer ranking?</title>
          <p>
            We evaluated different models for ranking answers in the Yahoo! ans and Infobot datasets in the presence of the TREC and WHO datasets respectively. We found that temperature regulated attention models that incorporate external sources indeed outperform the baselines, as shown in Table 4 and Table 3 respectively. The (T-att) model beats the bert models by ∼ 30% in precision, ∼ 18% in recall and ∼ 16% in MRR respectively on TREC data. However, (T-att) does only marginally better than the att model in precision and MRR on Infobot data. We suspect that this is due to the large set of query-document pairs in TREC-COVID data compared to the fewer question-answer pairs in the Infobot dataset. Our results also clearly suggest that embedding based matching of the question-answer pair (qasim) does not yield a good ranker, though it is better than choosing an answer at random (random). When WHO is used as an external dataset, we find that the (T-att) model is slightly worse than bert. This suggests that not all sources would equally benefit the cQ&amp;A task.
          </p>
          <p>
            RQ2: How does temperature (T) compare with the λ parameter? We argued that linearly combining similarities between question-answer in the primary dataset and between question-external source may not be sufficient to boost performance. We observe this in our results too, i.e. T-sim models do not perform better than (T-att) models. This clearly indicates that more sophisticated models can learn to combine this information directly from training data. However, our experiments indicate that the optimal value of (T) varies across primary datasets and external sources of information. For instance, the (T-att) model performed best when T = 0.4 and T = 0.9 for the Yahoo! ans and Infobot datasets respectively when TREC was used as the external source. It performed best when T = 0.1 and T = 0.5 for the Yahoo! ans and Infobot datasets respectively when WHO was used as the external source. We also varied T beyond 1.0 to determine whether it yielded a trend, as shown in Table 7. Higher values of temperature seem to degrade model performance; we found that the optimal temperature range is between [0.1 − 1]. Existing research in model distillation [
            <xref ref-type="bibr" rid="ref26">26</xref>
            ] has also empirically found that lower values of temperature yield better performance. We also compared model performance in terms of precision when λ and T are varied for the T-sim models and temperature based models respectively, as shown in Figure 4 (panels: (a) Yahoo!+TREC, (b) Yahoo!+WHO, (c) Infobot+TREC, (d) Infobot+WHO). Temperature based models peak at one value but
          </p>
          <p>
            do not have a clear trend, indicating that one needs to explore different T values at the time of training for better performance. On the other hand, we observe that adding external information also helps the T-sim models until a certain threshold. Overall, both sets of models show that free-text external information can be incorporated to improve answer ranking performance. [Table 7 rows (Src + Ext): Y! + TREC, Y! + WHO, Ibot + TREC, Ibot + WHO.]
          </p>
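          <p>
            The tuning procedure implied above (sweep T in steps of 0.1 and keep the value with the best validation score) can be sketched as follows. The evaluate callable is a hypothetical stand-in for training and evaluating the T-att model at a given temperature:
          </p>

```python
def best_temperature(evaluate, grid=None):
    """Pick the temperature with the highest validation score.

    evaluate : maps a temperature to a validation metric (e.g. P@1);
               a stand-in here for a full train/evaluate cycle.
    grid     : candidate temperatures; defaults to 0.1 .. 1.0 in 0.1 steps,
               the range found effective in the paper."""
    grid = grid or [round(0.1 * i, 1) for i in range(1, 11)]
    scores = {t: evaluate(t) for t in grid}
    return max(scores, key=scores.get), scores

# toy evaluate with a peak at T = 0.4 (mimicking the Yahoo!+TREC optimum)
best, scores = best_temperature(lambda t: -abs(t - 0.4))
assert best == 0.4
```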
          <p>RQ3: What kind of queries/questions does the
model attend to when ranking
relevant/nonrelevant answers? Attention based models have a this external knowledge need not always be structured
very unique feature: they can aid explaining the internal text. However, it is worth noting that curated and
reliworkings of neural network models. We inspect what able external sources may not always be available for all
kind of queries/questions in external datasets does our domains. We addressed a very niche task in this work,
model pay attention to while ranking relevant or non- and further research is required to extend it to
incorporelevant answers. Figure 5 shows one such example of rate multiple external sources. We posit that with
scalYahoo! question and incorporation of TREC data. At the able attention mechanisms, this work can be easily made
time of scoring relevant answer, the model gives higher tractable for large external sources containing thousands
weight to some queries compared to others. In the exam- or millions of entries in the future.
ple, for instance, it assigns more weight to queries
associated with masks or COVID virus response to weather
changes. We observe higher attention weights for ques- 6. Conclusion
tions when relevant answers are ranked than when
nonrelevant answers are scored. An example question, a Question answering platforms provide users with
efecrelevant and non-relevant answer along with model at- tive and easy access to information. These platforms
tention weights on TREC queries are shown from the also provide content on rapidly evolving sensitive topics
Infobot data in Figure 6 respectively. It shows a simi- such as disease outbreaks (such as COVID-19) where it is
lar trend where attention weights are high for external also useful to use external vetted information for ranking
queries that are closely associated with the question an- answers. Existing work only exploits knowledge bases
swer text. which have some limitations that makes it dificult to</p>
          <p>Overall, our experiments show that curated external use them for community Q&amp;A for rapidly evolving
topinformation is useful for improving community ques- ics such as wild-fires or earthquakes. In this work, we
tion answering task. Our experiments also indicate that tried to evaluate the efectiveness of external (free text or
semi-structured) information in improving answer
ranking models. We argue that simple question-answer text
matching may be insuficient and in presence of external
knowledge, but temperature regulated attention models
can distill information better which in turn yields higher
performance. Our proposed model with temperature
regulated attention, when evaluated on two public datasets
showed significant improvements by augmenting
information from two external curated sources of information.</p>
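<p>The attention-weight inspection used to answer RQ3 can be sketched as follows (a hypothetical helper of our own; the query texts and weights below are illustrative, not taken from the paper's figures):</p>

```python
def top_attended(ext_queries, weights, k=3):
    """Return the k external queries the model attends to most,
    as (query, weight) pairs sorted by descending attention weight."""
    ranked = sorted(zip(ext_queries, weights), key=lambda qw: qw[1], reverse=True)
    return ranked[:k]
```

<p>Listing the top-attended TREC or WHO queries for a relevant versus a non-relevant answer makes the evidence behind a ranking decision directly inspectable, as in Figures 5 and 6.</p>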
          <p>In future, we aim to expand these experiments to other
categories such as disaster relief and scale the attention
mechanism to include multiple external sources in one
model.</p>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Surdeanu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ciaramita</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zaragoza</surname>
          </string-name>
          ,
          <article-title>Learning to rank answers on large online qa collections</article-title>
          ,
          <source>in: ACL</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Rong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ouyang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Xiong</surname>
          </string-name>
          ,
          <article-title>Question/answer matching for cqa system via combining lexical and sequential information</article-title>
          ,
          <source>in: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence</source>
, AAAI'15
          , AAAI Press,
          <year>2015</year>
          , p.
          <fpage>275</fpage>
          -
          <lpage>281</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Ai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Spina</surname>
          </string-name>
          , R.-C. Chen,
          <string-name>
            <given-names>L.</given-names>
            <surname>Pang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. B.</given-names>
            <surname>Croft</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Scholer</surname>
          </string-name>
          ,
          <article-title>Beyond factoid qa: Effective methods for non-factoid answer sentence retrieval</article-title>
          ,
          <source>in: European Conference on Information Retrieval</source>
          , Springer,
          <year>2016</year>
          , pp.
          <fpage>115</fpage>
          -
          <lpage>128</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>L.</given-names>
            <surname>Hong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. D.</given-names>
            <surname>Davison</surname>
          </string-name>
          ,
          <article-title>A classification-based approach to question answering in discussion boards</article-title>
          ,
          <source>in: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval</source>
          ,
          <year>2009</year>
          , pp.
          <fpage>171</fpage>
          -
          <lpage>178</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>D. H.</given-names>
            <surname>Dalip</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Gonçalves</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cristo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Calado</surname>
          </string-name>
          ,
          <article-title>Exploiting user feedback to learn to rank answers in qa forums: A case study with stack overflow</article-title>
          ,
          <source>in: Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          , SIGIR '13,
          Association for Computing Machinery, New York, NY, USA,
          <year>2013</year>
          , p.
          <fpage>543</fpage>
          -
          <lpage>552</lpage>
          . URL: https://doi.org/10.1145/2484028.2484072. doi:10.1145/2484028.2484072.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Qian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Fang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <article-title>Multi-modal knowledge-aware hierarchical attention network for explainable medical question answering</article-title>
          ,
          <source>in: Proceedings of the 27th ACM International Conference on Multimedia, MM '19</source>
          ,
          Association for Computing Machinery, New York, NY, USA,
          <year>2019</year>
          , p.
          <fpage>1089</fpage>
          -
          <lpage>1097</lpage>
          . URL: https://doi.org/10.1145/3343031.3351033. doi:10.1145/3343031.3351033.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Qian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Fang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <article-title>Attentive interactive convolutional matching for community question answering in social multimedia</article-title>
          ,
          <source>in: Proceedings of the 26th ACM International Conference on Multimedia, MM '18</source>
          ,
          Association for Computing Machinery, New York, NY, USA,
          <year>2018</year>
          , p.
          <fpage>456</fpage>
          -
          <lpage>464</lpage>
          . URL: https://doi.org/10.1145/3240508.3240626. doi:10.1145/3240508.3240626.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Qian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Fang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <article-title>Hierarchical graph semantic pooling network for multi-modal community question answer matching</article-title>
          ,
          <source>in: Proceedings of the 27th ACM International Conference on Multimedia, MM '19</source>
          ,
          Association for Computing Machinery, New York, NY, USA,
          <year>2019</year>
          , p.
          <fpage>1157</fpage>
          -
          <lpage>1165</lpage>
          . URL: https://doi.org/10.1145/3343031.3350966. doi:10.1145/3343031.3350966.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Poliak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fleming</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Costello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. W.</given-names>
            <surname>Murray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yarmohammadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pandya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Irani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sun</surname>
          </string-name>
          , et al.,
          <source>Collecting verified covid-19 question answer pairs</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>E.</given-names>
            <surname>Voorhees</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Alam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bedrick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Demner-Fushman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. R.</given-names>
            <surname>Hersh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Soboroff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Trec-covid: Constructing a pandemic information retrieval test collection</article-title>
          ,
          <year>2020</year>
          . arXiv:2005.04474.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Rücklé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Gurevych</surname>
          </string-name>
          ,
          <article-title>Representation learning for answer selection with LSTM-based importance weighting</article-title>
          ,
          <source>in: IWCS 2017 - 12th International Conference on Computational Semantics - Short papers</source>
          ,
          <year>2017</year>
          . URL: https://www.aclweb.org/ anthology/W17-6935.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>D.</given-names>
            <surname>Cohen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Croft</surname>
          </string-name>
          ,
          <article-title>End to end long short term memory networks for non-factoid question answering</article-title>
          ,
          <year>2016</year>
          , pp.
          <fpage>143</fpage>
          -
          <lpage>146</lpage>
          . doi:10.1145/2970398.2970438.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Answer sequence learning with neural networks for answer selection in community question answering</article-title>
          ,
          <source>arXiv preprint arXiv:1506.06490</source>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>L.</given-names>
            <surname>Nie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <article-title>Data-driven answer selection in community qa systems</article-title>
          ,
          <source>IEEE transactions on knowledge and data engineering 29</source>
          (
          <year>2017</year>
          )
          <fpage>1186</fpage>
          -
          <lpage>1198</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>A.</given-names>
            <surname>Severyn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Moschitti</surname>
          </string-name>
          ,
          <article-title>Structural relationships for large-scale learning of answer re-ranking</article-title>
          ,
          <source>in: Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval</source>
          ,
          <year>2012</year>
          , pp.
          <fpage>741</fpage>
          -
          <lpage>750</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>X.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Khabsa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. H.</given-names>
            <surname>Awadallah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kifer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. L.</given-names>
            <surname>Giles</surname>
          </string-name>
          ,
          <article-title>Adversarial training for community question answer selection based on multi-scale matching</article-title>
          ,
          <source>in: Proceedings of the AAAI Conference on Artificial Intelligence</source>
          , volume
          <volume>33</volume>
          ,
          <year>2019</year>
          , pp.
          <fpage>395</fpage>
          -
          <lpage>402</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>H.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Nie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Mao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.-S.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <article-title>From question to text: Question-oriented feature attention for answer selection</article-title>
          ,
          <source>ACM Transactions on Information Systems</source>
          <volume>37</volume>
          (
          <year>2018</year>
          )
          <fpage>1</fpage>
          -
          <lpage>33</lpage>
          . doi:10.1145/3233771.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Sha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Attentive interactive neural networks for answer selection in community question answering</article-title>
          ,
          <source>in: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence</source>
          , AAAI'17
          , AAAI Press,
          <year>2017</year>
          , p.
          <fpage>3525</fpage>
          -
          <lpage>3531</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>B.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-J.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>Modeling semantic relevance for question-answer pairs in web social communities</article-title>
          ,
          <source>in: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics</source>
          ,
          <year>2010</year>
          , pp.
          <fpage>1230</fpage>
          -
          <lpage>1238</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>B.</given-names>
            <surname>Kratzwald</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Eigenmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Feuerriegel</surname>
          </string-name>
          ,
          <article-title>RankQA: Neural question answering with answer re-ranking</article-title>
          , CoRR abs/1906.03008 (
          <year>2019</year>
          ). URL: http://arxiv.org/abs/1906.03008. arXiv:1906.03008.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lei</surname>
          </string-name>
          ,
          <article-title>Knowledge-aware attentive neural network for ranking question answer pairs</article-title>
          ,
          <source>in: The 41st International ACM SIGIR Conference on Research Development in Information Retrieval</source>
          , SIGIR '18,
          Association for Computing Machinery, New York, NY, USA,
          <year>2018</year>
          , p.
          <fpage>901</fpage>
          -
          <lpage>904</lpage>
          . URL: https://doi.org/10.1145/3209978.3210081. doi:10.1145/3209978.3210081.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hoogeveen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Màrquez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Moschitti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Mubarak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Baldwin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Verspoor</surname>
          </string-name>
          ,
          <article-title>SemEval-2017 task 3: Community question answering</article-title>
          ,
          <source>arXiv preprint arXiv:1912.00730</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>D.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. B.</given-names>
            <surname>Siddique</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. J.</given-names>
            <surname>Barezi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Fung</surname>
          </string-name>
          ,
          <article-title>CAiRE-COVID: A question answering and multi-document summarization system for covid-19 research</article-title>
          ,
          <source>arXiv preprint arXiv:2005.03975</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>A.</given-names>
            <surname>Cohan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Feldman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Beltagy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Downey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Weld</surname>
          </string-name>
          ,
          <article-title>Specter: Document-level representation learning using citation-informed transformers</article-title>
          ,
          <source>in: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>2270</fpage>
          -
          <lpage>2282</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          ,
          <year>2019</year>
          . arXiv:1810.04805.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>G.</given-names>
            <surname>Hinton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Vinyals</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          ,
          <article-title>Distilling the knowledge in a neural network</article-title>
          ,
          <year>2015</year>
          . arXiv:1503.02531.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>