<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dimitra Panou</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Martin Reczko</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute for Fundamental Biomedical Science, Biomedical Sciences Research Center “Alexander Fleming”</institution>,
          <addr-line>34 Fleming Street</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <addr-line>16672 Vari</addr-line>,
          <country country="GR">Greece</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <abstract>
        <p>The recently introduced semi-supervised method GANBERT for finetuning large language models [1] has been applied to document relevance prediction in biomedical question answering. The additional use of unlabeled texts during training enhances the robustness of the prediction, and the method outperforms our previous transformer ELECTROLBERT [2]. The initial document selection phase used for both ELECTROLBERT and GANBERT has been improved using BM25 combined with RM3 query expansion with optimized parameters. Both systems were continuously improved during the BioASQ11 [3] competition, and in the last batch, GANBERT ranked as the 3rd team for document prediction. The previous version of ELECTROLBERT took the 1st place for the “yes/no” type questions in this year’s SYNERGY [4] predictions.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>GANBERT [1] enables fine-tuning of BERT [5] with unlabeled data using the GAN framework [6],
where a generator is trained to produce fake representations and a discriminator is trained
to distinguish samples of the generator from the real instances. By generating only the
internal representation of text, GANBERT avoids the difficult generation of realistic discrete
text and can be applied directly to text classification. Two GANBERT variants were later
successfully used for predicting the checkworthiness of potential fake news in tweets [7].
In [8], the noise generation in GANBERT was optimized for the task of discriminating correct
paraphrases of Spanish texts. In the following we describe optimized document selection and
the application of GANBERT for document relevance prediction in biomedical question answering
in the BioASQ11 competition [9]. We also provide details of the additional predictions with
our ELECTROLBERT algorithm [2] in the same competition.</p>
      <p>© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
    </sec>
    <sec id="sec-2">
      <title>2. BM25 and RM3 hyperparameter optimization</title>
      <p>To identify documents relevant for a question, we replace the TF/IDF method with the widely
used BM25 [10]. BM25 has two parameters, k1 and b. k1 is intuitively related to the rate
of increase in a document’s score from matching an additional occurrence of a term, where a
smaller k1 provides a faster increase. The parameter b controls the extent of document-length
normalisation. The search is combined with RM3 [11], a classic pseudo-relevance-feedback-based
query expansion model, to find related concepts. RM3 has three parameters: the number of query
expansion terms, the number of top-ranked documents from which the expansion terms are
obtained, and the weight of the original query. The efficient Python implementation in the
package Pyserini is used [12]. A grid search on these parameters to optimize the mean average
precision (MAP) of the top 10 returned documents for the BioASQ11 training set provided the
values that were used in all four batches of BioASQ11. A random search optimizing the average
MAP of the top 10 returned documents for the 240 questions in the first three batches of
BioASQ11 indicates potential improvements. The optimized parameters shown in table 1 clearly
outperform the default settings.</p>
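      <p>As a rough illustration of the two BM25 parameters, the following is a minimal sketch of the Okapi BM25 term weight in one common variant; Pyserini’s Lucene implementation differs in details such as the exact IDF formulation:</p>

```python
import math

def bm25_term_score(tf, df, n_docs, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Contribution of one query term to a document's BM25 score.

    k1 governs how additional occurrences of a term change the score
    (with a smaller k1 the score approaches its maximum sooner), and b
    controls document-length normalisation (b=0 disables it entirely).
    """
    idf = math.log(1.0 + (n_docs - df + 0.5) / (df + 0.5))
    length_norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1.0) / (tf + length_norm)
```

      <p>With Pyserini, the corresponding knobs are set through <monospace>LuceneSearcher.set_bm25(k1, b)</monospace> and, for the RM3 expansion, <monospace>set_rm3(fb_terms, fb_docs, original_query_weight)</monospace> before issuing the search.</p>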
    </sec>
    <sec id="sec-3">
      <title>3. Training, validation and test data</title>
      <p>For finetuning GANBERT, all pairs of a question and its correct documents provided in the
training set for BioASQ11 are used for the ’relevant’ class. As introduced in the ELECTROLBERT
training [2], the negative examples for the ’non-relevant’ class are generated using a range of
false positives from the initial document selection phase to better discriminate the relevant
documents obtained. All questions of the relevance training set were processed with BM25 and
RM3 using the settings marked with B3+4:EB0-4 in table 1 to select 1000 relevant documents for
each question. The documents were ranked according to their score and all documents between
rank 100 and 150 were used as negative examples, excluding potential positive examples in
these ranks. The values of the start and end rank positions for the negative set were optimized
by retraining and maximizing the mean average precision measured on all batches of BioASQ10.
For the unlabeled set, all pairs of a question and its ideal answer and all related snippets from
the BioASQ10 training set were used. As a validation set, the top 100 documents scored with
BM25 and RM3 (settings again as in B3+4:EB0-4) for the 240 questions in the first three batches
of BioASQ11 were used. A final independent test was made on the 90 questions of batch 4 of
BioASQ11.</p>
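      <p>The negative-example construction described above can be sketched as follows; the function name and signature are ours for illustration — the text only specifies the rank window 100–150 and the exclusion of known positives:</p>

```python
def build_negatives(ranked_doc_ids, positive_ids, start_rank=100, end_rank=150):
    """Select hard negative examples from an initial BM25+RM3 ranking:
    all documents ranked between start_rank and end_rank (1-based,
    inclusive), excluding any document known to be relevant for the
    question."""
    positives = set(positive_ids)
    window = ranked_doc_ids[start_rank - 1:end_rank]
    return [doc for doc in window if doc not in positives]
```

      <p>Applied to the 1000 documents retrieved per question, this yields up to 51 negatives per question, fewer when gold-standard documents fall inside the window.</p>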
      <p>[Table 1 caption, partially recovered: k1 and b are parameters of BM25; the number of expansion terms, the number of feedback documents, and the original query weight are parameters of RM3. One column specifies the number of questions (total 240) with at least one correct document; another specifies the number of correctly identified documents (max. 647). In the column “used for”, Bx denotes the BioASQ11 test batch x, and EBy denotes the system ELECTROLBERTy.]</p>
      <p>[Table 1: grid of BM25 (k1, b) and RM3 (expansion terms, feedback documents, original query weight) settings with the resulting MAP values; the numeric columns were flattened during extraction and are omitted here.]</p>
    </sec>
    <sec id="sec-3b">
      <title>4. GANBERT finetuning and hyperparameter optimization</title>
      <p>The adaptation of the GANBERT architecture introduced in [1] for document relevance
classification is shown in figure 1. Using the labeled and unlabeled data described in the
previous section for finetuning, and employing the large pretrained BERT model provided with
the GANBERT implementation in the path for the real data (provided by the authors of GANBERT
at https://github.com/crux82/ganbert), all relevant hyperparameters for GANBERT were
optimized by multiple finetunings while monitoring the performance on the first three batches
of BioASQ11, as shown in table 2. All GANBERT models perform substantially better than the
standard BERT model, and the performance of GANBERT is quite stable across the different
hyperparameter settings, including the variations in the noise generation part suggested
in [8].</p>
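      <p>The semi-supervised objective behind this setup can be sketched numerically as follows; this is a numpy illustration only — the linear head <monospace>W</monospace>, the array shapes, and the function names are our assumptions, not the GANBERT code, which appends one extra “fake” class to the real classes of the discriminator:</p>

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

K = 2                                      # real classes: relevant / non-relevant
W = rng.normal(size=(768, K + 1)) * 0.01   # hypothetical linear head; last column = "fake"

def discriminator_losses(real_repr, real_labels, fake_repr, unlab_repr):
    # supervised cross-entropy on labeled question-document pairs
    p_real = softmax(real_repr @ W)
    sup = -np.log(p_real[np.arange(len(real_labels)), real_labels] + 1e-9).mean()
    # unlabeled representations should NOT land in the extra "fake" class ...
    p_unlab = softmax(unlab_repr @ W)
    unsup_real = -np.log(1.0 - p_unlab[:, -1] + 1e-9).mean()
    # ... while generated representations should
    p_fake = softmax(fake_repr @ W)
    unsup_fake = -np.log(p_fake[:, -1] + 1e-9).mean()
    return sup, unsup_real, unsup_fake

real = rng.normal(size=(8, 768))
labels = rng.integers(0, K, size=8)
fake = rng.normal(size=(8, 768))
unlab = rng.normal(size=(8, 768))
sup, unsup_real, unsup_fake = discriminator_losses(real, labels, fake, unlab)
```

      <p>The unlabeled term is what lets the question/ideal-answer pairs and snippets contribute to training without relevance labels: they only need to be recognized as real.</p>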
    </sec>
    <sec id="sec-4">
      <title>5. Results</title>
      <p>In table 3 the performances of our document relevance submissions for the BioASQ11
competition are listed. All submissions marked with ’base model’ use the ELECTROLBERT model of […].</p>
      <p>[Figure 1: the GANBERT architecture for document relevance classification; recovered labels: noise, real data, RQ, R, NRQ, NR, U, F, BERT, D, relevant, non-relevant, “is real?”.]</p>
    </sec>
    <sec id="sec-5">
      <title>6. Conclusion and Future Work</title>
      <p>Our suggested GANBERT version for document relevance prediction has shown promising
performance, outperforming our previous algorithm ELECTROLBERT. As can be seen in the
published BioASQ11 results, both algorithms perform better than some of the other systems that
appear to employ ChatGPT [15]. One obvious extension would be to replace BERT in the path
processing the real data with ELECTROLBERT. This would also lead to the use of a more
appropriate scientific vocabulary, as the BERT model provided with the GANBERT implementation
uses a general-purpose vocabulary. It should also be noted that the size of the unlabeled data
set in this study is relatively small, as it was generated using only text available with the
BioASQ datasets and our limited computational resources. One way to increase it could be the
use of random segments from PubMed abstracts.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>GPU computations were offered by HYPATIA, the Cloud infrastructure of the Greek ELIXIR
node.</p>
    </sec>
    <sec id="sec-7">
      <title>References</title>
      <p>[1] D. Croce, G. Castellucci, R. Basili, GAN-BERT: Generative Adversarial Learning for Robust
Text Classification with a Bunch of Labeled Examples, in: Proceedings of the 58th Annual
Meeting of the Association for Computational Linguistics, Association for Computational
Linguistics, Online, 2020, pp. 2114–2119. URL: https://aclanthology.org/2020.acl-main.191.
doi:10.18653/v1/2020.acl-main.191.
[2] M. Reczko, ELECTROLBERT: Combining Replaced Token Detection and Sentence
Order Prediction, in: Proc. of CLEF 2022: Conference and Labs of the Evaluation Forum,
September 5–8, 2022, Bologna, Italy, online http://ceur-ws.org/Vol-3180/paper-24.pdf,
urn:nbn:de:0074-3180-7, 2022.
[3] A. Nentidis, G. Katsimpras, A. Krithara, S. Lima-López, E. Farré-Maduell, L. Gasco,
M. Krallinger, G. Paliouras, Overview of BioASQ 2023: The eleventh BioASQ challenge on
Large-Scale Biomedical Semantic Indexing and Question Answering, in: Experimental
IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Fourteenth
International Conference of the CLEF Association (CLEF 2023), 2023.
[4] A. Nentidis, G. Katsimpras, A. Krithara, G. Paliouras, Overview of BioASQ Tasks 11b and
Synergy11 in CLEF2023, in: Working Notes of CLEF 2023 - Conference and Labs of the
Evaluation Forum, 2023.
[5] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of Deep Bidirectional
Transformers for Language Understanding, 2018. URL: https://arxiv.org/abs/1810.04805.
doi:10.48550/ARXIV.1810.04805.
[6] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville,
Y. Bengio, Generative Adversarial Nets, in: Proceedings of the 27th International
Conference on Neural Information Processing Systems, 2014.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>