<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CLEF 2025 Working Notes</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Multi-Prompt Ensemble Reasoning for MultimodalReasoning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jianzhong Yan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Leilei Kong</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Qida Wu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Junyi Li</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Foshan University</institution>
          ,
          <addr-line>Foshan</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>MultimodalReasoning is a task focused on multilingual visual question answering (VQA). Given a question image with 3-5 possible answers, the model is required to identify the only correct answer contained in the image. We propose Multi-Prompt Ensemble Reasoning (MPER), a method for the multilingual MultimodalReasoning task. The method designs multiple prompts, calls the GPT-4.1 model (OpenAI's vision-language model available via API as of April 2025) in parallel to obtain multiple answers, and then ensembles them to obtain the final answer. Without supervised fine-tuning, in a purely zero-shot setting, we achieved an accuracy of 0.5994 on the test set, ranking second on the multilingual leaderboard and significantly exceeding the official baseline of 0.2701.</p>
      </abstract>
      <kwd-group>
        <kwd>MultimodalReasoning</kwd>
        <kwd>Zero-Shot Learning</kwd>
        <kwd>Prompt Engineering</kwd>
        <kwd>Ensembling</kwd>
        <kwd>VQA</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The core of our method lies in systematically constructing and fusing multiple, diversely designed prompts to holistically
stimulate the intrinsic reasoning capabilities of large models from diverse facets and perspectives. This
approach aims to maximize task-specific performance by exploring the combinatorial prompt space.</p>
      <p>Breakthroughs in vision-language pre-training provide a new paradigm for addressing
the above challenges. Self-supervised learning frameworks built on large-scale multimodal data
significantly improve the cross-modal alignment capability of models: the CLIP model proposed
by Radford et al. [4] achieves semantic mapping of open-domain visual concepts through
contrastive learning on 400 million web image-text pairs; the BLIP framework developed by Li et al. [5]
innovatively introduces a noisy-data filtering mechanism and, through the collaborative optimization
of synthetic caption generation and quality discrimination modules, set state-of-the-art results
at the time on multiple tasks reported at ECCV 2022. It is worth noting that the emergence of
general multimodal Transformer models such as GPT-4 indicates that visual reasoning capabilities are
gradually being integrated into general artificial intelligence systems. The human-like performance
of such models in professional tests verifies the benefit of parameter-scale expansion for complex
multimodal reasoning tasks. Following this trend, the Qwen-VL [6] series of models achieves synergistic
improvement of fine-grained visual understanding and multi-round dialogue capabilities in zero-shot
scenarios through a multi-stage progressive training strategy.</p>
      <p>ImageCLEF Multimodal Reasoning is a new task first introduced at CLEF 2025 [7, 8], which aims to
evaluate the generalization ability of VLMs in multilingual visual question answering. Its
challenges include language diversity, complex reasoning chains, and the integration of real-world
knowledge. We propose a multilingual multimodal reasoning method that integrates multiple prompts.
The core of this method is to combine the predictions obtained from multiple prompts to create a
stronger predictive model.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Methods</title>
      <p>This paper proposes a zero-shot multilingual reasoning framework based on multi-prompt integration
and an ensembling mechanism for the ImageCLEF 2025 multimodal reasoning task. The overall framework
is shown in Figure 1.</p>
      <sec id="sec-2-1">
        <title>2.1. Prompt Template Construction</title>
        <p>To effectively stimulate diverse reasoning pathways within the VLM, this study carefully designed three
different types of prompt templates, the Base Prompt, the Chain-of-Thought (CoT [9]) Prompt, and the
Role-playing Prompt, each serving a specific cognitive function. This multifaceted approach is rooted in
high-level prompt-engineering principles and aims to maximize task-specific performance by exploring
the combinatorial prompt space [10].</p>
        <p>Base Prompt:
• Function: Provides basic, direct instructions that present the question and answer options in a
straightforward manner. These prompts establish a baseline for the model to understand and
respond to the query directly.
• Intended Strength &amp; Scenario: Designed to be most effective for instances where the
image information is unambiguous and the semantic relationship between visual content and
question is straightforward. It primarily relies on the model’s pre-trained knowledge base for
efficient and direct answering.</p>
        <sec id="sec-2-1-1">
          <title>Base Prompt</title>
        </sec>
        <sec id="sec-2-1-2">
          <title>Analyze an image containing a multiple-choice question.</title>
          <p>Identifythe question text,all provided answer options (including cases with more than four
choices), and any visuals such as graph sortables.</p>
          <p>Determine the correct answer using only the image content.</p>
          <p>Respond exclusively with the letter of the correct option,without explanation.</p>
          <p>Chain of Thought (CoT) Prompt:
• Function : Explicitly instructs the model to deduce the answer step-by-step (e.g., "think step by
step"). This leverages the CoT technique, which breaks down complex multi-step problems into
intermediate logical steps, enhancing reasoning transparency.
• Intended Strength &amp; Scenario : Particularly suited for images involving complex
visuallinguistic relationships or requiring multi-stage logical deduction. Prior research (e.g., Kojima
et al., 2022[11]; Wei et al., 2022[12]) has demonstrated CoT’s eficacy in significantly enhancing
reasoning capabilities in large models by encouraging explicit, sequential reasoning.</p>
        </sec>
        <sec id="sec-2-1-3">
          <title>CoT Prompt</title>
          <p>You are a visual language model (VLM). Follow these exact instructions for analyzing each image
with multiple-choice questions:
1. Identify and transcribe all text accurately (preserve original languages).
2. Clearly identify the question (stem) and list all answer choices in visual order (top to bottom,
left to right).
3. Regardless of how options are originally labeled (e.g., numerals, non-Latin letters, or no labels),
always relabel them sequentially from A to E according to their visual appearance order (1st→A,
2nd→B, 3rd→C, etc.).
4. Analyze and think about any visual elements (graphs, charts, diagrams, images) step by step to
determine the correct answer.
5. Output ONLY a single uppercase letter (A-E). Do NOT output original option labels,
explanations, spaces, punctuation, or extra characters.
6. If insufficient information makes you uncertain, still choose and output the most likely option
from A-E.</p>
        </sec>
        <p>Role-playing Prompt:
• Function: Frames the task by instructing the model to adopt a specific persona or expertise
(e.g., "As a master of science exams, please answer..."). This leverages the VLM’s ability to assume
roles and make role-consistent inferences.
• Intended Strength &amp; Scenario: Hypothesized to be beneficial in scenarios involving
linguistic ambiguity, requiring cross-cultural understanding within multilingual settings, or
needing domain-specific knowledge application. By activating different knowledge subsets or
reasoning biases through specific personas, this approach promotes "perspective diversity".</p>
        <sec id="sec-2-1-4">
          <title>Role-playing Prompt</title>
          <p>As a master of science exams (biology/chemistry/physics/mathematics and other science subjects)
with global difficulty levels, you are taking a multi-language picture-text multiple-choice exam.
Please answer carefully according to the question image:
Task Instructions
1. Please identify the question stem and all the options in the image, and mark them in visual
order as A, B, C, D, E;
2. Please make inferences based on the image information, charts, and formulas, and choose the
most likely correct answer;
3. When outputting, only write the capital letter answer you choose (such as A), without outputting
any explanation.</p>
          <p>Please answer:</p>
        </sec>
        <p>Design Rationale &amp; Synergy: These three distinct prompt types are deliberately designed to stimulate
the model’s reasoning process from complementary perspectives. The Base Prompt offers efficiency on
clear-cut cases, the CoT Prompt tackles complex multi-step reasoning, and the Role-playing Prompt
addresses ambiguity and leverages contextual and cultural grounding. Their synergistic interaction within
the ensemble framework is posited to enhance the overall robustness and accuracy of the final prediction
by covering a broader spectrum of the reasoning challenges inherent in the VQA task.</p>
        <p>All templates support the 13 languages covered by the task to ensure multilingual compatibility.</p>
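        <p>As a concrete illustration only (not the authors’ exact code), the three templates described above can be kept in a small registry so that each question image is queried once per template; the strings below are abbreviated placeholders for the full prompts shown in the boxes, and all names are hypothetical.</p>
        <preformat>
# Illustrative sketch: the three prompt types of Section 2.1 gathered in one
# registry so that every question image is queried with each template once.
# The strings are abbreviated placeholders for the full prompts shown above.
PROMPT_TEMPLATES = {
    "base": (
        "Analyze an image containing a multiple-choice question. "
        "Determine the correct answer using only the image content. "
        "Respond exclusively with the letter of the correct option."
    ),
    "cot": (
        "You are a visual language model (VLM). Identify the question and all "
        "answer choices, relabel them A-E in visual order, reason step by step "
        "over any charts or diagrams, and output ONLY a single uppercase letter."
    ),
    "role": (
        "As a master of science exams, answer the multiple-choice question in "
        "the image. Mark the options A-E in visual order and output only the "
        "capital letter of your choice."
    ),
}

def prompts_for_image():
    """Return the three differentiated prompts used for one question image."""
    return [PROMPT_TEMPLATES[name] for name in ("base", "cot", "role")]
        </preformat>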
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Result Ensembling</title>
        <p>After the candidate answers are summarized, the final answer <italic>a</italic><sub>final</sub> is determined by the majority
ensembling strategy:</p>
        <disp-formula>
          <tex-math><![CDATA[a_{\mathrm{final}} = \arg\max_{a \in \{a_1, a_2, a_3\}} \sum_{i=1}^{3} \mathbb{I}(a_i = a)]]></tex-math>
        </disp-formula>
        <p>where <italic>a</italic><sub>i</sub> denotes the candidate answer produced by the <italic>i</italic>-th prompt and I(·) is the indicator function.
When there is a tie (e.g., 1:1:1), the answer with a high-confidence mark (e.g., "The correct answer is
[A]") is preferred. This heuristic rule uses model self-consistency to resolve ambiguity.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Experiments and Evaluation</title>
      <sec id="sec-3-1">
        <title>3.1. Dataset and Evaluation Protocol</title>
        <p>The proposed framework is rigorously evaluated on the ImageCLEF 2025 multimodal reasoning task
using the Exams-V dataset [13]. The dataset contains 4797 validation set samples and 3565 test set
samples, covering 13 languages and involving multidisciplinary knowledge.</p>
        <p>To maintain data authenticity and ensure a fair evaluation of zero-shot capabilities, no preprocessing
or augmentation was performed on the input images. For GPT-4.1 inference, images were converted to
Base64-encoded strings and embedded alongside the prompt text according to the OpenAI API specification.</p>
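        <p>To make this step concrete, the sketch below shows how an image could be converted to a Base64 string and sent to GPT-4.1 together with one prompt, assuming the official OpenAI Python SDK; the helper names and the PNG media type are illustrative, and no sampling parameters are set, mirroring the default configuration described in Section 3.2.</p>
        <preformat>
import base64
from openai import OpenAI  # assumes the official OpenAI Python SDK is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def encode_image(path):
    """Read a question image and return its Base64 string (no preprocessing)."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

def ask_gpt41(image_path, prompt_text):
    """Send one prompt plus the Base64-embedded image; return the raw answer text."""
    b64 = encode_image(image_path)
    response = client.chat.completions.create(
        model="gpt-4.1",  # model name as used in the paper; exact id may differ
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt_text},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
        # temperature and max_tokens deliberately left at their defaults
    )
    return response.choices[0].message.content.strip()
        </preformat>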
        <p>Accuracy, defined as the ratio of correctly predicted samples to the total sample size, is used as the
core evaluation metric.</p>
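        <p>For completeness, the metric corresponds to the following minimal computation (the helper name is illustrative):</p>
        <preformat>
def accuracy(predictions, gold_answers):
    """Fraction of samples whose predicted letter matches the gold answer."""
    correct = sum(1 for p, g in zip(predictions, gold_answers) if p == g)
    return correct / len(gold_answers)
        </preformat>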
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Parallel Inference</title>
        <p>Multithreaded API calls are implemented with Python’s ThreadPoolExecutor, processing the three
sets of differentiated prompts in parallel for each image. When calling the model (GPT-4.1), we use the
default parameters and do not set parameters such as temperature and max_tokens (note: temperature
controls the randomness of the model output, and max_tokens limits the maximum number of tokens
generated by the model). This configuration limits the length of the response while ensuring output
determinism. Each set of prompts generates a candidate answer <italic>a</italic><sub>i</sub>, forming the set {<italic>a</italic><sub>1</sub>, <italic>a</italic><sub>2</sub>, <italic>a</italic><sub>3</sub>}.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Comparative Performance Analysis</title>
        <p>Experiments on the validation set clearly demonstrate the effectiveness of the multi-prompt ensemble
strategy.</p>
        <p>Initially, zero-shot reasoning capabilities were compared in a single-prompt setting,
with GPT-4.1 achieving 48% accuracy, significantly outperforming Qwen-VL-Plus (15%). This establishes
GPT-4.1 as the stronger base model.</p>
        <p>After integrating the three differentiated prompt templates and adopting the majority ensembling mechanism,
the GPT-4.1 model’s accuracy on the validation set improves significantly to 61%, an absolute
improvement of 13% over the single-prompt approach. Similarly, Qwen-VL-Plus also improves significantly,
to 43%, an absolute improvement of 18%.</p>
        <p>These results empirically validate the core hypothesis that combining a diversified prompt strategy
with an ensemble approach can significantly improve the reasoning performance of VLMs, even in the
zero-shot setting.</p>
        <p>Table 1 provides direct quantitative evidence of the performance gains achieved by the
multi-prompt ensemble strategy. It clearly demonstrates the impact of the ensemble approach by
showing the significant accuracy gains from a single prompt to multi-prompt ensembling for both
GPT-4.1 and Qwen-VL-Plus. This table is a cornerstone for verifying claims such as enhanced
self-consistency and perspective diversity.</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Overall Test Set Performance</title>
        <p>The positive improvement trend observed on the validation set is maintained on the unseen test set.
Our framework, relying entirely on the GPT-4.1 multi-prompt ensemble solution, achieves an accuracy
of 0.5994.</p>
        <p>This performance ranks second overall in the ImageCLEF 2025 multimodal reasoning evaluation,
an absolute improvement of 32.93 percentage points over the official baseline of 0.2701. Although there is still
a gap to the winning team’s accuracy of 81.40%, our method shows significant competitive
advantages and robust generalization ability.</p>
        <p>Table 2 directly demonstrates the competitive advantage of the proposed method over the official
baseline on the final test set. It quantifies the overall impact of the framework and provides strong
evidence for its practicality and competitive standing on a real-world benchmark.</p>
      </sec>
      <sec id="sec-3-5">
        <title>3.5. Multilingual Generalization Analysis</title>
        <p>To further evaluate the robustness of the framework with respect to language diversity, this study
presents detailed results for 13 languages in the ImageCLEF 2025 task.</p>
        <p>As shown in Table 3, the performance varies across languages, but generally demonstrates strong
multilingual generalization, with accuracies ranging from 0.3941 (Urdu) to 0.7750 (Bulgarian). This
highlights the framework’s ability to handle challenges such as language diversity, complex reasoning
chains, and integration of real-world knowledge, which are inherent challenges in the ImageCLEF task.</p>
      </sec>
      <sec id="sec-3-6">
        <title>3.6. Discussion of Experimental Findings</title>
        <p>Empirical results clearly confirm the significant performance gains achieved by the multi-prompt
ensemble strategy. The accuracy improvements observed for GPT-4.1 (13% absolute improvement
in validation set accuracy) and Qwen-VL-Plus (18% absolute improvement in validation set accuracy)
highlight the value of this approach.</p>
        <p>In-depth analysis shows that the multi-prompt ensemble strategy is particularly valuable for reasoning
in complex scenarios. When faced with questions containing semantic ambiguity or long textual
descriptions, different prompt templates effectively guide the model to focus on information features
at different levels of abstraction. For example, in questions involving multi-entity spatial relations,
the ensemble system stably outputs the correct answer through the ensembling mechanism, while the
predictions of a single prompt show large fluctuations.</p>
        <p>
          This improvement directly confirms Wang et al.’s [14] finding on enhancing self-consistency, which
states that introducing diversity into reasoning paths can effectively reduce the probability of accidental
errors. In addition, the findings of this study echo the "perspective diversity" view recently proposed
by the Trad team [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], which argues that different prompt templates are equivalent to building
multidimensional thinking entry points for the model, thereby activating the model’s latent multimodal
reasoning ability.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Limitations and Future Work</title>
      <p>Despite the significant performance improvements achieved by the MPER framework, key limitations
warrant acknowledgment. Primarily, the substantial cost associated with utilizing the commercial
GPT-4.1 API constrained our ability to conduct thorough ablation studies. This limitation prevents a
deeper quantitative analysis of the individual contribution of each prompt type (Base, CoT, Role-playing)
and their various combinations to the overall ensemble performance. Additionally, the current prompt
designs are primarily empirical, and performance variation persists across languages.</p>
      <p>Future work will prioritize conducting comprehensive ablation experiments to rigorously quantify
the impact of each prompt design choice and ensemble strategy component on model performance.
This deeper analysis will provide crucial insights for further refining the approach. Further exploration
of method optimization is also warranted.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>This paper proposes a multilingual multimodal reasoning method based on multi-prompt integration
with GPT-4.1 for the ImageCLEF 2025 MultimodalReasoning task. We compared the performance of the
Qwen-VL-Plus and GPT-4.1 models under single-prompt and multi-prompt integration settings, and ultimately
selected the GPT-4.1 model with the multi-prompt integration strategy to improve accuracy. Experimental
results show that this strategy achieved significant gains on both the validation set and the test
set, and our submission ultimately ranked second in the competition. Our work shows that, in complex
multilingual reasoning tasks, prompt engineering and integration methods can fully exploit the
capabilities of VLMs. Future work could consider methods such as reinforcement learning, VLM knowledge
fusion [15], knowledge graphs [16], and hybrid integration to further improve the multimodal reasoning
performance of the model.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work is supported by the Natural Science Platforms and Projects of Guangdong Province Ordinary
Universities (Key Field Special Projects) (No. 2023ZDZX1023).</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used DeepSeek in order to: check and improve
grammar, spelling, and language fluency. After using this tool/service, the author(s) reviewed and edited
the content as needed and take(s) full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Antol</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mitchell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Batra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. L.</given-names>
            <surname>Zitnick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Parikh</surname>
          </string-name>
          , Vqa:
          <article-title>Visual question answering</article-title>
          ,
          <source>in: Proceedings of the IEEE international conference on computer vision</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>2425</fpage>
          -
          <lpage>2433</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>G.</given-names>
            <surname>Marvin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Hellen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jjingo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Nakatumba-Nabende</surname>
          </string-name>
          ,
          <article-title>Prompt engineering in large language models</article-title>
          ,
          <source>in: International conference on data intelligence and cognitive informatics</source>
          , Springer,
          <year>2023</year>
          , pp.
          <fpage>387</fpage>
          -
          <lpage>402</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>F.</given-names>
            <surname>Trad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chehab</surname>
          </string-name>
          ,
          <article-title>To ensemble or not: Assessing majority voting strategies for phishing detection with large language models</article-title>
          ,
          <source>in: ISPR</source>
          ,
          <year>2024</year>
          . URL: https://api.semanticscholar.org/CorpusID: 274436749.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, G. Krueger, I. Sutskever, Learning transferable visual models from natural language supervision, in: International Conference on Machine Learning, 2021. URL: https://api.semanticscholar.org/CorpusID:231591445.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] J. Li, D. Li, C. Xiong, S. C. H. Hoi, BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation, in: International Conference on Machine Learning, 2022. URL: https://api.semanticscholar.org/CorpusID:246411402.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] J. Bai, S. Bai, S. Yang, S. Wang, S. Tan, P. Wang, J. Lin, C. Zhou, J. Zhou, Qwen-VL: A versatile vision-language model for understanding, localization, text reading, and beyond, 2023. URL: https://api.semanticscholar.org/CorpusID:261101015.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] B. Ionescu, H. Müller, D.-C. Stanciu, A.-G. Andrei, A. Radzhabov, Y. Prokopchuk, L.-D. Ştefan, M.-G. Constantin, M. Dogariu, V. Kovalev, H. Damm, J. Rückert, A. Ben Abacha, A. García Seco de Herrera, C. M. Friedrich, L. Bloch, R. Brüngel, A. Idrissi-Yaghir, H. Schäfer, C. S. Schmidt, T. M. G. Pakull, B. Bracke, O. Pelka, B. Eryilmaz, H. Becker, W.-W. Yim, N. Codella, R. A. Novoa, J. Malvehy, D. Dimitrov, R. J. Das, Z. Xie, M. S. Hee, P. Nakov, I. Koychev, S. A. Hicks, S. Gautam, M. A. Riegler, V. Thambawita, P. Halvorsen, D. Fabre, C. Macaire, B. Lecouteux, D. Schwab, M. Potthast, M. Heinrich, J. Kiesel, M. Wolter, B. Stein, Overview of ImageCLEF 2025: Multimedia retrieval in medical, social media and content recommendation applications, in: Experimental IR Meets Multilinguality, Multimodality, and Interaction, Proceedings of the 16th International Conference of the CLEF Association (CLEF 2025), Springer Lecture Notes in Computer Science LNCS, Madrid, Spain, 2025.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] D. Dimitrov, M. S. Hee, Z. Xie, R. Jyoti Das, M. Ahsan, S. Ahmad, N. Paev, I. Koychev, P. Nakov, Overview of ImageCLEF 2025 – Multimodal Reasoning, in: CLEF 2025 Working Notes, CEUR Workshop Proceedings, CEUR-WS.org, Madrid, Spain, 2025.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] Y. Wang, S. Wu, Y. Zhang, S. Yan, Z. Liu, J. Luo, H. Fei, Multimodal chain-of-thought reasoning: A comprehensive survey, ArXiv abs/2503.12605 (2025). URL: https://api.semanticscholar.org/CorpusID:277065932.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] W. Li, X. Wang, W. Li, B. Jin, A survey of automatic prompt engineering: An optimization perspective, ArXiv abs/2502.11560 (2025). URL: https://api.semanticscholar.org/CorpusID:276408554.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] T. Kojima, S. S. Gu, M. Reid, Y. Matsuo, Y. Iwasawa, Large language models are zero-shot reasoners, ArXiv abs/2205.11916 (2022). URL: https://api.semanticscholar.org/CorpusID:249017743.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] J. Wei, X. Wang, D. Schuurmans, M. Bosma, E. H. Chi, F. Xia, Q. Le, D. Zhou, Chain of thought prompting elicits reasoning in large language models, ArXiv abs/2201.11903 (2022). URL: https://api.semanticscholar.org/CorpusID:246411621.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] R. Das, S. Hristov, H. Li, D. Dimitrov, I. Koychev, P. Nakov, EXAMS-V: A multi-discipline multilingual multimodal exam benchmark for evaluating vision language models, in: L.-W. Ku, A. Martins, V. Srikumar (Eds.), Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, Bangkok, Thailand, 2024, pp. 7768-7791. URL: https://aclanthology.org/2024.acl-long.420/. doi:10.18653/v1/2024.acl-long.420.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] X. Wang, J. Wei, D. Schuurmans, Q. Le, E. H. Chi, D. Zhou, Self-consistency improves chain of thought reasoning in language models, ArXiv abs/2203.11171 (2022). URL: https://api.semanticscholar.org/CorpusID:247595263.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] F. Wan, X. Huang, D. Cai, X. Quan, W. Bi, S. Shi, Knowledge fusion of large language models, ArXiv abs/2401.10491 (2024). URL: https://api.semanticscholar.org/CorpusID:267061245.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] Y. Wang, M. Yasunaga, H. Ren, S. Wada, J. Leskovec, VQA-GNN: Reasoning with multimodal knowledge via graph neural networks for visual question answering, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 21582-21592.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>