<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>How good are you? An empirical classification performance comparison of Large Language Models with traditional Open Set Recognition classifiers</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alexander Grote</string-name>
          <email>grote@fzi.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anuja Hariharan</string-name>
          <email>anuja.hariharan@kit.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michael Knierim</string-name>
          <email>michael.knierim@kit.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christof Weinhardt</string-name>
          <email>weinhardt@kit.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>FZI Research Center for Information Technology</institution>
          ,
          <addr-line>Haid-und-Neu-Str. 10-14, 76131 Karlsruhe</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Karlsruhe Institute of Technology</institution>
          ,
          <addr-line>Kaiserstraße 12, 76131 Karlsruhe</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <fpage>27</fpage>
      <lpage>32</lpage>
      <abstract>
        <p>The release of ChatGPT has led to an unprecedented surge in the popularity of generative AI-based Large Language Models (LLMs) among practitioners. These models have gained traction in business processes due to their ability to receive instructions in natural language. However, they suffer from hallucinations, i.e., generated texts that are factually incorrect. Hallucinations also arise in text classification tasks, such as customer support ticket classification or intent classification for chatbots. In such scenarios, the user prompts the model to classify an incoming text into predefined categories. Furthermore, in real-world scenarios, it is common to encounter texts that do not fit into the predefined categories. It is unclear whether current state-of-the-art LLMs can handle such scenarios and how they compare to existing classifiers designed for these situations. In this paper, we propose a way to evaluate the classification performance of LLMs in an Open Set Recognition (OSR) scenario, where unseen classes can occur at inference time. The simulation consists of an empirical comparison between two state-of-the-art language models, GPT-4 and Gemini Pro, a fine-tuned version of GPT-3.5, and established OSR classifiers. The results will provide insights into how reliable large language models are for classification purposes and whether they can replace existing OSR classifiers, which typically require a considerable amount of labelled data.</p>
      </abstract>
      <kwd-group>
        <kwd>Large Language Models</kwd>
        <kwd>Open Set Recognition</kwd>
        <kwd>Classification</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>CEUR
ceur-ws.org</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        Since the release of ChatGPT in November 2022 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], the adoption of Large Language Models
(LLMs) in businesses has experienced significant growth [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. In particular, the ability to interact with these models in natural
language has allowed practitioners with little programming
knowledge to harness the power of such systems in their daily operations. However, utilising
LLMs comes at the risk of factually incorrect generated texts, also known as hallucinations [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
Such hallucinations are undesired and, for example in the intent classification used in
chatbot interactions, might negatively impact customer service quality and potentially
harm the company’s reputation. This highlights the need for a robust system that not only
classifies the customer intent correctly, but also detects out-of-distribution questions
and responds accordingly. The functionality of a system that rejects out-of-distribution data
points and classifies known patterns into existing categories has been widely studied under
the term Open Set Recognition (OSR) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. In particular, deep-learning-based OSR classifiers,
such as the OpenMax [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] or the DOC [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] algorithm, have shown improved performance on
OSR classification tasks [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Similarly, the zero- [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and few-shot [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] abilities of LLMs have
also been leveraged to solve these tasks. Due to the fast-paced advancements in the realm
of LLMs, it is unclear from a practitioner’s point of view how well state-of-the-art LLMs with
zero- and few-shot strategies compare to established solutions from OSR and how reliable they
are in a production setting. This ultimately leads to the question of which approach to choose
and how they compare against each other. In this work, we plan to provide insights into the
classification accuracy and the ability to reject unknown instances by conducting an empirical
comparison of approaches from these two research areas. We thereby provide guidance for practitioners and an
updated benchmark for the current state-of-the-art LLM classification performance.
      </p>
    </sec>
    <sec id="sec-3">
      <title>2. Related Work</title>
      <p>
        Generative Pre-trained Transformer (GPT) models represent a paradigm shift in Natural
Language Processing (NLP) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. While these LLMs are typically pretrained in a self-supervised,
task-independent manner, they are known to perform well on a wide range of NLP tasks, even without
fine-tuning [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. To use these models for classification tasks, one can either fine-tune the model or
use zero- and few-shot techniques for in-context learning. Fine-tuning involves adjusting the
weights of a pre-trained model for a particular task and, given a sufficiently large dataset, surpasses the
classification performance of zero- and few-shot strategies [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. In contrast, zero- and few-shot
learning methods utilise the capability of Large Language Models (LLMs) to categorise new
data points effectively, even when they have encountered no or only a minimal number of
examples from a specific class. Typically, zero- and few-shot strategies are combined with
prompting strategies, such as ”Chain of Thought” [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] and ”Clue And Reasoning Prompting”
[
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], to further enhance the classification performance. Despite these strategies, Kocoń et al. [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]
and Caruccio et al. [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] have demonstrated that the zero- and few-shot capabilities are worse
than supervised machine learning models for classification tasks. In their analysis, however,
they assumed a closed set scenario, an assumption that rarely holds in real-world applications.
      </p>
      <p>
        A more realistic scenario than traditional classification is Open Set Recognition [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. It allows
for unknown classes during inference, and the classifier has an additional option to reject
data points as unknown. If the incoming data point is not rejected as unknown, the classifier
classifies the data point into a known class. Among the first OSR models were adapted Support
Vector Machines [
        <xref ref-type="bibr" rid="ref17 ref18">17, 18</xref>
        ]. With the rise of neural networks, Bendale and Boult [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] reformulated
the final softmax layer to also estimate the probability of a data point being out-of-distribution.
Similarly, Shu et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] use a one-versus-rest classification layer to reduce the misclassifications
in the open space, while Oza and Patel [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] utilise an autoencoder and its reconstruction loss to
determine if a data point is novel.
      </p>
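      <p>
        As a minimal illustration of the rejection mechanism shared by these classifiers, the following Python sketch applies the core idea behind DOC: given per-class scores from a one-versus-rest sigmoid output layer, a data point is rejected as unknown when no class exceeds its threshold. The fixed threshold of 0.5 is a simplifying assumption; DOC fits tighter, per-class thresholds.
      </p>
      <preformat>
import numpy as np

def predict_open_set(sigmoid_scores: np.ndarray, thresholds: np.ndarray) -> int:
    """Return the index of the predicted known class, or -1 for 'unknown'."""
    best = int(np.argmax(sigmoid_scores))
    # Accept only if the best-scoring class clears its threshold; otherwise
    # the data point falls into the open space and is rejected.
    if sigmoid_scores[best] >= thresholds[best]:
        return best
    return -1

# Example: three known classes, all scores below 0.5, so the point is rejected.
scores = np.array([0.21, 0.35, 0.18])
print(predict_open_set(scores, np.full(3, 0.5)))  # prints -1
      </preformat>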
    </sec>
    <sec id="sec-4">
      <title>3. Proposed Approach</title>
      <p>To compare the performance of LLMs with that of Open Set Recognition classifiers, we plan to set up
an empirical evaluation as illustrated in Figure 1.</p>
      <sec id="sec-4-1">
        <title>Repeat 10 times for each openness scenario</title>
        <p>OSR
LLM</p>
      </sec>
      <sec id="sec-4-2">
        <title>Random Selection</title>
        <p>of KKCs and UUCs</p>
      </sec>
      <sec id="sec-4-3">
        <title>Hyperparametertuning</title>
      </sec>
      <sec id="sec-4-4">
        <title>Evaluation on F1 Score</title>
      </sec>
      <sec id="sec-4-5">
        <title>Training</title>
      </sec>
      <sec id="sec-4-6">
        <title>Data</title>
      </sec>
      <sec id="sec-4-7">
        <title>Validation</title>
      </sec>
      <sec id="sec-4-8">
        <title>Data</title>
      </sec>
      <sec id="sec-4-9">
        <title>Test</title>
      </sec>
      <sec id="sec-4-10">
        <title>Data</title>
        <p>
          For our experiments, we will use four different text classification datasets. These datasets
include the 20 Newsgroups dataset [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ], the Yahoo! Answers dataset [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ], the CLINC150 dataset
[
          <xref ref-type="bibr" rid="ref22">22</xref>
          ] and the BANKING77 dataset [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ]. While the first dataset consists of news articles and the
second of question-answer pairs from various categories, the last two represent intent classification
tasks. All datasets have at least ten different classes or categories, based on which we will
simulate an open set scenario.
        </p>
        <p>
          We follow the data splitting procedure of Geng et al. [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] to select
the known and unknown classes for the open set simulation and repeat each simulation ten
times to derive statistically meaningful results. Furthermore, we plan to exclude 0, 10, 20 and
30% of all available classes from training and evaluate the classification for each scenario on
the F1 score. The F1 score is a commonly used metric in classification problems, measuring the
harmonic mean of precision and recall. However, in OSR scenarios, the unknown classes
are typically not considered as an additional class when calculating the F1 score [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. That is why
we additionally report the F1 score separately for the known and the unknown classes, providing
further insights into the applicability of LLMs in open set scenarios.
        </p>
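        <p>
          To make this procedure concrete, the following Python sketch simulates the class selection for one run; the helper function and the integer labels are illustrative assumptions, not part of the procedure of Geng et al. itself:
        </p>
        <preformat>
import random

def open_set_split(labels, openness, seed):
    """Randomly select the known and unknown classes for one simulation run."""
    rng = random.Random(seed)
    classes = sorted(set(labels))
    n_unknown = round(len(classes) * openness)
    unknown = set(rng.sample(classes, n_unknown))
    known = [c for c in classes if c not in unknown]
    return known, unknown

labels = list(range(10)) * 100  # toy dataset with ten classes
# Exclude 0, 10, 20 and 30% of the classes; repeat each scenario ten times.
for openness in (0.0, 0.1, 0.2, 0.3):
    for run in range(10):
        known, unknown = open_set_split(labels, openness, seed=run)
        # Training sees only the known classes; at test time, samples from
        # the unknown classes must be rejected as 'unknown'.
        assert len(known) + len(unknown) == 10
        </preformat>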
        <p>
          In terms of conversational LLMs, we plan to use two state-of-the-art models, GPT-4 [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ] from
OpenAI and Gemini Pro [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ] from Google, with zero-shot and few-shot prompt configurations.
When using a zero-shot configuration, we provide the LLM with only the category name and
description, while in a few-shot setting, we also include examples of each category. Currently, it
is not possible to create a custom, fine-tuned version of either of these two models. Instead, we
will fine-tune OpenAI’s GPT-3.5 model using OpenAI’s fine-tuning service to also investigate
the improvements gained through fine-tuning.
        </p>
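        <p>
          As an illustration of how such prompts could be assembled, consider the following sketch. The wording, the explicit "unknown" escape option and the helper function are our own illustrative choices, not a fixed API:
        </p>
        <preformat>
def build_prompt(text, categories, examples=None):
    """Assemble a zero-shot prompt; passing examples makes it few-shot.

    categories: mapping from category name to a short description.
    examples: optional mapping from category name to example texts.
    """
    lines = ["Classify the text into exactly one of the following categories,",
             "or answer 'unknown' if none of them fits."]
    for name, description in categories.items():
        lines.append(f"- {name}: {description}")
        for example in (examples or {}).get(name, []):
            lines.append(f"  Example: {example}")
    lines.append(f"Text: {text}")
    lines.append("Category:")
    return "\n".join(lines)

prompt = build_prompt("How do I reset my card PIN?",
                      {"card_issues": "questions about payment cards"})
        </preformat>
        <p>
          We then compare the results to the classification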
performance of the OpenMax [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] and DOC [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] classifiers. To speed up the training process
of both OSR classifiers, we first transform the incoming texts into meaningful embeddings
using the most advanced text embedding model provided by OpenAI [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ] and then train a shallow
neural network on the retrieved embeddings. The shallow neural network integrates either the
OpenMax or the DOC architecture.
        </p>
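        <p>
          A minimal sketch of this two-stage pipeline is given below, assuming PyTorch and a DOC-style sigmoid head on top of precomputed embeddings; the hidden layer size, the embedding dimension and the fixed rejection threshold are illustrative assumptions (an OpenMax variant would instead recalibrate the softmax logits after training):
        </p>
        <preformat>
import torch
import torch.nn as nn

class ShallowOpenSetClassifier(nn.Module):
    """One hidden layer on top of frozen text embeddings."""

    def __init__(self, embedding_dim: int, n_known: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embedding_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_known),
        )

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        # One-versus-rest probabilities; uniformly low scores signal 'unknown'.
        return torch.sigmoid(self.net(embeddings))

# Toy usage with random vectors standing in for the OpenAI embeddings.
model = ShallowOpenSetClassifier(embedding_dim=1536, n_known=8)
scores = model(torch.randn(4, 1536))
accepted = scores.max(dim=1).values >= 0.5  # below the threshold: rejected
        </preformat>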
    </sec>
    <sec id="sec-5">
      <title>4. Conclusion</title>
      <p>Generative AI models for text generation, like ChatGPT, have proven useful in various tasks.
In particular, they can classify an incoming text into predefined categories. In this paper, we
propose a study design that compares the classification performance of state-of-the-art LLMs
with existing classifiers for Open Set Recognition. The results of this study will provide insights into
the reliability of conversational LLMs and whether they are a viable alternative to traditional
classification systems.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1] OpenAI,
          <string-name>
            <surname>Introducing</surname>
            <given-names>ChatGPT</given-names>
          </string-name>
          ,
          <year>2022</year>
          . URL: https://openai.com/blog/chatgpt.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>W.</given-names>
            <surname>Hariri</surname>
          </string-name>
          ,
          <article-title>Unlocking the Potential of ChatGPT: A Comprehensive Exploration of its Applications, Advantages, Limitations, and Future Directions in Natural Language Processing (</article-title>
          <year>2023</year>
          ). URL: https://arxiv.org/abs/2304.
          <year>02017</year>
          . doi:
          <volume>10</volume>
          .48550/ARXIV.2304.
          <year>02017</year>
          , publisher: arXiv Version Number:
          <fpage>6</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          , X. Cheng, W. X.
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>J.-Y.</given-names>
          </string-name>
          <string-name>
            <surname>Nie</surname>
            ,
            <given-names>J.-R.</given-names>
          </string-name>
          <string-name>
            <surname>Wen</surname>
          </string-name>
          ,
          <article-title>Halueval: A large-scale hallucination evaluation benchmark for large language models</article-title>
          ,
          <source>in: Proceedings of the 2023 conference on empirical methods in natural language processing</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>6449</fpage>
          -
          <lpage>6464</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>W. J.</given-names>
            <surname>Scheirer</surname>
          </string-name>
          ,
          <string-name>
            <surname>A. de Rezende Rocha</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Sapkota</surname>
            ,
            <given-names>T. E.</given-names>
          </string-name>
          <string-name>
            <surname>Boult</surname>
          </string-name>
          , Toward Open Set Recognition,
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          <volume>35</volume>
          (
          <year>2013</year>
          )
          <fpage>1757</fpage>
          -
          <lpage>1772</lpage>
          . doi:
          <volume>10</volume>
          .1109/TPAMI.
          <year>2012</year>
          .
          <volume>256</volume>
          , conference Name:
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence.</source>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bendale</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. E.</given-names>
            <surname>Boult</surname>
          </string-name>
          , Towards Open Set Deep Networks,
          <source>in: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          , IEEE,
          <string-name>
            <surname>Las</surname>
            <given-names>Vegas</given-names>
          </string-name>
          ,
          <string-name>
            <surname>NV</surname>
          </string-name>
          , USA,
          <year>2016</year>
          , pp.
          <fpage>1563</fpage>
          -
          <lpage>1572</lpage>
          . URL: http://ieeexplore.ieee.org/document/7780542/. doi:
          <volume>10</volume>
          .1109/CVPR.
          <year>2016</year>
          .
          <volume>173</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>L.</given-names>
            <surname>Shu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Liu</surname>
          </string-name>
          , DOC: Deep Open Classification of Text Documents,
          <source>in: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing</source>
          , Association for Computational Linguistics, Copenhagen, Denmark,
          <year>2017</year>
          , pp.
          <fpage>2911</fpage>
          -
          <lpage>2916</lpage>
          . URL: https: //aclanthology.org/D17-1314. doi:
          <volume>10</volume>
          .18653/v1/
          <fpage>D17</fpage>
          - 1314.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>C.</given-names>
            <surname>Geng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-J.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>Recent Advances in Open Set Recognition: A Survey</article-title>
          ,
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          <volume>43</volume>
          (
          <year>2021</year>
          )
          <fpage>3614</fpage>
          -
          <lpage>3631</lpage>
          . doi:
          <volume>10</volume>
          . 1109/TPAMI.
          <year>2020</year>
          .
          <volume>2981604</volume>
          , conference Name:
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence.</source>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bosma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Guu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. W.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Lester</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q. V.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <article-title>Finetuned language models are zero-shot learners</article-title>
          ,
          <source>in: International conference on learning representations</source>
          ,
          <year>2022</year>
          . URL: https://openreview.net/forum?id=gEZrGCozdqR.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>T.</given-names>
            <surname>Brown</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ryder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Subbiah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Kaplan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dhariwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Neelakantan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Shyam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sastry</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.</surname>
          </string-name>
          <article-title>Askell, others, Language models are few-shot learners</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>33</volume>
          (
          <year>2020</year>
          )
          <fpage>1877</fpage>
          -
          <lpage>1901</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>T.-X. Sun</surname>
            ,
            <given-names>X.-Y.</given-names>
          </string-name>
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>X.-P.</given-names>
          </string-name>
          <string-name>
            <surname>Qiu</surname>
            ,
            <given-names>X.-J.</given-names>
          </string-name>
          <string-name>
            <surname>Huang</surname>
          </string-name>
          ,
          <source>Paradigm Shift in Natural Language Processing, Machine Intelligence Research</source>
          <volume>19</volume>
          (
          <year>2022</year>
          )
          <fpage>169</fpage>
          -
          <lpage>183</lpage>
          . URL: https://link.springer.
          <source>com/10.1007/ s11633-022-1331-6</source>
          . doi:
          <volume>10</volume>
          .1007/s11633-022-1331-6.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>K. S.</given-names>
            <surname>Kalyan</surname>
          </string-name>
          ,
          <article-title>A Survey of GPT-3 Family Large Language Models Including ChatGPT and GPT-4</article-title>
          ,
          <string-name>
            <given-names>SSRN</given-names>
            <surname>Electronic Journal</surname>
          </string-name>
          (
          <year>2023</year>
          ). URL: https://www.ssrn.com/abstract=4593895. doi:
          <volume>10</volume>
          .2139/ssrn.4593895.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Tam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Muqeeth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mohta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bansal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. A.</given-names>
            <surname>Rafel</surname>
          </string-name>
          ,
          <article-title>Few-shot parameter-eficient fine-tuning is better and cheaper than in-context learning</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>35</volume>
          (
          <year>2022</year>
          )
          <fpage>1950</fpage>
          -
          <lpage>1965</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Schuurmans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bosma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Chi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q. V.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <string-name>
            <surname>D.</surname>
          </string-name>
          <article-title>Zhou, others, Chain-of-thought prompting elicits reasoning in large language models</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>35</volume>
          (
          <year>2022</year>
          )
          <fpage>24824</fpage>
          -
          <lpage>24837</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>X.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , G. Wang,
          <source>Text Classification via Large Language Models</source>
          (
          <year>2023</year>
          ). URL: https://arxiv.org/abs/2305.08377. doi:
          <volume>10</volume>
          .48550/ARXIV.2305.08377, publisher: arXiv Version Number:
          <fpage>3</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>J.</given-names>
            <surname>Kocoń</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Cichecki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Kaszyca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kochanek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Szydło</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Baran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bielaniewicz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gruza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Janz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kanclerz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kocoń</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Koptyra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Mieleszczenko-Kowszewicz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Miłkowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Oleksy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Piasecki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Radlinski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Wojtasik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Woźniak</surname>
          </string-name>
          , P. Kazienko,
          <article-title>ChatGPT: Jack of all trades, master of none</article-title>
          ,
          <source>Information Fusion</source>
          <volume>99</volume>
          (
          <year>2023</year>
          )
          <article-title>101861</article-title>
          . URL: https://linkinghub. elsevier.com/retrieve/pii/S156625352300177X. doi:
          <volume>10</volume>
          .1016/j.inffus.
          <year>2023</year>
          .
          <volume>101861</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>L.</given-names>
            <surname>Caruccio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Cirillo</surname>
          </string-name>
          , G. Polese, G. Solimando,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sundaramurthy</surname>
          </string-name>
          , G. Tortora,
          <article-title>Can ChatGPT provide intelligent diagnoses? A comparative study between predictive models and ChatGPT to define a new medical diagnostic bot</article-title>
          ,
          <source>Expert Systems with Applications</source>
          <volume>235</volume>
          (
          <year>2024</year>
          )
          <article-title>121186</article-title>
          . URL: https://linkinghub.elsevier.com/retrieve/pii/S0957417423016883. doi:
          <volume>10</volume>
          .1016/j.eswa.
          <year>2023</year>
          .
          <volume>121186</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>W. J.</given-names>
            <surname>Scheirer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. P.</given-names>
            <surname>Jain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. E.</given-names>
            <surname>Boult</surname>
          </string-name>
          ,
          <article-title>Probability Models for Open Set Recognition</article-title>
          ,
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          <volume>36</volume>
          (
          <year>2014</year>
          )
          <fpage>2317</fpage>
          -
          <lpage>2324</lpage>
          . doi:
          <volume>10</volume>
          . 1109/TPAMI.
          <year>2014</year>
          .
          <volume>2321392</volume>
          , conference Name:
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence.</source>
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>L. P.</given-names>
            <surname>Jain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. J.</given-names>
            <surname>Scheirer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. E.</given-names>
            <surname>Boult</surname>
          </string-name>
          <article-title>, Multi-class Open Set Recognition Using Probability of Inclusion</article-title>
          , in: D.
          <string-name>
            <surname>Fleet</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <string-name>
            <surname>Pajdla</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Schiele</surname>
          </string-name>
          , T. Tuytelaars (Eds.),
          <source>Computer Vision - ECCV 2014, Lecture Notes in Computer Science</source>
          , Springer International Publishing, Cham,
          <year>2014</year>
          , pp.
          <fpage>393</fpage>
          -
          <lpage>409</lpage>
          . doi:https://doi.org/10.1007/978-3-
          <fpage>319</fpage>
          -10578-9_
          <fpage>26</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>P.</given-names>
            <surname>Oza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. M.</given-names>
            <surname>Patel</surname>
          </string-name>
          ,
          <article-title>C2ae: Class conditioned auto-encoder for open-set recognition</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>2307</fpage>
          -
          <lpage>2316</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>D.</given-names>
            <surname>Lewis</surname>
          </string-name>
          , Reuters-21578
          <source>text categorization collection</source>
          ,
          <year>1997</year>
          . Tex.
          <source>howpublished: UCI Machine Learning Repository.</source>
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <surname>Y.</surname>
          </string-name>
          <article-title>LeCun, Character-level convolutional networks for text classification</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>28</volume>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>S.</given-names>
            <surname>Larson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mahendran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. J.</given-names>
            <surname>Peper</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Clarke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Hill</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. K.</given-names>
            <surname>Kummerfeld</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Leach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Laurenzano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mars</surname>
          </string-name>
          ,
          <article-title>An evaluation dataset for intent classification and out-of-scope prediction</article-title>
          , in: K. Inui,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Ng</surname>
          </string-name>
          ,
          <string-name>
            <surname>X.</surname>
          </string-name>
          Wan (Eds.),
          <article-title>Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP), Association for Computational Linguistics</article-title>
          , Hong Kong, China,
          <year>2019</year>
          , pp.
          <fpage>1311</fpage>
          -
          <lpage>1316</lpage>
          . URL: https://aclanthology.org/D19-1131. doi:
          <volume>10</volume>
          .18653/v1/
          <fpage>D19</fpage>
          - 1131.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>I.</given-names>
            <surname>Casanueva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Temčinas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Gerz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Henderson</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Vulić</surname>
          </string-name>
          ,
          <article-title>Eficient intent detection with dual sentence encoders</article-title>
          , in: T.
          <string-name>
            <surname>-H. Wen</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Celikyilmaz</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Papangelis</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Eric</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Kumar</surname>
            ,
            <given-names>I. Casanueva</given-names>
          </string-name>
          , R. Shah (Eds.),
          <source>Proceedings of the 2nd workshop on natural language processing for conversational AI</source>
          ,
          <article-title>Association for Computational Linguistics</article-title>
          , Online,
          <year>2020</year>
          , pp.
          <fpage>38</fpage>
          -
          <lpage>45</lpage>
          . URL: https://aclanthology.org/
          <year>2020</year>
          .nlp4convai-
          <fpage>1</fpage>
          .5. doi:
          <volume>10</volume>
          .18653/ v1/
          <year>2020</year>
          .nlp4convai-
          <fpage>1</fpage>
          .5.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24] OpenAI, GPT-4
          <source>Technical Report</source>
          (
          <year>2023</year>
          ). URL: https://arxiv.org/abs/2303.08774. doi:
          <volume>10</volume>
          . 48550/ARXIV.2303.08774, publisher: arXiv tex.
          <source>version: 4.</source>
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <surname>Gemini</surname>
            <given-names>Team</given-names>
          </string-name>
          ,
          <article-title>Gemini: A Family of Highly Capable Multimodal Models (</article-title>
          <year>2023</year>
          ). URL: https:// arxiv.org/abs/2312.11805. doi:
          <volume>10</volume>
          .48550/ARXIV.2312.11805, publisher: arXiv tex.
          <source>version: 1.</source>
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <surname>OpenAI</surname>
          </string-name>
          , New and improved embedding model,
          <year>2022</year>
          . URL: https://openai.com/blog/ new-and
          <article-title>-improved-embedding-model.</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>