<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Multi-label Classification of Factual Incorrectness in Machine-Generated Summaries</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Aniket Deroy</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Subhankar Maity</string-name>
          <email>subhankar.ai@kgpian.iitkgp.ac.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Saptarshi Ghosh</string-name>
          <email>saptarshi.ghosh@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>IIT Kharagpur</institution>
          ,
          <addr-line>Kharagpur</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This study addresses the critical issue of factual inaccuracies in machine-generated text summaries, an increasingly prevalent problem in information dissemination. Recognizing the potential of such errors to compromise information reliability, we investigate the nature of factual inconsistencies across machine-summarized content. We introduce a prompt-based classification system that categorizes errors into four distinct types: misrepresentation, inaccurate quantities or measurements, false attribution, and fabrication. The participants are tasked with evaluating a corpus of machine-generated summaries against their original articles. Our methodology employs qualitative judgements to identify the occurrence of factual distortions. The results show that our prompt-based approaches are able to detect the type of errors in the summaries to some extent, although there is scope for improvement in our classification systems.</p>
      </abstract>
      <kwd-group>
        <kwd>Machine-Generated</kwd>
        <kwd>Multi-label classification</kwd>
        <kwd>Prompting</kwd>
        <kwd>Large language model (LLM)</kwd>
        <kwd>Factual Incorrectness</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>In an era where information dissemination is predominantly driven by digital platforms, the
accuracy and integrity of content have become paramount. Machine-generated summaries,
designed to distill complex articles into digestible formats, have gained traction because of
their efficiency and scalability. However, the susceptibility of these systems to introduce
factual errors poses a significant challenge. This research endeavors to meticulously analyze
the prevalence of factual inaccuracies within machine-generated summaries by establishing a
systematic methodology for identification and categorization.</p>
      <p>
        We propose a novel prompt-based framework [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] that empowers participants to discern
and classify factual inaccuracies into one of four distinct types: misrepresentation, inaccurate
quantities or measurements, false attribution, and fabrication. Each category embodies unique
characteristics of factual errors, ranging from subtle misinterpretations to the deliberate creation
of non-existent facts. Misrepresentation refers to the skewed presentation of information that
can alter the perceived meaning. Inaccuracies in quantities or measurements involve numerical
or statistical deviations from the truth. False attribution represents the erroneous association of
statements or actions with individuals or entities. Lastly, fabrication denotes the most egregious
breach, where information is concocted without any factual foundation.
      </p>
      <p>
        This research serves as a critical investigation into the fidelity [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] of machine-generated
summaries. By scrutinizing these summaries against their source articles, we aim to quantify the
extent of factual distortions and understand their implications. The ultimate goal is to enhance
the credibility of machine-generated content, ensuring that it serves as a reliable conduit for
knowledge and information in the digital age. Our results show that our novel prompt-based
approaches are capable of detecting the type of errors in the summaries to some extent, although
there is scope for improvement in our classification systems. The task is based on [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ], the original track papers.
      </p>
    </sec>
    <sec id="sec-3">
      <title>2. Related Work</title>
      <p>In recent years, the field of natural language processing (NLP) has seen a significant shift towards
the development and utilization of large language models (LLMs). These LLMs, particularly
exemplified by OpenAI’s GPT series (Generative Pre-trained Transformer), have revolutionized
various NLP tasks. The foundational concept behind these models involves pre-training on vast
amounts of text data, enabling them to learn intricate language patterns and structures.</p>
      <p>
        Zero-shot prompting with LLMs has been leveraged across various tasks. In text generation,
these LLMs exhibit the capacity to produce coherent and contextually relevant content even
when prompted by unseen topics or styles. Translation tasks [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] benefit from zero-shot
capabilities, allowing language conversion without specific paired training data. Sentiment analysis
[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], intent classification [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], named entity recognition [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ], and multi-label text classification
[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] are among other tasks where LLMs prompted in a zero-shot manner showcase robust
performance without explicit task-oriented training. Furthermore, question-answering [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and
summarization tasks [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] witness effective output through the zero-shot prompting approach,
offering pertinent answers and concise summaries without task-specific fine-tuning.
      </p>
      <p>
        GPT-3.5 Turbo represents a significant advancement in the landscape of LLMs. It builds on
the foundation laid by GPT-3 [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], showcasing scale and potential improvement in training
methodologies, although specific details regarding the “Turbo” improvements remain proprietary
to OpenAI. GPT-3.5 Turbo’s training involves self-supervised learning on an extensive and
diverse corpus of Internet text, refining its language understanding and generation capabilities.
Leveraging the zero-shot learning paradigm, it excels in performing various natural language
processing tasks without specific fine-tuning, a hallmark feature carried forward from the
GPT-3 architecture. In the context of detecting factual incorrectness in machine-generated
summaries, the zero-shot prompting method utilizing LLMs presents a promising approach.
The methodology involves instructing the model with label descriptions and tasks, allowing it
to identify and classify factual inaccuracies without direct training on specific datasets. GPT-3.5
Turbo, known for its advanced zero-shot learning capabilities, stands as a potential solution for
discerning factual errors in machine-generated content.
      </p>
    </sec>
    <sec id="sec-4">
      <title>3. Dataset</title>
      <p>We have been provided with the original articles and their incorrect summaries in the
training and test sets of ILSUM Task 2. There are 8,497 articles in the training set and 200
articles in the test set.</p>
    </sec>
    <sec id="sec-5">
      <title>4. Task Definition</title>
      <p>The task focuses on identifying factual errors in machine-generated summaries. The objective
is to categorize each datapoint into different categories based on factual incorrectness in the
summaries.</p>
      <p>Possible types of factual incorrectness:
• Misrepresentation: This involves presenting information in a way that is misleading or
gives a false impression. This could be done by exaggerating certain aspects, understating
others, or twisting facts to fit a particular narrative.
• Inaccurate Quantities or Measurements: Factual incorrectness can occur when precise
quantities, measurements, or statistics are misrepresented, whether by error or intent.
• False Attribution: Incorrectly attributing a statement, idea, or action to a person or
group is another form of factual incorrectness.
• Fabrication: Making up data, sources, or events is the most severe form of factual
incorrectness. This involves creating “facts” that have no basis in reality.</p>
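      <p>For concreteness, one common way to represent such multi-label targets is a binary indicator vector over the four classes. This encoding is an illustrative convention, not part of the task specification:</p>
      <preformat>
```python
# Illustrative multi-label encoding over the four error classes.
LABELS = ["misrepresentation", "incorrect_quantities", "false_attribution", "fabrication"]

def encode(assigned):
    """Binary indicator vector: 1 if the class applies to the summary, else 0."""
    return [int(label in assigned) for label in LABELS]

# A summary with both a wrong number and a made-up event:
print(encode({"incorrect_quantities", "fabrication"}))  # [0, 1, 0, 1]
```
      </preformat>
      <p>A datapoint can thus carry any subset of the four labels, including more than one at once, which is what makes the task multi-label rather than multi-class.</p>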
    </sec>
    <sec id="sec-6">
      <title>5. Methodology</title>
      <sec id="sec-6-1">
        <title>5.1. Why Prompting?</title>
        <p>Prompting is a valuable approach to solving multi-label classification problems for several
reasons:
– Natural Language Bridge: Prompting allows the use of natural language to bridge the
gap between machine learning models and complex tasks without the need for extensive
reprogramming or model redesign. It essentially converts the classification task into a
text generation problem, which large language models are inherently good at solving.
– Transfer Learning: Through prompting, models that have been trained on vast datasets
can apply their learned knowledge to classify data across multiple labels. This transfer
learning is efficient because it leverages pre-existing knowledge without the need for
extensive additional training on specialized datasets.
– Flexibility: Prompt-based approaches are highly flexible and easily adapted to different
tasks and domains. This is particularly useful in multi-label classification, where the
relationships and distinctions between categories can be nuanced and context-dependent.
– Efficiency: Prompting can reduce the need for the large annotated datasets that are
typically required to train multi-label classifiers. By using prompts, models can often make
predictions without any labeled examples (zero-shot classification).</p>
      </sec>
      <sec id="sec-6-2">
        <title>5.2. Prompting approach</title>
        <p>The prompting approach involved employing the GPT-3.5 Turbo model in zero-shot mode
for the multi-label classification task of detecting factual incorrectness in machine-generated
summaries. The approach included instructing the GPT-3.5 Turbo model in zero-shot mode,
providing a set of label descriptions, and outlining the task to be executed. The hyperparameters
are as follows: temperature = {0.5, 0.6, 0.7, 0.8, 0.9}, max-tokens = 50, and stop = None. A
diagrammatic representation of the model is shown in Figure 1. The prompt we use for the
model is provided in Figure 2.</p>
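        <p>A minimal sketch of such a zero-shot setup is shown below. The prompt wording, the helper names, and the output-parsing convention here are illustrative assumptions, not the exact prompt of Figure 2; the commented-out API call shows roughly where GPT-3.5 Turbo would be invoked with the hyperparameters above:</p>
        <preformat>
```python
# Sketch of zero-shot multi-label prompting (prompt text is an assumption,
# not the exact prompt used in our runs).

LABELS = ["misrepresentation", "incorrect_quantities", "false_attribution", "fabrication"]

LABEL_DESCRIPTIONS = {
    "misrepresentation": "information presented in a misleading way",
    "incorrect_quantities": "wrong quantities, measurements, or statistics",
    "false_attribution": "statements or actions attributed to the wrong person or group",
    "fabrication": "invented facts, data, sources, or events",
}

def build_prompt(article, summary):
    """Compose a zero-shot instruction with the label descriptions and the task."""
    desc = "\n".join(f"- {k}: {v}" for k, v in LABEL_DESCRIPTIONS.items())
    return (
        "You are given a news article and a machine-generated summary.\n"
        f"Possible types of factual incorrectness:\n{desc}\n\n"
        f"Article: {article}\nSummary: {summary}\n\n"
        "List every error type that applies, comma-separated."
    )

def parse_labels(model_output):
    """Keep only known labels from a comma-separated answer, in canonical order."""
    found = {tok.strip().lower() for tok in model_output.replace(";", ",").split(",")}
    return [label for label in LABELS if label in found]

# The actual call would look roughly like (requires the openai package and a key):
# response = client.chat.completions.create(
#     model="gpt-3.5-turbo", temperature=0.7, max_tokens=50, stop=None,
#     messages=[{"role": "user", "content": build_prompt(article, summary)}])
```
        </preformat>
        <p>Parsing the free-text completion back into the fixed label set keeps the downstream evaluation deterministic even though the generation itself is not.</p>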
      </sec>
      <sec id="sec-6-3">
        <title>5.3. Algorithmic approach</title>
        <p>Here we discuss the various prompt-based algorithmic approaches that we took for the
problem of multi-label error classification. (Note that all our results are non-deterministic in
nature, i.e., the same hyperparameter settings can lead to different results on different test runs.)
– Algorithm 1 first prompts the LLM to check whether the given incorrect summary belongs
to the class misrepresentation. The labels fabrication, false_attribution, and
incorrect_quantities are then checked, in that order. Once one label is predicted for a given incorrect
summary, the algorithm stops. The simple heuristic behind first checking whether the
(incorrect summary, original document) pair belongs to the misrepresentation class is that the
misrepresentation class occurs in higher proportion in the training data. The pseudocode of
the algorithm is given in Algorithm 1.
– Algorithm 2 first prompts the LLM to check whether the given incorrect summary belongs
to the class false_attribution. The labels misrepresentation, fabrication, and
incorrect_quantities are then checked, in that order. Once one label is predicted for a given incorrect
summary, the algorithm stops. The simple heuristic behind first checking the false_attribution
class is that it occurs in higher proportion in the training data. The pseudocode of the
algorithm is given in Algorithm 2.
– Algorithm 3 first prompts the LLM to check whether the given incorrect summary belongs
to the class misrepresentation. The labels false_attribution, fabrication, and
incorrect_quantities are then checked, in that order. Once two labels are predicted for a given incorrect
summary, the algorithm stops. The simple heuristic behind first checking the
misrepresentation and false_attribution classes is that these classes occur in higher proportions in the
training data. The pseudocode of the algorithm is given in Algorithm 3.
– Algorithm 4 first prompts the LLM to check whether the given incorrect summary belongs
to the class misrepresentation. The labels fabrication, false_attribution, and
incorrect_quantities are then checked, in that order. Once two labels are predicted for a given incorrect
summary, the algorithm stops. The simple heuristic behind first checking the
misrepresentation and fabrication classes is that these classes occur in higher proportions in the training
data. The pseudocode of the algorithm is given in Algorithm 4.
– Algorithm 5 first prompts the LLM to check whether the given incorrect summary belongs
to the class false_attribution. The labels misrepresentation, fabrication, and
incorrect_quantities are then checked, in that order. The algorithm stops once four labels have been
predicted for a given incorrect summary; for some data points fewer than four labels are
predicted. We run GPT-3.5 Turbo at temperatures 0.5, 0.6, 0.7, 0.8, and 0.9, and then take
an ensemble of the five test runs by keeping every label that occurred at least twice for a
particular datapoint. This ensembling helped to improve accuracy and generalization,
bringing more stability to the outputs. The pseudocode of the algorithm is given in Algorithm 5.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>6. Results</title>
      <p>The pseudocode of the five algorithms described in Section 5.3 is given below. Each
algorithm maintains a per-datapoint counter, prompts the LLM once per candidate label, and
stops early once the target number of labels has been predicted for that datapoint.</p>
      <preformat>
Algorithm 1 (stop after 1 label):
  counter ← 0
  For all data points:
    for each label in [misrepresentation, fabrication, false_attribution, incorrect_quantities]:
      if prompted LLM decides the (article, summary) pair belongs to class label then
        output the class label; counter ← counter + 1
      if counter == 1 for this datapoint then
        stop checking further labels for this datapoint

Algorithm 2: as Algorithm 1, with label order
  [false_attribution, misrepresentation, fabrication, incorrect_quantities].

Algorithm 3: as Algorithm 1, with label order
  [misrepresentation, false_attribution, fabrication, incorrect_quantities],
  stopping when counter == 2.

Algorithm 4: as Algorithm 1, with label order
  [misrepresentation, fabrication, false_attribution, incorrect_quantities],
  stopping when counter == 2.

Algorithm 5: as Algorithm 1, with label order
  [false_attribution, misrepresentation, fabrication, incorrect_quantities],
  stopping when counter == 4; run at temperatures 0.5, 0.6, 0.7, 0.8, and 0.9,
  keeping every label predicted in at least two of the five runs.
      </preformat>
    </sec>
    <sec id="sec-8">
      <title>7. Conclusion and Future Work</title>
      <p>We were given the task of multi-label error classification, where we had to classify each
document into four classes, namely misrepresentation, fabrication, false_attribution, and
incorrect_quantities. We tried several prompt-based algorithmic approaches for this multi-label
error classification task, given as part of Task 2. We observed that Algorithm 5 (the ensembling
approach) obtained the best results. Future work would involve trying few-shot techniques and
trying larger language models such as GPT-4.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>N.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sun</surname>
          </string-name>
           ,
           <article-title>OpenPrompt: An open-source framework for prompt-learning</article-title>
          ,
          <source>in: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>105</fpage>
          -
          <lpage>113</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>W.</given-names>
            <surname>Kryscinski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>McCann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xiong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Socher</surname>
          </string-name>
          ,
           <article-title>Evaluating the factual consistency of abstractive text summarization</article-title>
           , CoRR abs/1910.12840 (
           <year>2019</year>
           ). URL: http://arxiv.org/abs/1910.12840. arXiv:1910.12840.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Satapara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mehta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Modha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ganguly</surname>
          </string-name>
          ,
           <article-title>Indian language summarization at FIRE 2023</article-title>
           ,
           <source>in: Proceedings of the 15th Annual Meeting of the Forum for Information Retrieval Evaluation, FIRE 2023, Goa, India, December 15-18, 2023</source>
           , ACM,
           <year>2023</year>
           .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Satapara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mehta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Modha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ganguly</surname>
          </string-name>
          ,
           <article-title>Key takeaways from the second shared task on Indian language summarization (ILSUM 2023)</article-title>
           , in: K. Ghosh, T. Mandl, P. Majumder, M. Mitra (Eds.),
           <source>Working Notes of FIRE 2023 - Forum for Information Retrieval Evaluation, Goa, India, December 15-18, 2023</source>
           , CEUR Workshop Proceedings, CEUR-WS.org,
           <year>2023</year>
           .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Haddow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Birch</surname>
          </string-name>
          ,
          <article-title>Prompting large language model for machine translation: A case study</article-title>
          , in:
          <source>Proceedings of the 40th International Conference on Machine Learning, ICML'23</source>
          , JMLR.org,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Nasukawa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Muraoka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Bhattacharjee</surname>
          </string-name>
          ,
          <article-title>A simple yet strong domain-agnostic de-bias method for zero-shot sentiment classification</article-title>
          , in: A. Rogers, J. Boyd-Graber, N. Okazaki (Eds.),
          <source>Findings of the Association for Computational Linguistics: ACL 2023</source>
          , Association for Computational Linguistics, Toronto, Canada,
          <year>2023</year>
          , pp.
          <fpage>3923</fpage>
          -
          <lpage>3931</lpage>
          . URL: https://aclanthology.org/2023.findings-acl.242. doi:10.18653/v1/2023.findings-acl.242.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Parikh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tiwari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Tumbade</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Vohra</surname>
          </string-name>
          ,
          <article-title>Exploring zero and few-shot techniques for intent classification</article-title>
          , in: S. Sitaram, B. Beigman Klebanov, J. D. Williams (Eds.),
          <source>Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track)</source>
          , Association for Computational Linguistics, Toronto, Canada,
          <year>2023</year>
          , pp.
          <fpage>744</fpage>
          -
          <lpage>751</lpage>
          . URL: https://aclanthology.org/2023.acl-industry.71. doi:10.18653/v1/2023.acl-industry.71.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>B.</given-names>
            <surname>Ji</surname>
          </string-name>
          ,
          <article-title>VicunaNER: Zero/few-shot named entity recognition using Vicuna</article-title>
          ,
          <year>2023</year>
          . arXiv:2305.03253.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Ameer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zuo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <article-title>Zero-shot clinical entity recognition using ChatGPT</article-title>
          ,
          <year>2023</year>
          . arXiv:2303.16416.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>R.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>An</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <article-title>Label prompt for multi-label text classification</article-title>
          ,
          <source>Applied Intelligence</source>
          <volume>53</volume>
          (
          <year>2023</year>
          )
          <fpage>8761</fpage>
          -
          <lpage>8775</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J.</given-names>
            <surname>Baek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Aji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Safari</surname>
          </string-name>
          ,
          <article-title>Knowledge-augmented language model prompting for zero-shot knowledge graph question answering</article-title>
          , in: E. Hruschka, T. Mitchell, S. Rahman, D. Mladenić, M. Grobelnik (Eds.),
          <source>Proceedings of the First Workshop on Matching From Unstructured and Structured Data (MATCHING 2023)</source>
          , Association for Computational Linguistics, Toronto, ON, Canada,
          <year>2023</year>
          , pp.
          <fpage>70</fpage>
          -
          <lpage>98</lpage>
          . URL: https://aclanthology.org/2023.matching-1.7. doi:10.18653/v1/2023.matching-1.7.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bhaskar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fabbri</surname>
          </string-name>
          , G. Durrett,
          <article-title>Prompted opinion summarization with GPT-3.5</article-title>
          , in: A. Rogers, J. Boyd-Graber, N. Okazaki (Eds.),
          <source>Findings of the Association for Computational Linguistics: ACL 2023</source>
          , Association for Computational Linguistics, Toronto, Canada,
          <year>2023</year>
          , pp.
          <fpage>9282</fpage>
          -
          <lpage>9300</lpage>
          . URL: https://aclanthology.org/2023.findings-acl.591. doi:10.18653/v1/2023.findings-acl.591.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>T.</given-names>
            <surname>Brown</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ryder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Subbiah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Kaplan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dhariwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Neelakantan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Shyam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sastry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Askell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Herbert-Voss</surname>
          </string-name>
          , G. Krueger,
          <string-name>
            <given-names>T.</given-names>
            <surname>Henighan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Child</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ramesh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ziegler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Winter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hesse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chen</surname>
          </string-name>
          , E. Sigler,
          <string-name>
            <given-names>M.</given-names>
            <surname>Litwin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chess</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Berner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>McCandlish</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Amodei</surname>
          </string-name>
          ,
          <article-title>Language models are few-shot learners</article-title>
          , in: H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, H. Lin (Eds.),
          <source>Advances in Neural Information Processing Systems</source>
          , volume
          <volume>33</volume>
          , Curran Associates, Inc.,
          <year>2020</year>
          , pp.
          <fpage>1877</fpage>
          -
          <lpage>1901</lpage>
          . URL: https://proceedings.neurips.cc/paper_files/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>