<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>arXiv</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>HTW-DIL at Touché: Multimodal Dense Information Retrieval for Arguments</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tamás Janusko</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aaron Kämpf</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Denis Keiling</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jessica Knick</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>David Schäfer</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maik Thiele</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>HTW Dresden</institution>
          ,
          <addr-line>Friedrich-List-Platz 1, Dresden, 01069</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>15343</volume>
      <abstract>
        <p>Retrieving images for arguments poses many of the problems of traditional information retrieval, with the added challenge of being inherently multimodal. We adapt a dense retrieval approach to address this issue and acquire synthetic training data to fine-tune a multimodal model as part of our retriever. Furthermore, we conduct ablation studies to examine the impact of different modalities and benchmark our approach against state-of-the-art methods. While the task itself is laden with ambiguity, there appears to be a benefit in using only textual information for retrieving argumentative images.</p>
      </abstract>
      <kwd-group>
        <kwd>Machine Learning</kwd>
        <kwd>Information Retrieval</kwd>
        <kwd>Multimodal Retrieval</kwd>
        <kwd>Image Retrieval</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Task Description</title>
      <p>We are provided with 136 arguments, each consisting of topic, premise, claim, stance and type. The task is to
retrieve supporting images from a web crawl of approx. 9,000 samples, where each image is accompanied
by additional information, among others the text content of the encompassing website and the search
query used to obtain that image. A particularly difficult aspect of the task is that several arguments
share a topic and have similar premises and claims, while their differing stances are grounded in subtle
deviations of the lines of reasoning.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <sec id="sec-3-1">
        <title>3.1. Multimodal DPR</title>
        <p>As presented in the original DPR paper, we frame retrieval as a metric learning problem, aiming
to maximize the dot product similarity of matching queries and targets. While DPR retrieves passages
(chunks derived from larger documents), we treat each image and its accompanying textual information
(website text summary and web search query) as a unit during training, although only the image is
evaluated eventually.</p>
        <p>
          The method employs an in-batch negative training scheme where, for a given image, a matching
argument (positive) doubles as a negative example when paired with any other image within the batch,
thus efficiently increasing the data size. Additionally, a randomly sampled argument is passed along
with the positive argument to function as yet another negative for each image in the batch. This way
we yield n positive pairings and n(2n-1) negative pairings for a batch size of n. On this basis we compute the
negative log-likelihood loss as implemented in the original DPR code (https://github.com/facebookresearch/DPR/blob/main/dpr/models/biencoder.py#L254), using the multimodal
large language model (MLLM) Moondream2 (https://huggingface.co/vikhyatk/moondream2) to facilitate operations in a joint embedding space. Phi 1.5 [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] is the underlying LLM, with SigLIP [7]
providing the vision capabilities in Moondream2. This model choice is motivated by its favorable
reported performance as well as its moderate size, which is manageable with the hardware available to
us.
        </p>
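        <p>As a minimal PyTorch sketch of this scheme (the tensor names are ours, and this is a reconstruction of the described setup rather than the verbatim DPR implementation), the loss over a batch can be computed as follows:</p>
        <preformat>
import torch
import torch.nn.functional as F

def in_batch_nll_loss(image_emb, positive_arg_emb, random_arg_emb):
    """Negative log-likelihood over in-batch negatives.

    image_emb:        (n, d) embeddings of the batch images
    positive_arg_emb: (n, d) embeddings of their matching arguments
    random_arg_emb:   (n, d) embeddings of the randomly sampled arguments
    """
    # (n, 2n) score matrix: dot products against all candidate arguments,
    # so each image sees 1 positive and 2n-1 negatives
    candidates = torch.cat([positive_arg_emb, random_arg_emb], dim=0)
    scores = image_emb @ candidates.T
    # the positive argument for image i sits at candidate index i
    targets = torch.arange(image_emb.size(0), device=image_emb.device)
    return F.nll_loss(F.log_softmax(scores, dim=1), targets)
        </preformat>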
        <p>We use a peak learning rate of 1e-5 with linear warm-up from 10% of the peak over the first 10% of training,
followed by linear decay back to 10% of the peak over the remaining steps. With a batch size of 16, training is performed for two epochs
(Ep2), with the exception of one approach with image and text input that we train for three epochs (Ep3)
to probe the onset of possible overfitting.</p>
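        <p>A minimal sketch of this schedule, assuming it is expressed as a multiplicative factor on the peak learning rate (the exact training code may differ):</p>
        <preformat>
from torch.optim.lr_scheduler import LambdaLR

def make_scheduler(optimizer, total_steps, warmup_frac=0.1, floor=0.1):
    warmup_steps = int(total_steps * warmup_frac)

    def factor(step):
        if step &lt; warmup_steps:
            # linear warm-up from 10% of the peak up to the peak
            return floor + (1.0 - floor) * step / max(1, warmup_steps)
        # linear decay from the peak back down to 10% of the peak
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return 1.0 - (1.0 - floor) * progress

    return LambdaLR(optimizer, lr_lambda=factor)
        </preformat>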
        <p>The fine-tuned model is then used to embed the query arguments as well as the image/text pairs.
A FAISS [8] index is computed over all embedded image/text pairs, and the top-k most similar instances
for each argument embedding are retrieved.</p>
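        <p>A sketch of the indexing and lookup step with FAISS; the embedding dimensionality and file names here are illustrative assumptions:</p>
        <preformat>
import faiss
import numpy as np

d = 2048                               # embedding dimensionality (assumed)
pair_embs = np.load("pair_embeddings.npy").astype("float32")      # (N, d) image/text pairs
arg_embs = np.load("argument_embeddings.npy").astype("float32")   # (136, d) arguments

index = faiss.IndexFlatIP(d)           # exact search, dot-product similarity
index.add(pair_embs)
scores, ids = index.search(arg_embs, 10)   # top-10 pair IDs per argument
        </preformat>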
        <p>Note that the final evaluation is based only on the retrieved image, rendering the textual website
content merely supportive context information for the retrieval task that is not directly considered
by the judges.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Data Pre-Processing</title>
        <p>Since the input length of any large language model (LLM) is finite, we perform several pre-selection and
pre-processing steps. Firstly, we use only the image, the website content text and the query string to represent
an image and its website context. Additionally, images are scaled to 256 pixels at their largest dimension,
and content text is summarized with BART fine-tuned on CNN Daily Mail (https://huggingface.co/facebook/bart-large-cnn) for summarization [9].
Inputs too large for the summarization model are chunked into suitable sizes, condensed separately,
and then re-concatenated. If a website's content consists mostly of structured text such as lists, no
summarization is performed.</p>
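        <p>A sketch of the summarization step with chunking; the chunk size and generation lengths are assumptions, as the exact values are not stated above:</p>
        <preformat>
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

def summarize(text, max_chunk_chars=3000):
    # split over-long inputs, condense each chunk separately, re-concatenate
    chunks = [text[i:i + max_chunk_chars]
              for i in range(0, len(text), max_chunk_chars)]
    parts = [summarizer(c, max_length=130, min_length=30,
                        truncation=True)[0]["summary_text"]
             for c in chunks]
    return " ".join(parts)
        </preformat>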
        <p>Since arguments are given in XML format, we join the argument elements and the topic into a concise
natural sentence, omitting the type information.</p>
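        <p>As an illustration (the element names are taken from the task description; the sentence template is a hypothetical example, not our exact wording):</p>
        <preformat>
import xml.etree.ElementTree as ET

def argument_to_sentence(xml_string):
    arg = ET.fromstring(xml_string)
    get = lambda tag: arg.findtext(tag, default="").strip()
    # the type element is deliberately ignored, as described above
    return (f"On the topic of {get('topic')}, the {get('stance')} side "
            f"claims that {get('claim')}, because {get('premise')}.")
        </preformat>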
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Synthetic Train Data</title>
        <p>In order to train a model in the first place, we need a training set for the task at hand. Using the
multimodal capabilities of OpenAI's GPT-4 [10], we generate synthetic arguments by inferring plausible
argument elements from the available image/website data. Each image/summary pair is used to derive one
argument topic, for which in turn a premise and a claim are generated for pro and con stances. The
resulting argument is given in valid XML format. We do not distinguish between anecdotal and study
types, as there were no examples of the latter at the time of development.</p>
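        <p>A sketch of the generation step, assuming the current OpenAI Python SDK; the model name and prompt wording are illustrative placeholders, not the exact ones used here:</p>
        <preformat>
import base64
from openai import OpenAI

client = OpenAI()

def synthesize_argument(image_path, summary):
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    prompt = ("From the image and the website summary below, infer one "
              "argument topic, then generate a premise and a claim for both "
              "a pro and a con stance. Answer in valid XML.\n\n"
              "Summary: " + summary)
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; a vision-capable GPT-4 model
        messages=[{"role": "user", "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ]}],
    )
    return response.choices[0].message.content  # XML argument string
        </preformat>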
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Ablative Approaches</title>
        <p>In addition to using image and text data jointly (Moondream Default, Ep2, Ep3), we experiment with
using the images (Moondream Image) and the textual information (Moondream Text) separately in
order to examine whether a unimodal approach represents a feasible alternative within our DPR-like
setup. For this purpose, Moondream models are fine-tuned using only images or only website content text
(including the query string), with the same hyperparameters as the multimodal approach. Moreover, we
employ Ada embeddings (https://platform.openai.com/docs/models/embeddings) from OpenAI to represent the case of simple text-based retrieval with proven
off-the-shelf technology. The rationale for this is that images found on websites are usually placed there
deliberately by human authors with the intention of supporting the written content. Following that
assumption, and given robust text embeddings, one can leverage this relation to obtain relevant images
without taking them into account explicitly.</p>
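        <p>A sketch of this baseline, assuming OpenAI's embedding endpoint and cosine ranking; the input lists are placeholders:</p>
        <preformat>
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts, model="text-embedding-ada-002"):
    res = client.embeddings.create(model=model, input=texts)
    return np.array([d.embedding for d in res.data])

argument_sentences = ["..."]   # flattened arguments (see Section 3.2)
document_texts = ["..."]       # website summary plus query string per image

arg_vecs = embed(argument_sentences)
doc_vecs = embed(document_texts)

# Ada embeddings are unit-normalized, so dot product equals cosine similarity
scores = arg_vecs @ doc_vecs.T
top10 = np.argsort(-scores, axis=1)[:, :10]   # top-10 image IDs per argument
        </preformat>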
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Manual Evaluation Results</title>
      <p>To obtain a better understanding of our approaches, we manually evaluate the retrieval results. For
each approach we examine the top-3 retrieved images, assign them to the mutually exclusive
categories supports argument, on-topic and off-topic, and compute the inter-annotator agreement
using Cohen's kappa. Considering all 136 arguments, two annotators and top-3 results, we yield
816 annotations per approach. In Figure 1, bars represent the number of class instances from all top-3
runs and both annotators, with the bold marking representing the mean and the thinner upper and
lower markings showing the min/max values found in annotations of single runs. We find the majority of
retrieved images classified as not supportive for the corresponding argument. Only Ada and text-only
Moondream show parity of the supporting and off-topic classes, with on-topic instances being the majority,
which can be interpreted as a trend from the low-performing image-only approaches to the best-performing
text-only ones. This may be because only shallow semantics of images are captured by the model. This is
also supported by the kappa values found in Table 1, where we find the highest inter-annotator agreement
for image-only Moondream, our worst performing model. In contrast, the best performing approaches,
Ada and text-only Moondream, have the lowest kappa values. This can be interpreted as evidence of
the inherent ambiguity of the task at hand and the many ways an image can support an argument.
Additionally, we compute the Jaccard index to quantify the similarity of the results given by the
different methods we employ. For this we use the top-10 results for all arguments and compare the
IDs of the retrieved images. From Figure 2 we take that the approaches differ substantially in their choice
of relevant images, with the Moondream-based approaches being fairly similar to each other and Ada-based retrieval far
behind, with Moondream text-only as its closest approach. The findings from Figure 1 that text-based
approaches differ significantly from the image-only Moondream approach are reaffirmed. But while
the high similarity of Moondream image/text fine-tuned for two and three epochs is obvious, the
non-similarity of Ada and text-only Moondream is surprising and again speaks for the high ambiguity
of the challenge.</p>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <caption>
          <p>Inter-annotator agreement (Cohen's kappa) per approach.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Method</th><th>Kappa</th></tr>
          </thead>
          <tbody>
            <tr><td>Ada</td><td>0.231</td></tr>
            <tr><td>MD Std.</td><td>0.489</td></tr>
            <tr><td>MD Ep2</td><td>0.439</td></tr>
            <tr><td>MD Ep3</td><td>0.525</td></tr>
            <tr><td>MD Image</td><td>0.66</td></tr>
            <tr><td>MD Text</td><td>0.371</td></tr>
          </tbody>
        </table>
      </table-wrap>
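      <p>A sketch of the two statistics used above (the label arrays and ID lists are placeholders):</p>
      <preformat>
from sklearn.metrics import cohen_kappa_score

# one label per retrieved image: "supports", "on-topic" or "off-topic"
annotator_a = ["supports", "on-topic", "off-topic", "on-topic"]
annotator_b = ["supports", "off-topic", "off-topic", "on-topic"]
kappa = cohen_kappa_score(annotator_a, annotator_b)

def jaccard(ids_a, ids_b):
    """Overlap of two methods' top-10 result sets of image IDs."""
    a, b = set(ids_a), set(ids_b)
    return len(a.intersection(b)) / len(a.union(b))
      </preformat>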
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>In this work we explored possibilities of multimodal dense image retrieval for arguments. We adapted
the DPR technique and fine-tuned a multimodal base model for different input modalities. Ablation
studies suggest that text-only input is the most favorable input format and that fine-tuning on images alone
causes retrieval to stray off-topic. This is underlined by the similarly good results of both text-only
approaches despite their differences in size and purpose. However, this calls for further examination,
preferably on a larger data set and with additional annotators, since 136 samples and two annotators
facilitate only moderately robust statistical analysis.</p>
      <p>The main areas of interest are the contribution of information from text vs. image inputs, as well as the
role and extent of ambiguity when mapping arguments to images. As a natural first step, the possibility
of bottlenecks caused by too-small models and low image resolutions has to be ruled out.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>B.</given-names>
            <surname>Ionescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Drăgulinescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Rückert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Ben</given-names>
            <surname>Abacha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Garcıa Seco de Herrera</surname>
          </string-name>
          , L. Bloch,
          <string-name>
            <given-names>R.</given-names>
            <surname>Brüngel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Idrissi-Yaghir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Schäfer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. S.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. M.</given-names>
            <surname>Pakull</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Damm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Bracke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Friedrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Andrei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Prokopchuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Karpenka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Radzhabov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kovalev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Macaire</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Schwab</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Lecouteux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Esperança-Rodier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Yim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yetisgen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Hicks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Riegler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Thambawita</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Storås</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Halvorsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Heinrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kiesel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          , Overview of ImageCLEF 2024:
          <article-title>Multimedia retrieval in medical applications</article-title>
          , in: L.
          <string-name>
            <surname>Goeuriot</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Mulhem</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <string-name>
            <surname>Quénot</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Schwab</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Soulier</surname>
          </string-name>
          ,
          <string-name>
            <surname>G. M. D. Nunzio</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Galuščáková</surname>
          </string-name>
          ,
          <string-name>
            <surname>A. G. S. de Herrera</surname>
          </string-name>
          , G. Faggioli, N. Ferro (Eds.),
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Fifteenth International Conference of the CLEF Association (CLEF</source>
          <year>2024</year>
          ), Lecture Notes in Computer Science, Springer, Berlin Heidelberg New York,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Kiesel</surname>
          </string-name>
          , Ç. Çöltekin,
          <string-name>
            <given-names>M.</given-names>
            <surname>Heinrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fröbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Alshomary</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. D.</given-names>
            <surname>Longueville</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Erjavec</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Handke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kopp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ljubešić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Meden</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Mirzakhmedova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Morkevičius</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Reitis-Münstermann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Scharfbillig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Stefanovitch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wachsmuth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          , Overview of Touché 2024:
          <article-title>Argumentation Systems</article-title>
          , in: L.
          <string-name>
            <surname>Goeuriot</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Mulhem</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <string-name>
            <surname>Quénot</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Schwab</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Soulier</surname>
          </string-name>
          ,
          <string-name>
            <surname>G. M. D. Nunzio</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Galuščáková</surname>
          </string-name>
          ,
          <string-name>
            <surname>A. G. S. de Herrera</surname>
          </string-name>
          , G. Faggioli, N. Ferro (Eds.),
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Fifteenth International Conference of the CLEF Association (CLEF</source>
          <year>2024</year>
          ), Lecture Notes in Computer Science, Springer, Berlin Heidelberg New York,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Fröbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Kolyada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Grahm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Elstner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Loebe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hagen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <article-title>Continuous Integration for Reproducible Shared Tasks with TIRA.io</article-title>
          , in: J.
          <string-name>
            <surname>Kamps</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Goeuriot</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Crestani</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Maistro</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <string-name>
            <surname>Joho</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Davis</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Gurrin</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          <string-name>
            <surname>Kruschwitz</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          . Caputo (Eds.),
          <source>Advances in Information Retrieval. 45th European Conference on IR Research (ECIR</source>
          <year>2023</year>
          ), Lecture Notes in Computer Science, Springer, Berlin Heidelberg New York,
          <year>2023</year>
          , pp.
          <fpage>236</fpage>
          -
          <lpage>241</lpage>
          . doi:10.1007/978-3-031-28241-6_20.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Robertson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zaragoza</surname>
          </string-name>
          ,
          <article-title>The probabilistic relevance framework: Bm25 and beyond</article-title>
          ,
          <source>Found. Trends Inf. Retr</source>
          . (
          <year>2009</year>
          ). URL: https://doi.org/10.1561/1500000019. doi:10.1561/1500000019.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>V.</given-names>
            <surname>Karpukhin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Oguz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Min</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Edunov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          , W.-t. Yih,
          <article-title>Dense passage retrieval for open-domain question answering</article-title>
          , in:
          <source>Proceedings of the 2020 EMNLP</source>
          , Association for Computational Linguistics, Online,
          <year>2020</year>
          . doi:10.18653/v1/2020.emnlp-main.550.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bubeck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Eldan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. D.</given-names>
            <surname>Giorno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gunasekar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. T.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>Textbooks are all you need II: phi-1.5 technical report</article-title>
          ,
          <year>2023</year>
          . URL: https://www.microsoft.com/en-us/research/publication/textbooks-are-all-you-need-ii-phi-1-5-technical-report/.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>