<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards Synthesizing E-Mail Conversations as Part of Knowledge Work Datasets with Large Language Models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Desiree Heim</string-name>
          <email>desiree.heim@dfki.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christian Jilek</string-name>
          <email>christian.jilek@dfki.de</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Adrian Ulges</string-name>
          <email>adrian.ulges@hs-rm.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andreas Dengel</string-name>
          <email>andreas.dengel@dfki.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department DCSM, RheinMain University of Applied Sciences</institution>
          ,
          <addr-line>Kurt-Schumacher-Ring 18, 65197 Wiesbaden</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Computer Science, University of Kaiserslautern-Landau (RPTU)</institution>
          ,
          <addr-line>Erwin-Schrödinger-Straße 52, 67663 Kaiserslautern</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Smart Data and Knowledge Services Department, German Research Center for Artificial Intelligence (DFKI)</institution>
          ,
          <addr-line>Trippstadter Straße 122, 67663 Kaiserslautern</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Data-driven evaluations or optimizations of knowledge work support tools are challenging due to the absence of a generally usable, comprehensive dataset that provides sufficient information about the backgrounds of users and their documents. Since data collections suffer from issues like data incompleteness due to data protection measures and a lack of thorough annotations, we develop a configurable dataset generator, called KnoWoGen, that simulates collaborative, task-based knowledge work. While in the past a major problem of synthesizing such a dataset was the generation of authentic and diverse documents, the emergence of Large Language Models (LLMs) now makes it feasible. Hence, in the KnoWoGen, an LLM is prompted to generate task-related documents. Task configurations include a domain or general topic, which is used to randomly generate a more specific subtopic at simulation time to condition the generation of the related document. Additionally, the KnoWoGen stores all available contextual information about the documents and the simulation environment in a knowledge graph. As a proof of concept, we study the generation of e-mail conversations as relevant representatives of knowledge work documents reflecting collaboration. Such threads are particularly difficult to collect in real environments since the involvement of third parties typically hinders their publication, while laboratory settings require substantially more resources to plan and simulate them. In a study conducted to assess the quality of generated e-mail threads, participants rated them regarding their naturalness, coherence, answer quality, and content advances. Overall, two-thirds of the ratings were the highest or second-highest score on a 5-point scale.</p>
      </abstract>
      <kwd-group>
        <kwd>Conversation generation</kwd>
        <kwd>Knowledge work datasets</kwd>
        <kwd>Large Language Models</kwd>
        <kwd>Knowledge graphs</kwd>
        <kwd>Evaluation of knowledge work support tools</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Compared to user studies, data-driven evaluations of knowledge work support tools, like task predictors
or document recommenders, enable comparisons between tools and offer more objective, reproducible
insights into the tools’ performance and its causes, such as reasons for errors or correct results.</p>
      <p>
        However, as Gonçalves [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] also stated, collecting a comprehensive dataset is challenging, not least
because of the extensive data annotations that would be necessary to obtain sufficient information
about the users’ and their documents’ background and ground truth data for evaluations. Even if data
collections are annotated with contextual information, the annotations might not be sufficient for
different evaluation use cases. Moreover, issues like data incompleteness due to privacy-, confidentiality-,
and copyright-preserving measures, such as censoring and deletion, remain for real-life data collections. Of
all publicly available knowledge work datasets published over the years, the more recent RLKWiC
dataset [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] provides the most background information but is still subject to the aforementioned issues.
      </p>
      <p>
        Because of these disadvantages, we are currently working on a paper that elaborates in detail on the state
of the art regarding knowledge work datasets and why generating datasets can be advantageous over
collecting data. This motivation also led to our recently proposed knowledge work dataset generator
KnoWoGen [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. It simulates configurable scenarios in which multiple knowledge workers complete
tasks, create and utilize documents, and collaborate with others. All documents are generated during
the simulation by prompting a Large Language Model [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] with task-specific instructions and all relevant
contextual information. For instance, in the configuration, domains are defined for the tasks. At
simulation time, finer-grained topics of this domain are generated; one is randomly selected and
given in the prompt to generate the document. The main advantage of the simulation is that all modeled
or controlled background information about the knowledge workers and their documents is known and
can be stored alongside the simulation process. To make the contextual data easily accessible for later
simulation steps or at evaluation time, KnoWoGen stores it in the form of a Knowledge Graph [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        In a previous paper [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], we showed that the KnoWoGen can generate authentic documents that
humans cannot reliably distinguish from real documents. While, in that paper, the focus was on single
documents without any interdependencies, in this paper, we concentrate on e-mail conversations.
      </p>
      <p>
        In knowledge work, e-mails are an important type of document. On average, roughly 347 billion
e-mails are sent per day [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] in total, and approximately half of them are business e-mails [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. However,
when conducting data collections, e-mails are particularly sensitive since third parties are typically
involved, which impedes their publication. Alternatively, in laboratory settings, such issues could be
avoided by asking the participants to collaborate. Nevertheless, this would require substantially
more resources to plan the collection setting and ensure proper collaboration.
Besides, it could be difficult to imitate realistic collaboration processes. Although there exist two
popular, business-focused e-mail datasets, Enron [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and Avocado [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], they are unsuitable as knowledge
work benchmarks due to their lack of contextual information about the e-mails and involved persons
including other related documents like text files. Thus, relevant input information or ground truth
data is lacking. To the best of our knowledge, there is also no synthetic dataset containing e-mails and
comprehensive contextual information about them and their environment.
      </p>
      <p>This paper focuses on how our current KnoWoGen prototype generates e-mail conversations. To this end,
we investigate, in a user study, whether the KnoWoGen can generate threads of high quality regarding
the aspects of naturalness, coherence, answer quality, and content advances. In the paper, we first
introduce the general functionality of the KnoWoGen (Sect. 2), explain how e-mail threads are generated
(Sect. 3), present the aforementioned user study (Sect. 4), and conclude with an outlook on future work
(Sect. 5).</p>
    </sec>
    <sec id="sec-2">
      <title>2. KnoWoGen - The Knowledge Work Dataset Generator</title>
      <p>
        The general functionality of the KnoWoGen is shown in Figure 1: First, an engineer of a knowledge
work support tool, who wants to evaluate their tool, specifies the configuration. Subsequently, the
simulation environment with the knowledge workers, their tasks, and other relevant entities like
projects, companies, or products is set up according to the configuration. All information about this
environment is stored in a knowledge graph. In the succeeding simulation steps, tasks are assigned to
agents, and documents corresponding to the tasks are generated by prompting a Large Language Model (LLM)
with task-specific instructions and all other relevant contextual information about involved entities
or related artifacts using a suitable parameterized prompt template. These synthesized documents are
stored in a document base. Again, all contextual information utilized during the simulation is stored in
the knowledge graph. This knowledge graph is built upon an extended version of the PIMO ontology [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ],
a well-known personal information management ontology. Finally, the generated knowledge work
dataset composed of the document base and the knowledge graph can be used to evaluate tools that, for
instance, predict directly related documents, detect tasks, or classify documents concerning parameters
used in the simulation. More details about the general design of the KnoWoGen can be found in Heim
et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
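      <p>The simulation step described above can be illustrated with a minimal sketch. The class and function names, the prompt wording, and the plain triple list are assumptions for illustration; the actual KnoWoGen stores its context in an RDF knowledge graph built on an extended PIMO ontology.</p>

```python
from dataclasses import dataclass, field

# Illustrative sketch of one KnoWoGen simulation step; names and the
# prompt template are assumptions, not the actual KnoWoGen API.

@dataclass
class Task:
    agent: str
    domain: str
    subtopic: str
    related_docs: list = field(default_factory=list)

PROMPT_TEMPLATE = (
    "You are {agent}, working on a task in the domain '{domain}'.\n"
    "Write a document about the subtopic '{subtopic}'.\n"
    "Consider the following related documents:\n{context}"
)

def build_prompt(task):
    # fill the parameterized prompt template with task-specific context
    context = "\n".join(task.related_docs) or "(none)"
    return PROMPT_TEMPLATE.format(agent=task.agent, domain=task.domain,
                                  subtopic=task.subtopic, context=context)

def simulate_step(task, llm, knowledge_graph, document_base):
    # generate the task document and record its context as triples
    document = llm(build_prompt(task))
    document_base.append(document)
    knowledge_graph.append((task.agent, "workedOn", task.subtopic))
    knowledge_graph.append((document, "generatedFor", task.subtopic))
    return document
```

      <p>Here `llm` is any callable mapping a prompt string to generated text, so the generation backend stays exchangeable.</p>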
    </sec>
    <sec id="sec-3">
      <title>3. Synthesizing E-mail Conversations</title>
      <p>E-mails are a specific type of knowledge work document. They are also generated by a Large Language Model
(LLM) in the context of specifically defined actions, i.e., substeps of larger tasks, and can depend on
other documents, as explained in Section 2. However, compared to other document types, e-mails are
specified more openly since the rough contents, document dependencies, and other action-specific
characteristics, like how formal the tone should be, are given for the whole e-mail thread and not for
every single e-mail. In the current prototype, these specifications are represented in the prompt that
generates the first e-mail. For the consecutive e-mails, we noticed in pretests that including the entire
previous e-mail when generating replies results in unsubstantial answers addressing too many details
of this e-mail. To address this issue, we implemented two mechanisms to increase the focus on essential
aspects.1</p>
      <p>The first condensation mechanism is summarization. Here, the previous e-mail or thread is
summarized in a few sentences. The subsequent reply generation prompt includes this summary together with the
instruction to answer the previous e-mail. If only this mechanism is used, the group of recipients who
should reply to the e-mail has to be defined externally. Per selected recipient, one reply is generated.</p>
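      <p>The summarization mechanism can be sketched as follows; `llm` is assumed to be a callable mapping a prompt string to generated text, and the prompt wordings are illustrative, not the actual KnoWoGen prompts.</p>

```python
# Sketch of the summarization-based condensation mechanism.

def summarize(previous_thread, llm):
    # condense the previous e-mail or thread into a few sentences
    return llm("Summarize the following e-mail thread in 2-3 sentences:\n"
               + previous_thread)

def build_reply_prompt(summary, recipient):
    # the reply prompt contains the summary plus the instruction to answer
    return (f"You are {recipient}. Summary of the thread so far:\n{summary}\n"
            "Write a reply that answers the previous e-mail.")

def generate_replies(previous_thread, recipients, llm):
    # the recipient group is defined externally; one reply per recipient
    summary = summarize(previous_thread, llm)
    return [llm(build_reply_prompt(summary, r)) for r in recipients]
```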
      <p>The other mechanism is question generation, which has two variants. In the
implicit variant, the prompt generating the initial e-mail includes a dedicated instruction to pose
questions, which the replies should then answer. Alternatively, questions can
be generated based on an existing e-mail (explicit variant). In this case, the generated questions are
included in a second e-mail from an initial receiver to the sender, who should answer them.</p>
      <p>For the first question generation variant, implicitly included questions are extracted in a structured
list of question-addressee tuples using the Langchain framework’s OutputParsers2. Finally, for each
recipient addressed with questions, a reply is generated with the instruction to answer the respective
questions. In the other variant, the addressee is always the sender of the initial e-mail and, since
questions were produced in a separate step, they are known and do not have to be extracted. Since
questions only address the initial sender, the number of chosen questioners determines the final number
of replies.</p>
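      <p>Our implementation relies on the Langchain framework’s OutputParsers for this extraction; a dependency-free stand-in for the same step could look as follows (the assumed line format "Addressee: question?" and the function names are illustrative assumptions).</p>

```python
import re

def extract_questions(llm_output):
    # parse lines of the form "Addressee: question?" (optionally bulleted)
    # into a structured list of (addressee, question) tuples
    tuples = []
    for line in llm_output.splitlines():
        m = re.match(r"\s*[-*]?\s*(\w[\w .]*?)\s*:\s*(.+\?)\s*$", line)
        if m:
            tuples.append((m.group(1), m.group(2)))
    return tuples

def replies_per_addressee(question_tuples):
    # group questions by addressee; one reply is generated per addressee
    grouped = {}
    for addressee, question in question_tuples:
        grouped.setdefault(addressee, []).append(question)
    return grouped
```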
      <p>Both condensation mechanisms, summarization and question generation, can be combined.
This is especially meaningful for longer e-mail threads or long e-mails to avoid reaching prompt length
limits. While utilizing only the summarization mechanism can potentially generate more diverse
answers, since it is less focused on specific questions asked, the question-generation process offers more
background information about the e-mail conversation. Thus, extracted questions with their inquirer,
addressee, the question text, and the replies in which they are answered can be stored in the knowledge
graph. This enriches the dataset with more background information about the e-mails that can also
serve as ground truth in later evaluations targeting, for instance, in which e-mail certain questions
were answered.</p>
      <p>1 Examples including prompts, generated e-mails, and accompanying knowledge graph excerpts can be found online: https://purl.archive.org/knowogen/examples/email_threads</p>
      <p>2 Langchain is a framework for working with LLMs. See also: https://python.langchain.com/</p>
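      <p>Storing question metadata as graph facts, and querying them as ground truth, can be sketched with plain triples; the predicate names are assumptions, as the actual KnoWoGen uses an RDF graph based on the PIMO ontology.</p>

```python
# Sketch: question metadata as (subject, predicate, object) triples,
# queryable later as evaluation ground truth.

def record_question(kg, question_id, inquirer, addressee, text, answered_in):
    # store inquirer, addressee, question text, and the answering e-mail
    kg.extend([
        (question_id, "hasInquirer", inquirer),
        (question_id, "hasAddressee", addressee),
        (question_id, "hasText", text),
        (question_id, "answeredIn", answered_in),
    ])

def emails_answering(kg, question_id):
    # ground-truth lookup: in which e-mail was this question answered?
    return [o for s, p, o in kg if s == question_id and p == "answeredIn"]
```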
    </sec>
    <sec id="sec-4">
      <title>4. Evaluation</title>
      <p>
        Setup. To examine the quality of the generated e-mail conversations, we conducted a user study,
in which participants rated the synthetic single-turn, i.e., an e-mail and its reply, and multi-turn
conversations regarding their naturalness, thread coherence, answer quality, and content advances3 on
a 5-point Likert scale [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Before the experiment, the participants were told that the e-mail threads
had been generated.
      </p>
      <p>
        The conversations have been generated by version 0.2 of the Mistral-7B-Instruct model [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. We
chose this LLM since, at the time the experiment was conducted, it showed a good ratio between
model size and achieved scores on several benchmarks.4 Moreover, it supported a comparably high
context size of 32k tokens and was thus also able to generate longer documents.
      </p>
      <p>The generated conversations had various topics: agents discussed planning a language
course, organizational questions about a course, strategic planning of job interviews, and planning a
company party. The two single-turn conversations were generated according to the implicit question
generation variant as explained in the previous section. We selected the implicit variant as a
representative of the question generation mechanisms because we perceived its questions as slightly more natural. Each
single-turn conversation consisted of an initial e-mail and one reply from one recipient. Similarly,
the KnoWoGen generated the first two e-mails of the two longer conversation chains composed of
four e-mails. For comparison, the next and last two replies were generated with the summarization
mechanism. Again, only the part of the thread in which the sender and one recipient of the initial
e-mail acted as senders was considered.</p>
      <p>
        The first aspect examined in the experiment was the naturalness of single-turn e-mail conversations.
In an earlier experiment conducted on single documents [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], we noticed that participants judged
naturalness often based on social aspects, such as how an author refers to colleagues. Hence, we
asked two separate questions to evaluate how naturalness is judged, on a social and on a linguistic level.
Additionally, participants were also asked about the coherence of the reply with the earlier e-mail and
how well it addressed the posed questions. For the longer conversation chains, we asked whether the
conversation led to content advances and whether e-mails respect the entire preceding e-mail thread.
      </p>
      <p>In total, 49 participants aged between 18 and 54 with 34 males and 15 females completed the study.
The majority were students, researchers, or software engineers. 43% had a background in Computer
Science. Almost all participants stated that their English language proficiency was B1 or higher. 65% of
the participants used LLMs regularly, 27% occasionally, and the rest had not used LLMs before.</p>
      <p>Results. Overall, the participants gave high ratings to the threads’ quality. Figure 2 depicts the score
distribution per question. Especially for the questions about single-turn conversations, the participants
had a high agreement, and in each case 75% gave a score of 4 or 5. The coherence and answer quality
of single-turn conversations achieved the highest score, while the content advances and the answer
quality of the multi-turn interactions received a lower rating and higher variance.</p>
      <p>
        Most comments given for the single-turn conversations addressed the naturalness of the e-mails.
Participants stated, in particular, that some e-mails were too enthusiastic and that the language and
tone did not fit the social roles of the involved persons. Besides, there were only a few other remarks
stating that one reply did not contain the names of all persons included in the previous e-mail, and two
comments about questions that were not properly addressed in the reply. Overall, no comments
mentioned major issues regarding the coherence of the e-mails or the reply quality. In contrast, there
were several comments on the communication chains indicating that from the second reply on, which was
the first one without an explicitly introduced or extracted question, there were barely any advances in
content; rather, the content from the first two e-mails was repeated. Moreover, some participants
stated that involved communication partners confused their roles in the discourse and answered their
questions or addressed their ideas as if they had been proposed by another person.
      </p>
      <p>3 The generated e-mails and study questions are available here: https://purl.archive.org/knowogen/experiments/email_threads</p>
      <p>
        4 We consulted the Huggingface Leaderboard for Open LLMs [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], which summarizes the performance of a range of LLMs on several common benchmarks.
      </p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>This paper gave a brief introduction to our knowledge work dataset generator KnoWoGen and focused
on how e-mail threads are currently generated. The user study showed that especially single-turn
e-mail threads, in which replies focused on specific questions from the previous e-mail, were perceived
as natural and coherent, and the responses as accurate. However, there were some issues with the identity
of the sender, and the summarization-focused method led to few content advances. In future versions,
prompts for consecutive e-mails could, for example, include an instruction stating that the Large Language
Model should take the role of the sender and make clearer what the sender and others contributed
to previous e-mails. Moreover, since participants perceived especially replies without a preceding
question as unsubstantial, a dynamic decision of whether e-mails without explicit questions require a
reply could be incorporated. Apart from these options to improve the e-mail generation even
further, the study indicated that the current KnoWoGen version is already a solid foundation for further
work. Since the user study showed that the e-mails are overall of reasonable
quality, future experiments could additionally evaluate generated e-mail conversations automatically, e.g., by employing
an LLM to verify that all questions are answered in a reply or by checking the consistency
among e-mails. This would also allow testing various settings or LLMs and comparing the quality of more
generated documents without high manual effort.</p>
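      <p>One possible shape of such an automated check is sketched below; `llm_judge` is an assumed callable wrapping any chat-completion backend, and the prompt wording is illustrative.</p>

```python
# Sketch: asking an LLM judge whether each extracted question is
# answered in a given reply; the judge call is a placeholder.

def question_answered(question, reply, llm_judge):
    verdict = llm_judge(
        f"Question: {question}\nReply: {reply}\n"
        "Does the reply answer the question? Respond YES or NO."
    )
    return verdict.strip().upper().startswith("YES")

def unanswered_questions(questions, reply, llm_judge):
    # flag questions the reply leaves unanswered, for automatic evaluation
    return [q for q in questions
            if not question_answered(q, reply, llm_judge)]
```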
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work was funded by the German Federal Ministry of Education and Research (BMBF) in the project
SensAI (grant no. 01IW20007).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D.</given-names>
            <surname>Gonçalves</surname>
          </string-name>
          ,
          <article-title>Pseudo-desktop collections and PIM: The missing link</article-title>
          ,
          <source>in: ECIR 2011 workshop on evaluating personal search</source>
          ,
          <year>2011</year>
          , pp.
          <fpage>3</fpage>
          -
          <lpage>4</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Bakhshizadeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Jilek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schröder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Maus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dengel</surname>
          </string-name>
          ,
          <article-title>Data collection of real-life knowledge work in context: The RLKWiC dataset</article-title>
          , in: Information Management, Springer,
          <year>2024</year>
          , pp.
          <fpage>277</fpage>
          -
          <lpage>290</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D.</given-names>
            <surname>Heim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Jilek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ulges</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dengel</surname>
          </string-name>
          ,
          <article-title>Using large language models to generate authentic multiagent knowledge work datasets</article-title>
          ,
          <source>in: INFORMATIK 2024</source>
          ,
          <publisher-name>Gesellschaft für Informatik e.V.</publisher-name>
          , Bonn,
          <year>2024</year>
          , pp.
          <fpage>1347</fpage>
          -
          <lpage>1357</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>W. X.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Min</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Nie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wen</surname>
          </string-name>
          ,
          <article-title>A survey of large language models</article-title>
          ,
          <source>CoRR abs/2303.18223</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hogan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Blomqvist</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cochez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>d'Amato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>de Melo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gutierrez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kirrane</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E. L.</given-names>
            <surname>Gayo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Navigli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Neumaier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Ngomo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Polleres</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Rashid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rula</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Schmelzeisen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. F.</given-names>
            <surname>Sequeda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Staab</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zimmermann</surname>
          </string-name>
          ,
          <article-title>Knowledge graphs</article-title>
          ,
          <source>ACM Comput. Surv.</source>
          <volume>54</volume>
          (
          <year>2022</year>
          )
          <fpage>71:1</fpage>
          -
          <lpage>71:37</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <collab>The Radicati Group</collab>
          ,
          <source>Email statistics report, 2023-2027</source>
          ,
          <year>2023</year>
          . URL: https://www.radicati.com/wp/wp-content/uploads/2023/04/Email-Statistics-Report-2023-2027-Executive-Summary.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <collab>The Radicati Group</collab>
          ,
          <source>Email statistics report, 2015-2019</source>
          ,
          <year>2015</year>
          . URL: https://www.radicati.com/wp/wp-content/uploads/2015/03/Email-Statistics-Report-2015-2019-Executive-Summary.pdf.
        </mixed-citation>
      </ref>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>B.</given-names>
            <surname>Klimt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <article-title>The Enron corpus: A new dataset for email classification research</article-title>
          ,
          <source>in: Machine Learning: ECML 2004, 15th European Conference on Machine Learning, Pisa, Italy, September 20-24</source>
          ,
          <year>2004</year>
          , Proceedings, volume
          <volume>3201</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2004</year>
          , pp.
          <fpage>217</fpage>
          -
          <lpage>226</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>D.</given-names>
            <surname>Oard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Webber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Kirsch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Golitsynskiy</surname>
          </string-name>
          ,
          <source>Avocado research email collection</source>
          ,
          <year>2015</year>
          . URL: https://catalog.ldc.upenn.edu/LDC2015T03.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>L.</given-names>
            <surname>Sauermann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>van Elst</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Möller</surname>
          </string-name>
          ,
          <article-title>Personal information model (PIMO) ontology v1.3</article-title>
          , Online,
          <year>2013</year>
          . URL: https://www.semanticdesktop.org/ontologies/2007/11/01/pimo/.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>R.</given-names>
            <surname>Likert</surname>
          </string-name>
          ,
          <article-title>A technique for the measurement of attitudes</article-title>
          ,
          <source>Archives of Psychology</source>
          (
          <year>1932</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A. Q.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sablayrolles</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mensch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bamford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Chaplot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>de las Casas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bressand</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Lengyel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Lample</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Saulnier</surname>
          </string-name>
          , et al.,
          <article-title>Mistral 7B</article-title>
          ,
          <source>CoRR abs/2310.06825</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>C.</given-names>
            <surname>Fourrier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Habib</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lozovskaya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Szafer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Wolf</surname>
          </string-name>
          ,
          <source>Open LLM Leaderboard v2</source>
          , https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>