<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>ML, October</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Language Models in Cybersecurity: A Comparative Approach to Task-Driven Model Assessment</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Holger Schmidt</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Klaus Kaiser</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daniel Spiekermann</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dortmund University of Applied Sciences and Arts</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>26</volume>
      <issue>2025</issue>
      <fpage>0000</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>Large language models (LLMs) have demonstrated impressive general-purpose capabilities across a wide range of computational tasks. However, their substantial resource demands and integration constraints raise critical concerns for deployment in security-sensitive scenarios. In response, small language models (SLMs) and tiny language models (TLMs) have gained attention as lightweight, adaptable alternatives, especially when the operational context and task requirements are well understood. This paper presents a task-driven approach to language model (LM) assessment, emphasizing that the largest LM is not always the optimal choice. We systematically analyze representative cybersecurity tasks, in particular from secure software development and digital forensics, and extract their key technical and operational characteristics. By mapping these characteristic profiles to the properties of different LM classes, we identify practical scenarios in which SLMs or TLMs are not only sufficient but preferable.</p>
      </abstract>
      <kwd-group>
        <kwd>Language Model</kwd>
        <kwd>Cybersecurity</kwd>
        <kwd>Secure Software Development Lifecycle</kwd>
        <kwd>Model Assessment</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The rapid proliferation of artificial intelligence and natural language processing has revolutionized
software and system development, with large language models (LLMs) becoming essential tools for
various tasks such as logical reasoning, summarization, code generation, and automated decision
support [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Although LLMs are increasingly used in cybersecurity for secure software development [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]
and digital forensics [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], supporting tasks such as secure coding and reasoning over forensic artifacts,
their strong generalization capabilities and performance across diverse scenarios make them
generalists rather than specialists.
      </p>
      <p>
        However, this general-purpose strength comes with several challenges. LLMs are resource-intensive,
with substantial demands on memory, compute, and energy, which significantly limits their
integration into constrained environments such as edge devices, embedded systems, or security-critical
infrastructures. Additionally, concerns about data leakage, lack of transparency, update latency, and
limited control raise critical questions regarding their suitability for regulated or mission-critical
domains [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        These challenges have shifted the focus to small language models (SLMs) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] or even tiny language models
(TLMs) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. While TLMs and SLMs cannot match the general-purpose performance of LLMs, they
often offer improved characteristics in terms of efficiency and deployment flexibility. Technically,
SLMs achieve their efficiency through reduced layer depth, narrower hidden dimensions, and careful
optimization of attention mechanisms. Although this reduced capacity limits their generalization range,
SLMs and TLMs can be fine-tuned more easily on domain-specific tasks, sometimes even outperforming LLMs
[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Furthermore, the lower complexity of SLMs facilitates better runtime control, faster adaptation
cycles, and cost reduction [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>The main contributions of this paper are:
• An analysis, based on recent research and practice, of the characteristics of selected cybersecurity
tasks and of the potential support LMs can provide for them. These tasks cover all phases of a
state-of-the-art secure software development lifecycle (SSDL), with a particular focus on secure
software development and digital forensics.
• A task-driven assessment and technical comparison of LLMs, SLMs, and TLMs based on their
suitability for addressing the characteristics of the analyzed cybersecurity tasks, with a specific focus
on resource-aware, practical deployment.</p>
      <p>The remainder of this paper is structured as follows. Section 2 discusses related work on LMs
and related topics. Section 3 outlines seven representative cybersecurity tasks, discusses potential LM
support, and presents the tasks’ characteristics. Section 4 defines critical LM properties and
evaluates how well different LMs satisfy the characteristics of the cybersecurity tasks. Section 5
maps the tasks to LMs by comparing the tasks’ characteristics with the LM properties to identify the
most suitable LM. We conclude this paper in Section 6 and give an outlook on our future research.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        In the context of cybersecurity and the SSDL, LM research focuses primarily on code-centric tasks, such
as secure code generation and vulnerability detection, as well as on digital forensics. Accordingly, we
discuss a selection of corresponding works. In the field of secure software development, using LLMs
for secure code generation is explored [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ], a systematic literature review on code security is
conducted [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], and SecureBERT [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], an example of a domain-specific LM, is developed. Moreover, LLMs
supporting automated vulnerability detection [
        <xref ref-type="bibr" rid="ref12 ref13">12, 13</xref>
        ] are developed. In the field of digital forensics,
the application of the ChatGPT LLM is explored [
        <xref ref-type="bibr" rid="ref14 ref15">14, 15</xref>
        ], a systematic literature review is conducted
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], and ForensicLLM [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], an example of a domain-specific LLM, is developed. In contrast to the
aforementioned works, our approach is broader in that we not only consider code-centric cybersecurity
tasks, but also other tasks such as threat modeling and analysis. As explained above, although there are
domain-specific LMs in the field of cybersecurity, most recent research focuses on LLMs.
      </p>
      <p>
        As far as we know, the task-driven LM assessment approach presented in this paper is currently
the only one of its kind. Nevertheless, we consider benchmarks that evaluate LMs in the context of
cybersecurity as related work. In [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], LLMs are evaluated regarding their ability to refute security and
privacy misconceptions. SALLM [18], LLMSecEval [19], and CyberSecEval<sup>1</sup> are examples of benchmarks
that evaluate the secure coding abilities of LLMs. Similar to the general LM works in cybersecurity discussed
above and in contrast to our work, these benchmarks focus on LLMs and code-centric tasks.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Cybersecurity Tasks and their Characteristics</title>
      <p>The suitability of an LM for a given scenario cannot be assessed solely based on its raw capabilities.
Instead, it requires a thorough examination of multiple critical requirements that reflect the functional,
technical, and operational constraints specific to the intended application context. As we focus on
cybersecurity, especially on secure software development and digital forensics, we analyze
representative cybersecurity tasks in Section 3.1 and derive their characteristics in Section 3.2, which
form the foundation for assessing the suitability of LMs.</p>
      <sec id="sec-3-1">
        <title>3.1. Cybersecurity Tasks</title>
        <p>There are several best practices in the area of secure software development, such as the Microsoft Security
Development Lifecycle (SDL)<sup>2</sup>, the OWASP Software Assurance Maturity Model (SAMM)<sup>3</sup>, and the NIST Secure Software
Development Framework (SSDF)<sup>4</sup>, as well as standards such as IEC 62443-4-1<sup>5</sup>. Based on an analysis and
comparison of these best practices and standards using SAMMY<sup>6</sup>, we derive a base set of
business functions for a state-of-the-art SSDL. For each business function, common security practices as
advocated by the best practices and standards are selected. These security practices are implemented by
specific security tasks. For instance, anomaly detection is a task typically performed as part of the security
practice of incident detection and response, which is integral to the business function of operations. In the
following, we present a selection of tasks that are representative in the sense that they cover both early
(requirements, design) and late phases (implementation, verification, operation) of the SSDL. Moreover,
this selection includes tasks that are currently supported by LMs (as outlined in Section 2) as well as
tasks lacking such support. Each task is described in a concise, profile-like format and is accompanied
by a discussion of potential LM support as well as a presentation of the task’s requirements
(which potentially constrain LM suitability).
<sup>1</sup> https://meta-llama.github.io/PurpleLlama/CyberSecEval
<sup>2</sup> https://www.microsoft.com/en-us/securityengineering/sdl
<sup>3</sup> https://owaspsamm.org</p>
        <sec id="sec-3-1-1">
          <title>3.1.1. Threat Modeling &amp; Analysis</title>
          <p>Threat modeling and analysis [20] makes it possible to identify weaknesses at different early stages of
the SSDL, before they are introduced into the system through implementation or deployment. Threat
modeling and analysis typically follows a systematic methodology such as STRIDE [21] or PASTA [22]. These
methodologies include system modeling techniques (e.g., based on data flow diagrams) to establish the
modeling context and creativity techniques (e.g., brainstorming, serious gaming) to explore system models
with the aim of discovering weaknesses.</p>
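          <p>The modeling step based on data flow diagrams (DFDs) can be illustrated with the widely used STRIDE-per-element heuristic, which maps each DFD element type to the STRIDE categories usually examined for it. The following is a minimal sketch under that assumption; the element kinds, the example DFD, and the helper enumerate_threats are hypothetical. Its output is only a checklist of candidate threats that a human developer, or an LM acting as a companion, would still have to assess.</p>
          <preformat>
```python
from enum import Enum


class Threat(Enum):
    """The six STRIDE threat categories."""
    SPOOFING = "S"
    TAMPERING = "T"
    REPUDIATION = "R"
    INFORMATION_DISCLOSURE = "I"
    DENIAL_OF_SERVICE = "D"
    ELEVATION_OF_PRIVILEGE = "E"


# Commonly cited STRIDE-per-element chart: threat categories usually examined
# for each DFD element type. Repudiation for data stores is often only
# considered if the store holds audit or log data.
STRIDE_PER_ELEMENT = {
    "external_entity": {Threat.SPOOFING, Threat.REPUDIATION},
    "process": set(Threat),
    "data_store": {Threat.TAMPERING, Threat.REPUDIATION,
                   Threat.INFORMATION_DISCLOSURE, Threat.DENIAL_OF_SERVICE},
    "data_flow": {Threat.TAMPERING, Threat.INFORMATION_DISCLOSURE,
                  Threat.DENIAL_OF_SERVICE},
}


def enumerate_threats(dfd_elements):
    """Yield (element name, threat) candidates for (name, kind) pairs."""
    for name, kind in dfd_elements:
        applicable = STRIDE_PER_ELEMENT[kind]
        for threat in Threat:  # iterating the enum preserves S-T-R-I-D-E order
            if threat in applicable:
                yield name, threat


# Hypothetical minimal DFD of a web application.
example_dfd = [
    ("Browser", "external_entity"),
    ("Web app", "process"),
    ("Login request", "data_flow"),
    ("User DB", "data_store"),
]

for element, threat in enumerate_threats(example_dfd):
    print(f"{element}: {threat.name}")
```
          </preformat>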
          <p>Potential LM support: LMs can support threat modeling and analysis by automatically generating
system models based on, e.g., given artifacts or observations of a human developer. Moreover, LMs
can act as companions to human developers, guiding creativity approaches to identify weaknesses in
systems.</p>
          <p>Requirements: Input artifacts and outputs, e.g., system and threat models, can contain personal data as well
as internal information that must be kept confidential. Since threat models are an essential foundation
for software and system development, they need to be particularly reliable. Threat models need to be
updated periodically, e.g., whenever the system changes.</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>3.1.2. Secure Coding</title>
          <p>Secure coding is the practice of writing source code with the intention of minimizing or eliminating
vulnerabilities. This task is typically driven by basic tenets such as Saltzer and Schroeder’s design
principles [23] as well as by secure coding guidelines, such as Oracle’s guidelines for Java<sup>7</sup>, and
checklists, such as those provided by the OWASP Cheat Sheet Series<sup>8</sup>.</p>
          <p>Potential LM support: LMs can enhance secure coding practices by automatically generating
code snippets based on provided specifications or by refining code according to a developer’s input.
Additionally, LMs can serve as assistants to developers, providing guidance on best practices and
identifying potential vulnerabilities in code.</p>
          <p>Requirements: When LMs are used during programming, source code has to be generated immediately. Source
code is specific to programming languages, frameworks, and libraries. It is often considered intellectual
property and can also contain personal data. Since source code plays a central role in software and
system development, it must be particularly reliable. In particular, the latest secure coding guidelines must
be followed, and known vulnerabilities should be avoided. The secure coding task must be applied
continuously throughout the entire SSDL.
<sup>4</sup> https://csrc.nist.gov/Projects/ssdf
<sup>5</sup> https://webstore.iec.ch/en/publication/33615
<sup>6</sup> https://sammy.codific.com
<sup>7</sup> https://www.oracle.com/technetwork/java/seccodeguide-139067.html
<sup>8</sup> https://cheatsheetseries.owasp.org</p>
        </sec>
        <sec id="sec-3-1-3">
          <title>3.1.3. Secure Code Review</title>
          <p>Modern code review (MCR) [24] is a developer-centric process that aims to identify source code defects.
MCR consists of two phases: planning and setup, and the actual review. The author prepares a review
package and notifies the chosen reviewers. The reviewers analyze the code, interact with the author, and
decide on acceptance, rejection, or rework. MCR is integrated with version control systems like Git<sup>9</sup> and
supported by tools such as Gerrit<sup>10</sup>, focusing on quick, small code changes. Regarding cybersecurity, MCR
is crucial for vulnerability detection [25]; it then relies on secure coding guidelines and checklists, as
described in OWASP’s code review guide<sup>11</sup>, which refers to this process as secure code review.</p>
          <p>Potential LM support: LMs can support secure code review either by taking over the reviewer role
completely or by serving as a companion to a human reviewer. LMs can support code checking, on the
one hand, by automatically detecting potential weaknesses, whose relevance must then be manually
confirmed or refuted, and, on the other hand, by methodically accompanying a reviewer during manual
code checking.</p>
          <p>Requirements: Essentially, the requirements are similar to those of the previously discussed task of
secure coding, with the difference that secure code reviews are not conducted continuously, but only
when review packages have to be processed.</p>
        </sec>
        <sec id="sec-3-1-4">
          <title>3.1.4. Static Application Security Testing</title>
          <p>Static application security testing (SAST) [26] involves analyzing application source code for
vulnerabilities without executing the code. SAST, as a white-box approach, integrates with secure coding
and secure code review practices, and is typically supported by tools that involve a comprehensive
examination of the internal code structure. These tools automate the detection of vulnerabilities by
applying a set of predefined rules and patterns, which are continuously updated to reflect the latest
security threats.</p>
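          <p>The principle of applying predefined rules and patterns can be illustrated with a deliberately naive, line-based scanner. This is a minimal sketch, not how production SAST tools work: the rule identifiers and regular expressions are hypothetical, and real tools typically analyze parsed code and data flow rather than individual lines. The sketch also shows the limitation that LMs might help to overcome, since any weakness not matching a predefined pattern goes unreported.</p>
          <preformat>
```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    rule_id: str
    pattern: re.Pattern
    message: str


@dataclass(frozen=True)
class Finding:
    rule_id: str
    line_no: int
    message: str


# Hypothetical, deliberately small rule set for Python source code.
RULES = [
    Rule("PY-EVAL", re.compile(r"\beval\s*\("),
         "eval() on possibly untrusted input"),
    Rule("PY-PICKLE", re.compile(r"\bpickle\.loads?\s*\("),
         "deserialization with pickle"),
    Rule("HARDCODED-SECRET",
         re.compile(r"(?i)\b(password|secret|api_key)\s*=\s*['\"][^'\"]+['\"]"),
         "possible hard-coded credential"),
]


def scan(source: str) -> list[Finding]:
    """Match every rule against every line of the source without executing it."""
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for rule in RULES:
            if rule.pattern.search(line):
                findings.append(Finding(rule.rule_id, line_no, rule.message))
    return findings


sample = '''import pickle
password = "hunter2"
data = pickle.loads(blob)
result = eval(user_input)
'''

for f in scan(sample):
    print(f"line {f.line_no}: [{f.rule_id}] {f.message}")
```
          </preformat>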
          <p>Potential LM support: LMs might overcome the limitations of traditional rule-based SAST approaches and
allow the detection of complex source code anomalies. LMs can offer natural language explanations and
recommendations for remediation, helping developers understand and fix identified vulnerabilities.</p>
          <p>Requirements: Since we focus here on SAST tools that require compilable code, we consider SAST to be
used during secure code review rather than during secure coding. Apart from that, the requirements are
similar to those of the previously discussed task of secure coding, with the following exceptions: Usage is
not subject to specific timing constraints and occurs only intermittently. Because compilable code has to
be analyzed, the input size is rather large. SAST results in terms of identified vulnerabilities have to be
kept confidential until security measures are in place.</p>
        </sec>
        <sec id="sec-3-1-5">
          <title>3.1.5. Penetration Testing</title>
          <p>Penetration testing [27] comprises simulating real-world attacks on an application, network, or system
to identify and exploit vulnerabilities in a controlled environment. Typically mimicking the perspective
and techniques of potential attackers, penetration testing makes use of several different tools, including,
for example, SAST tools. Penetration testing follows a methodology, e.g., the Penetration Testing Execution
Standard (PTES)<sup>12</sup>, and is generally employed in late SSDL phases such as verification and operation.
<sup>9</sup> https://git-scm.com
<sup>10</sup> https://www.gerritcodereview.com
<sup>11</sup> https://owasp.org/www-project-code-review-guide</p>
          <p>Potential LM support: LMs can assist human penetration testers by generating realistic attack
vectors, developing exploits, and providing insights into complex security environments.</p>
          <p>Requirements: In penetration testing, there is typically no particular time pressure. The artifacts
processed may include both personal data and internal information. Penetration testing reports have
to be kept confidential until risks have been reduced. Similar to secure coding, the latest known best
practices, standards, and specific vulnerabilities must be taken into account. Penetration testing is one
of the last lines of defense, so reliability is important. Similar to anomaly detection, the input data is
typically extensive.</p>
        </sec>
        <sec id="sec-3-1-6">
          <title>3.1.6. Incident Response</title>
          <p>Incident response refers to the structured process of detecting, analyzing, mitigating, and recovering
from security incidents such as intrusions, data breaches, or malware outbreaks [28]. It is a time-critical
and information-intensive task that often requires automated support for log analysis, correlation of
indicators, threat classification, and response decision-making [29]. The goal is to minimize impact,
contain threats, and ensure rapid recovery while maintaining auditability and compliance.</p>
          <p>Potential LM support: LMs can improve the incident response process by facilitating the automated
analysis of large datasets and quickly identifying patterns and anomalies. They can assist in threat
classification by evaluating and interpreting indicators in real time, leading to a faster understanding of
incidents. Additionally, LMs can provide decision support by deriving insights from historical data and
suggesting targeted response actions to optimize threat containment and recovery efforts.</p>
          <p>Requirements: Incident response tasks are performed when suspicious behavior emerges in IT
infrastructures. In this case, personal or company-internal information is investigated,
leading to privacy-related restrictions. Depending on the criticality of the incident, the investigation
process has to be accelerated in order to obtain results about the attack in a timely manner.</p>
        </sec>
        <sec id="sec-3-1-7">
          <title>3.1.7. Anomaly Detection</title>
          <p>Anomaly detection involves identifying deviations from expected patterns in data, which may indicate
misconfigurations, intrusions, system failures, or novel attack behavior [30]. It is commonly used in
network monitoring [31], system behavior analysis, and threat hunting. Depending on the deployment
context, it can run continuously (e.g., in real-time monitoring) or batch-wise (e.g., in offline forensics).</p>
          <p>Potential LM support: LMs can significantly aid anomaly detection by analyzing vast streams of
data to identify deviations from established patterns in real time. By learning from historical data,
they can differentiate more accurately between normal fluctuations and genuine anomalies, reducing
false positives. Additionally, LMs can provide contextual insights into detected anomalies, helping to
prioritize response efforts and refine detection algorithms over time.</p>
          <p>Requirements: Anomaly detection typically deals with large-scale data sources
such as network traffic, system logs, financial information, and health monitoring data. Most of this
information is privacy-related, volatile, and dynamic. Accordingly, LMs must operate under
strict latency constraints, restrict external access, and support continuous update cycles to ensure the
ongoing validity of the extracted results.</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Characteristics</title>
        <p>Since the requirements of the cybersecurity tasks presented in Section 3.1 overlap, we have
identified and compiled these common requirements and defined corresponding characteristics. For example,
the requirement that personal data is typically processed applies to all tasks, as outlined in the requirements
paragraphs of the task descriptions in the previous section. Therefore, we have defined personal data as
a characteristic. All characteristics defined following this approach are listed and explained below.</p>
        <p>Real-time usage: Defines the necessity for immediate responses, demanding delivery with minimal
latency, often under strict timing constraints.</p>
        <p>Intellectual property: For a task involving proprietary knowledge, source code, or domain-specific
algorithms, the protection of intellectual property must be ensured.</p>
        <p>Personal data: For a task that involves user-specific or personally identifiable information, which is
governed by legal and ethical constraints, the protection of personal data is essential.</p>
        <p>Up-to-dateness: Emphasizes that a task requires access to the latest information, such as threat
intelligence, dynamic content, or time-sensitive rules, highlighting the importance of incorporating
or adapting to current data.</p>
        <p>Tech constraints: Describes task dependencies on infrastructure, programming languages,
frameworks, or libraries, where strong vendor lock-in or limited deployment portability can hinder
adoption in certain environments.</p>
        <p>Trustworthiness: Highlights the trust needed for a task regarding the robustness, predictability, and
verifiability of its outputs, crucial in contexts where systems must handle edge conditions, resist
adversarial inputs, and maintain consistency.</p>
        <p>Continuous usage frequency: Describes whether a task is typically performed continuously, i.e., very often.</p>
        <p>Large input size: Describes whether a task requires a large amount of input data in the prompt.</p>
        <p>In Table 1, we align the characteristics with the tasks. In this way, characteristic profiles for the selected
tasks emerge, forming constraints that are particularly critical to the assessment of LM suitability. In
Table 1, a check mark (✓) means that the characteristic in the corresponding column is relevant for the task in
the corresponding row, while a cross (✗) means that the characteristic is not relevant for the task. Because
the characteristics are derived directly from the task requirements, the justification for each check mark
or cross follows immediately from the task descriptions.</p>
        <p>In the next sections, we discuss to what extent different LMs can address the characteristics and,
ultimately, which LMs are suitable for which task and which are not.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Task-driven LM Assessment</title>
      <p>We present definitions for LLMs, SLMs, and TLMs in Section 4.1, consider LM properties such as size,
deployment, and usage in Section 4.2, and derive LM profiles in Section 4.3.</p>
      <sec id="sec-4-1">
        <title>4.1. Definitions</title>
        <p>As stated before, we consider not only LLMs but also SLMs and TLMs. Unfortunately, there is no
clear definition for these classes. For our purposes, we differentiate LMs based on the hardware required
to operate them. A practical indicator for this is the number of parameters, which determines the
amount of VRAM (graphics card memory) and the number of operations needed for inference. Therefore, we define:</p>
        <p>LLMs: Models so large that specialized hardware is needed (e.g., NVIDIA H100 graphics cards).</p>
        <p>SLMs: Models smaller than LLMs that can be run on consumer hardware (e.g., an NVIDIA RTX 4070 graphics card).</p>
        <p>TLMs: Models so small that no dedicated hardware, i.e., no dedicated graphics card, is needed.</p>
        <p>To illustrate the differences in model size, we look at the Qwen1.5<sup>13</sup> model family, which
was developed by Alibaba Group and is available in different sizes ranging from 0.5B up to 110B<sup>14</sup>
parameters. Table 2 lists the performance of some of these models together with their corresponding
classification [32, 33].</p>
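        <p>Because the definitions rely on the parameter count as a proxy for hardware requirements, a rough memory estimate (parameters times bytes per parameter, plus some overhead) already hints at the class of a model. The sketch below assumes 12 GB of VRAM for an RTX 4070-class card, a hypothetical 4 GB budget for CPU-only inference, and a hypothetical 20% overhead for the KV cache and activations. All thresholds and helper names are illustrative assumptions, and the resulting labels are not the classification given in Table 2.</p>
        <preformat>
```python
from dataclasses import dataclass

# Bytes needed to store one parameter at common numeric precisions.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


@dataclass(frozen=True)
class HardwareBudget:
    consumer_gpu_vram_gb: float = 12.0  # assumed: RTX 4070-class card with 12 GB VRAM
    cpu_only_ram_gb: float = 4.0        # hypothetical memory budget for CPU-only inference
    overhead: float = 1.2               # hypothetical factor for KV cache and activations


def estimate_memory_gb(params_billion: float, precision: str = "fp16",
                       overhead: float = 1.0) -> float:
    """Approximate memory in GB (1 GB = 1e9 bytes): params x bytes x overhead."""
    return params_billion * BYTES_PER_PARAM[precision] * overhead


def classify(params_billion: float, precision: str = "fp16",
             budget: HardwareBudget = HardwareBudget()) -> str:
    """Assign LLM/SLM/TLM by the smallest hardware tier whose memory suffices."""
    needed = estimate_memory_gb(params_billion, precision, budget.overhead)
    if budget.cpu_only_ram_gb >= needed:
        return "TLM"  # fits without a dedicated graphics card
    if budget.consumer_gpu_vram_gb >= needed:
        return "SLM"  # fits on a consumer graphics card
    return "LLM"      # needs specialized hardware


# Dense Qwen1.5 model sizes in billions of parameters (0.5B up to 110B).
for size in (0.5, 1.8, 4, 7, 14, 32, 72, 110):
    print(f"Qwen1.5-{size}B: {estimate_memory_gb(size):.0f} GB fp16 weights, "
          f"class {classify(size)} (fp16), {classify(size, 'int4')} (int4)")
```
        </preformat>
        <p>Under these assumptions, quantization alone can move a model between classes: a 7B model needs roughly 14 GB for its fp16 weights but only about 3.5 GB at 4-bit precision. This is one reason why the boundaries between the classes remain fluid.</p>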
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Diferentiation of LMs</title>
        <p>Many LLMs are only available in the cloud via an external provider. If a model is publicly available, it
can also be deployed locally on machines with a suitable GPU. Alternatively, it can be deployed
on a machine equipped only with a CPU, which slows inference so much that LLMs become unusable
and SLMs barely usable. We differentiate LMs by deployment, which directly affects data
protection (cloud vs. local) and speed (cloud vs. GPU vs. CPU).</p>
        <p>Besides model size and deployment, it is useful to distinguish how an LM is used to solve
the task. In general, an LM creates a prediction based on a given prompt. The simplest way is to directly
use the resulting prediction in the subsequent process. In this case, one can distinguish between using a
pre-trained model, i.e., an LM without further improvement, and a fine-tuned model, i.e., a model that
is tuned to a specific task in an additional training phase. A different approach is retrieval-augmented
generation (RAG): here, the model serves only as an interface between the task and a solution database.
The model reformulates the task into a query, which is then evaluated externally. Afterwards, the result is
passed back to the model to generate the final answer [35]. A pre-trained model can be used directly
without further improvements, a fine-tuned model can be more accurate in solving tasks, and with RAG,
new information can be added directly.
<sup>13</sup> https://qwenlm.github.io/blog/qwen1.5
<sup>14</sup> https://qwenlm.github.io/blog/qwen1.5-110b</p>
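        <p>The three RAG steps described above (reformulating the task into a query, evaluating the query externally, and generating the final answer from the result) can be traced in a small sketch. It assumes a naive word-overlap retriever in place of the embedding-based search typically used in practice. The parameter generate stands for any hypothetical LM call, whether pre-trained or fine-tuned and of any size, and the knowledge base entries are illustrative only. Appending an entry to the list makes it retrievable immediately, without retraining.</p>
        <preformat>
```python
import re


def tokenize(text: str) -> set[str]:
    """Lower-case word set used for naive lexical matching."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents sharing the most words with the query.

    A stand-in for the external evaluation step (e.g. a keyword or vector search).
    """
    query_words = tokenize(query)
    ranked = sorted(documents,
                    key=lambda doc: len(query_words.intersection(tokenize(doc))),
                    reverse=True)
    return ranked[:k]


def rag_answer(task: str, generate, documents: list[str], k: int = 1) -> str:
    """Hypothetical RAG loop; generate is any callable mapping a prompt to text."""
    query = generate(f"Rewrite as a search query: {task}")  # 1. LM reformulates the task
    context = retrieve(query, documents, k)                  # 2. query evaluated externally
    prompt = "Context:\n" + "\n".join(context) + f"\nTask: {task}"
    return generate(prompt)                                  # 3. LM answers using the result


def echo_lm(prompt: str) -> str:
    """Placeholder LM that returns its prompt, so the retrieved context stays visible."""
    return prompt


# Hypothetical knowledge base; new entries become usable without retraining.
knowledge_base = [
    "Use parameterized queries to prevent SQL injection.",
    "Store passwords with a slow, salted hash such as bcrypt or Argon2.",
    "Validate and encode user input before rendering it in HTML to prevent XSS.",
]

print(rag_answer("How do I store user passwords securely?", echo_lm, knowledge_base))
```
        </preformat>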
      </sec>
      <sec id="sec-4-3">
        <title>4.3. LM Assessment based on Characteristics</title>
        <p>In Section 3, we derived characteristics relevant for different cybersecurity tasks. Next, we check to
what extent the different classes of LMs fulfill these characteristics. For this, we consider the following
groups of LMs:
• LLMs deployed in the cloud or on a GPU and used pre-trained, fine-tuned, or with RAG
• SLMs deployed in the cloud, on a GPU, or on a CPU and used pre-trained, fine-tuned, or with RAG
• TLMs deployed on a GPU or a CPU and used pre-trained, fine-tuned, or with RAG
We summarize our results in Table 3, where a check mark (✓) means that the given LM class is suitable to fulfill
this characteristic, while a cross (✗) means that the given LM class is less preferable for this characteristic.</p>
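        <p>The three groups do not form the full cross product of class, deployment, and usage: LLMs are not considered on a CPU, and TLMs are not considered in the cloud. A short sketch enumerating the resulting combinations follows, assuming each (class, deployment, usage) triple is treated as one LM profile. The type and helper names are hypothetical, and the tables may not list every combination separately.</p>
        <preformat>
```python
from itertools import product
from typing import NamedTuple


class LMProfile(NamedTuple):
    lm_class: str    # "LLM", "SLM" or "TLM"
    deployment: str  # "cloud", "GPU" or "CPU"
    usage: str       # "pre-trained", "fine-tuned" or "RAG"


# Deployment options per LM class, following the three groups listed above.
DEPLOYMENTS = {
    "LLM": ("cloud", "GPU"),
    "SLM": ("cloud", "GPU", "CPU"),
    "TLM": ("GPU", "CPU"),
}
USAGES = ("pre-trained", "fine-tuned", "RAG")


def all_profiles() -> list[LMProfile]:
    """Expand each class into every (deployment, usage) combination it allows."""
    return [LMProfile(lm_class, deployment, usage)
            for lm_class, deployments in DEPLOYMENTS.items()
            for deployment, usage in product(deployments, USAGES)]


profiles = all_profiles()
print(len(profiles), "profiles")
```
        </preformat>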
        <p>Note that for most characteristics, there is no strict separation as to whether an LM fulfills it or not. Thus,
we give an impression based on our experience from current practice and the state of the art. Depending
on the specific task and the concrete LM, the assessment could be completely different.</p>
        <p>It is important to note that we do not compare LMs in terms of accuracy for a cybersecurity task, as
such comparisons are only meaningful after a concrete model has been selected.</p>
        <p>Real-time usage: Generally, a large model is slower than a small model on the same hardware. For
real-time usage, an LLM in the cloud can only be used to a limited extent, since data transfer adds further
delay. TLMs running on suitable local hardware respond quickly. Therefore, only the GPU-deployed TLM is
marked with a check mark, while all others are marked with a cross.</p>
        <p>Intellectual property and personal data: To ensure the protection of intellectual property and
personal data, the model should ideally be deployed locally, e.g., within the application’s domain.
Therefore, cloud-deployed LMs are marked with a cross, while all others are marked with a check mark.</p>
        <p>Up-to-dateness: With RAG, new data can be added at any time and is directly available. Pre-trained and
fine-tuned models can only draw on data available at the training stage, or on data submitted in
the prompt (limited by the context length). A pre-trained model is thus limited to its initial
training, while the fine-tuning process can be restarted with new data containing the missing
information. Therefore, pre-trained LMs are marked with a cross, while all others are marked
with a check mark.</p>
        <p>Tech constraints: LLMs are generalists. Thus, by design, they can draw on a wealth of knowledge about
various technologies. SLMs and TLMs are smaller and therefore more limited in their knowledge. With
fine-tuning and RAG, all LM classes can be enhanced in their technology knowledge. Therefore,
pre-trained SLMs and TLMs are marked with a cross, while all others are marked with a check mark.</p>
        <p>Trustworthiness: LMs can achieve good performance on different tasks, but they remain neural
networks and thus statistical models, suffering from false predictions or hallucinations [36].
Moreover, neural networks are seen as black boxes, since it is not clear how the model arrives at
its result. If trustworthiness is needed, an LM should only be used in a setting where its output
can be verified. Therefore, we mark LLMs, SLMs, and TLMs with a cross.</p>
        <p>Continuous usage frequency: The usage frequency directly affects the cost of using a model.
Since an LLM is in general slow and expensive to operate, it might not be suitable in a setting
where continuous usage is needed. Smaller models, on the other hand, can be used more
frequently due to their lower hardware requirements. Therefore, only the GPU-deployed
SLM and TLM are marked with a check mark, while all others are marked with a cross.</p>
        <p>Large input size: In general, all classes of LMs could be built to handle large input sizes. On the other
hand, a large input directly affects the required resources and inference time. Consequently, since
TLMs are used in low-resource and fast-inference settings, they are less suitable for large input
sizes. Therefore, TLMs are marked with a cross, while all others are marked with a check mark.</p>
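        <p>Since each of the rules above is stated explicitly, the check marks and crosses described for Table 3 can be encoded as a single predicate. This is a minimal sketch; the type and function names are hypothetical, and the encoding reflects only the general rules given in the text, not any task- or model-specific refinement.</p>
        <preformat>
```python
from typing import NamedTuple


class LMProfile(NamedTuple):
    lm_class: str    # "LLM", "SLM" or "TLM"
    deployment: str  # "cloud", "GPU" or "CPU"
    usage: str       # "pre-trained", "fine-tuned" or "RAG"


CHARACTERISTICS = (
    "real-time usage", "intellectual property", "personal data", "up-to-dateness",
    "tech constraints", "trustworthiness", "continuous usage frequency", "large input size",
)


def fulfills(characteristic: str, m: LMProfile) -> bool:
    """True for a check mark, False for a cross, following the rules stated above."""
    small = m.lm_class in ("SLM", "TLM")
    if characteristic == "real-time usage":
        return m.lm_class == "TLM" and m.deployment == "GPU"
    if characteristic in ("intellectual property", "personal data"):
        return m.deployment != "cloud"
    if characteristic == "up-to-dateness":
        return m.usage != "pre-trained"
    if characteristic == "tech constraints":
        return not (small and m.usage == "pre-trained")
    if characteristic == "trustworthiness":
        return False  # output must be verified regardless of the LM class
    if characteristic == "continuous usage frequency":
        return small and m.deployment == "GPU"
    if characteristic == "large input size":
        return m.lm_class != "TLM"
    raise ValueError(f"unknown characteristic: {characteristic}")


profile = LMProfile("TLM", "GPU", "RAG")
print([c for c in CHARACTERISTICS if fulfills(c, profile)])
```
        </preformat>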
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Comparison of Characteristics and LM Profiles</title>
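      <p>We now compare the characteristics profile of each cybersecurity task with the profile of each LM. For a given task and an LM profile <inline-formula><tex-math>m</tex-math></inline-formula>, we compute a score <inline-formula><tex-math>s(m)</tex-math></inline-formula> ranging from 0 to 8, where 0 means "the model is not suitable for this task" and 8 means "the model is suitable for this task", using the formula</p>
      <disp-formula>
        <tex-math>s(m) = \sum_{i=1}^{8} f(c_i, m).</tex-math>
      </disp-formula>
      <p>Here, <inline-formula><tex-math>c_i</tex-math></inline-formula> represents one of the eight task characteristics presented in Section 3.2, and <inline-formula><tex-math>m</tex-math></inline-formula> represents an LM profile based on LM class, hardware, and usage (if applicable) as presented in Section 4.3. We calculate <inline-formula><tex-math>f</tex-math></inline-formula> as follows:</p>
      <disp-formula>
        <tex-math>f(c_i, m) = \begin{cases} 0 &amp; \text{if } c_i \text{ is required and not fulfilled by } m,\\ 1 &amp; \text{otherwise.} \end{cases}</tex-math>
      </disp-formula>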
      <p>For example, since the characteristic personal data is required by the task secure coding, we obtain <inline-formula><tex-math>f(c_{\text{personal data}}, (\text{SLM}, \text{cloud})) = 0</tex-math></inline-formula>, because according to Table 3 the characteristic personal data is not supported by a cloud-deployed SLM. Following this approach, the resulting conformity scores of LMs and tasks are summarized in Table 4.</p>
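      <p>The scoring scheme can be sketched in a few lines of Python: a per-characteristic indicator that is 0 when the task requires a characteristic that the LM profile does not fulfill and 1 otherwise, summed over the task characteristics. The function names conformity and score, the set-based representation, and all characteristic labels except "personal data" are assumptions for illustration; the printed values reproduce only the worked example above and are not entries of Table 4.</p>
      <preformat preformat-type="code">
```python
from __future__ import annotations


def conformity(characteristic: str, required: set[str], fulfilled: set[str]) -> int:
    """Indicator: 0 if the task requires the characteristic and the LM
    profile does not fulfill it, 1 otherwise."""
    if characteristic in required and characteristic not in fulfilled:
        return 0
    return 1


def score(characteristics: list[str], required: set[str], fulfilled: set[str]) -> int:
    """Sum of the indicator over all task characteristics."""
    return sum(conformity(c, required, fulfilled) for c in characteristics)


if __name__ == "__main__":
    # Worked example from the text: secure coding requires personal data,
    # which a cloud-deployed SLM does not support. The other seven labels
    # are hypothetical placeholders; only the personal-data requirement of
    # secure coding is modelled here.
    characteristics = ["personal data"] + [f"c{i}" for i in range(2, 9)]
    secure_coding_required = {"personal data"}
    cloud_slm_fulfilled = set(characteristics) - {"personal data"}
    print(conformity("personal data", secure_coding_required, cloud_slm_fulfilled))  # 0
    print(score(characteristics, secure_coding_required, cloud_slm_fulfilled))  # 7
```
      </preformat>
      <p>Because a characteristic that the task does not require always contributes 1, a profile can only lose points on required characteristics, so with eight characteristics the score stays between 0 and 8.</p>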
      <p>Note that these results only indicate which combinations might be promising and which might not.
Since the characteristics of LMs and tasks are generalized, a different result might be obtained
if a concrete LM is investigated with respect to a specific task. Note, furthermore, that some
characteristics might be show-stoppers: for example, if data protection is an unavoidable
restriction, an LM that does not fulfill this property must be excluded directly.</p>
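      <p>The show-stopper remark suggests a two-stage selection: first exclude every LM profile that misses a required show-stopper characteristic, then rank the remaining profiles by their score. The Python sketch below illustrates this idea; the function rank_profiles, the example task requirements, and the assumption that GPU-deployed profiles support personal data are hypothetical, while the crosses of a cloud-deployed SLM for personal data and continuous usage frequency and the cross of TLMs for large input size follow from the text.</p>
      <preformat preformat-type="code">
```python
from __future__ import annotations


def rank_profiles(
    characteristics: list[str],
    required: set[str],
    show_stoppers: set[str],
    profiles: dict[str, set[str]],
) -> list[tuple[str, int]]:
    """Exclude profiles that miss a required show-stopper characteristic,
    then rank the remaining ones by their score (highest first)."""
    ranked = []
    for name, fulfilled in profiles.items():
        # Hard constraint: a required show-stopper must be fulfilled.
        if any(c in required and c not in fulfilled for c in show_stoppers):
            continue
        # Soft score: count characteristics that are not violated.
        s = sum(0 if (c in required and c not in fulfilled) else 1
                for c in characteristics)
        ranked.append((name, s))
    # sorted() is stable, so profiles with equal scores keep their order.
    return sorted(ranked, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    characteristics = ["personal data", "continuous usage frequency", "large input size"]
    required = {"personal data", "large input size"}  # hypothetical task
    profiles = {
        "SLM (cloud)": {"large input size"},
        "SLM (GPU)": {"personal data", "continuous usage frequency", "large input size"},
        "TLM (GPU)": {"personal data", "continuous usage frequency"},
    }
    print(rank_profiles(characteristics, required, {"personal data"}, profiles))
    # [('SLM (GPU)', 3), ('TLM (GPU)', 2)]
```
      </preformat>
      <p>Treating a show-stopper as a filter rather than as a single zero in the sum prevents a profile from compensating a violated hard constraint with several fulfilled characteristics; in the example, the cloud-deployed SLM is excluded even though it would otherwise still score 2.</p>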
      <p>From Table 4, we obtain the following key findings:</p>
      <list list-type="bullet">
        <list-item>
          <p>GPU-deployed TLMs that are fine-tuned or use RAG seem promising for all tasks.</p>
        </list-item>
        <list-item>
          <p>Cloud models are less favorable due to their restrictions on data protection and real-time usage.</p>
        </list-item>
        <list-item>
          <p>For some tasks, namely threat modeling &amp; analysis and incident response, the choice of model seems to be less important.</p>
        </list-item>
      </list>
      <p>Overall, we observe that, depending on the task, SLMs and TLMs are promising approaches, provided
they reach the required accuracy.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions and Future Work</title>
      <p>This paper highlights that larger LMs are not always the optimal choice. Through a task-driven
assessment approach, we demonstrate how SLMs and TLMs can provide adequate solutions for certain
practices in the fields of secure software development and digital forensics. By analyzing representative
cybersecurity tasks, we identify task characteristics and check whether LMs fulfill them. Our technical
comparison shows where smaller LMs are preferable to larger ones, offering resource-aware
deployment without sacrificing performance.</p>
      <p>This work lays the foundation for strategically developing LMs for cybersecurity tasks. In the future,
we plan to implement and study, in particular, SLMs and TLMs for distinct cybersecurity tasks selected
with our approach. Moreover, we intend to explore additional LM properties and expand the range
of cybersecurity tasks to provide a more comprehensive understanding of the applicability of LMs.
Furthermore, we plan to investigate specific LMs in more detail and evaluate their characteristics in the
context of selected tasks, including, in particular, an assessment of their accuracy for each task.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>The author(s) have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] H. Naveed, A. U. Khan, S. Qiu, M. Saqib, S. Anwar, M. Usman, N. Akhtar, N. Barnes, A. Mian, A comprehensive overview of large language models (2024). doi:10.48550/arxiv.2307.06435. arXiv:2307.06435.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] E. Basic, A. Giaretta, Large language models and code security: A systematic literature review (2025). doi:10.48550/arxiv.2412.15004. arXiv:2412.15004.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] A. Wickramasekara, F. Breitinger, M. Scanlon, Exploring the potential of large language models for improving digital forensic investigation efficiency, Forensic Science International: Digital Investigation 52 (2025) 301859. doi:10.1016/j.fsidi.2024.301859.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] B. C. Das, M. H. Amini, Y. Wu, Security and privacy challenges of large language models: A survey, ACM Computing Surveys 57 (2025) 1–39.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] T. Schick, H. Schütze, It's not just size that matters: Small language models are also few-shot learners, in: K. Toutanova, A. Rumshisky, L. Zettlemoyer, D. Hakkani-Tur, I. Beltagy, S. Bethard, R. Cotterell, T. Chakraborty, Y. Zhou (Eds.), Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, Online, 2021, pp. 2339–2352. doi:10.18653/v1/2021.naacl-main.185.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] I. Lamaakal, Y. Maleh, K. El Makkaoui, I. Ouahbi, P. Pławiak, O. Alfarraj, M. Almousa, A. A. Abd El-Latif, Tiny language models for automation and control: Overview, potential applications, and future research directions, Sensors 25 (2025). doi:10.3390/s25051318.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] F. Wang, Z. Zhang, X. Zhang, Z. Wu, T. Mo, Q. Lu, W. Wang, R. Li, J. Xu, X. Tang, et al., A comprehensive survey of small language models in the era of large language models: Techniques, enhancements, applications, collaboration with LLMs, and trustworthiness (2024). doi:10.48550/arXiv.2411.03350.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] C. Irugalbandara, A. Mahendra, R. Daynauth, T. K. Arachchige, J. Dantanarayana, K. Flautner, L. Tang, Y. Kang, J. Mars, Scaling down to scale up: A cost-benefit analysis of replacing OpenAI's LLM with open source SLMs in production, in: Proceedings of the International Symposium on Performance Analysis of Systems and Software, ISPASS, IEEE, 2024, pp. 280–291.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] J. Wang, L. Cao, X. Luo, Z. Zhou, J. Xie, A. Jatowt, Y. Cai, Enhancing large language models for secure code generation: A dataset-driven study on vulnerability mitigation (2023). doi:10.48550/arxiv.2310.16263. arXiv:2310.16263.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] J. Li, A. Sangalay, C. Cheng, Y. Tian, J. Yang, Fine tuning large language model for secure code generation, in: Proceedings of the First International Conference on AI Foundation Models and Software Engineering, FORGE, ACM, New York, NY, USA, 2024, pp. 86–90. doi:10.1145/3650105.3652299.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] E. Aghaei, X. Niu, W. Shadid, E. Al-Shaer, SecureBERT: A domain-specific language model for cybersecurity, in: Proceedings of the International Conference on Security and Privacy in Communication Systems, SecureComm, Springer, 2023, pp. 39–56.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] S. Z. Ridoy, M. S. H. Shaon, A. Cuzzocrea, M. S. Akter, Enstack: An ensemble stacking framework of large language models for enhanced vulnerability detection in source code, in: International Conference on Big Data, BigData, IEEE, 2024, pp. 6356–6364.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] M. F. Sultan, T. Karim, M. S. H. Shaon, M. Wardat, M. S. Akter, Enhanced LLM-based framework for predicting null pointer dereference in source code (2024). doi:10.48550/arxiv.2412.00216. arXiv:2412.00216.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] M. Scanlon, F. Breitinger, C. Hargreaves, J.-N. Hilgert, J. Sheppard, ChatGPT for digital forensic investigation: The good, the bad, and the unknown, Forensic Science International: Digital Investigation 46 (2023) 301609. doi:10.1016/j.fsidi.2023.301609.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] H. Henseler, H. van Beek, ChatGPT as a copilot for investigating digital evidence, in: Proceedings of the Third International Workshop on Artificial Intelligence and Intelligent Assistance for Legal Professionals in the Digital Workplace (LegalAIIA 2023), volume 3423 of CEUR Workshop Proceedings, CEUR-WS.org, 2023, pp. 58–69.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] B. Sharma, J. Ghawaly, K. McCleary, A. M. Webb, I. Baggili, ForensicLLM: A local large language model for digital forensics, in: Proceedings of the Digital Forensics Research Conference Europe, DFRWS EU, 2025.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] Y. Chen, A. Arunasalam, Z. B. Celik, Can large language models provide security &amp; privacy advice? Measuring the ability of LLMs to refute misconceptions, in: Proceedings of the 39th Annual Computer Security Applications Conference, ACSAC, ACM, New York, NY, USA, 2023, pp. 366–378. doi:10.1145/3627106.3627196.</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] M. L. Siddiq, J. C. S. Santos, S. Devareddy, A. Muller, SALLM: Security assessment of generated code, in: Proceedings of the 39th International Conference on Automated Software Engineering Workshops, ASEW, Sacramento, CA, USA, 2024. doi:10.1145/3691621.3694934.</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[19] C. Tony, M. Mutas, N. E. D. Ferreyra, R. Scandariato, LLMSecEval: A dataset of natural language prompts for security evaluations (2023). doi:10.48550/arxiv.2303.09384. arXiv:2303.09384.</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>[20] I. Tarandach, M. J. Coles, Threat Modeling: A Practical Guide for Development Teams, O’Reilly, 2021.</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[21] A. Shostack, Threat Modeling: Designing for Security, Wiley, 2014.</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>[22] T. UcedaVélez, M. Morana, Risk Centric Threat Modeling: Process for Attack Simulation and Threat Analysis, Wiley, 2015.</mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>[23] J. H. Saltzer, M. D. Schroeder, The protection of information in computer systems, Proceedings of the IEEE 63 (1975) 1278–1308.</mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>[24] A. Bacchelli, C. Bird, Expectations, outcomes, and challenges of modern code review, in: Proceedings of the 35th International Conference on Software Engineering, ICSE, 2013, pp. 712–721. doi:10.1109/ICSE.2013.6606617.</mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>[25] L. Braz, C. Aeberhard, G. Çalikli, A. Bacchelli, Less is more: supporting developers in vulnerability detection during code review, in: Proceedings of the 44th International Conference on Software Engineering, ICSE, ACM, New York, NY, USA, 2022, pp. 1317–1329. doi:10.1145/3510003.3511560.</mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>[26] B. Chess, J. West, Secure Programming with Static Analysis, O’Reilly, 2007.</mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>[27] M. Hickey, J. Arcuri, Hands on Hacking: Become an Expert at Next Gen Penetration Testing and Purple Teaming, Wiley, 2020.</mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>[28] D. Schlette, M. Caselli, G. Pernul, A comparative study on cyber threat intelligence: The security incident response perspective, IEEE Communications Surveys &amp; Tutorials 23 (2021) 2525–2556.</mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>[29] B. Schneier, The future of incident response, IEEE Security &amp; Privacy 12 (2014) 96–96. doi:10.1109/MSP.2014.102.</mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>[30] M. Thottan, C. Ji, Anomaly detection in IP networks, IEEE Transactions on Signal Processing 51 (2003) 2191–2204.</mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>[31] D. Spiekermann, Positional packet capture for anomaly detection in multitenant virtual networks, International Journal of Network Management 35 (2025) e2326.</mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>[32] R. P. Ilyas Moutawwakil, Optimum-benchmark: A framework for benchmarking the performance of transformers models with different hardwares, backends and optimizations, ????.</mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>[33] R. P. Ilyas Moutawwakil, LLM-perf leaderboard, https://huggingface.co/spaces/optimum/llm-perf-leaderboard, 2023.</mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>[34] C. Fourrier, N. Habib, A. Lozovskaya, K. Szafer, T. Wolf, Open LLM leaderboard v2, https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard, 2024.</mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>[35] P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler, M. Lewis, W.-t. Yih, T. Rocktäschel, et al., Retrieval-augmented generation for knowledge-intensive NLP tasks, Advances in Neural Information Processing Systems 33 (2020) 9459–9474.</mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>[36] S. Banerjee, A. Agarwal, S. Singla, LLMs will always hallucinate, and we need to live with this (2024). doi:10.48550/arXiv.2409.05746. arXiv:2409.05746.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>