<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>February</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>A Stackelberg Approach to Federated Learning for Malware Detection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Andrea Augello</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandra De Paola</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marena Jestin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giuseppe Lo Re</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Engineering, University of Palermo</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>0</volume>
      <issue>1</issue>
      <fpage>3</fpage>
      <lpage>8</lpage>
      <abstract>
        <p>The widespread use of smart devices requires effective malware detection tools to ensure user security and privacy. The dynamic nature of the software ecosystem, characterized by data distribution changes, poses significant challenges to the long-term sustainability of machine learning models for malware detection, requiring periodic updates to maintain their effectiveness. Additionally, collecting up-to-date information for training machine learning models in a centralized fashion is costly, time-consuming, and privacy-invasive. To address these shortcomings, this work proposes a Stackelberg game model to incentivize users to contribute to the training of a malware detection model through Federated Learning. The proposed model takes into account the heterogeneous capabilities of the participants, allowing them to tune their contribution based on the quality and quantity of the data they can provide. Experimental results demonstrate that the proposed approach can ensure the effectiveness of the detection model over multiple years.</p>
      </abstract>
      <kwd-group>
        <kwd>Malware detection</kwd>
        <kwd>Federated learning</kwd>
        <kwd>Game theory</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Motivation and Related Works</title>
      <p>Smart devices are becoming increasingly integrated into our daily lives. As technology evolves, the scale
and complexity of cyberattacks have made traditional security measures inadequate, necessitating the
development of innovative strategies to effectively address emerging vulnerabilities [1]. Traditional
malware detection techniques, such as signature-based and heuristic approaches, have become ill-equipped
to address the complexities of modern cyber threats. These methods rely on predefined patterns or
static code analysis, making them vulnerable to sophisticated attacks [2]. Specifically, polymorphic and
metamorphic malware can alter their code or behavior to elude detection by such systems [3]. As a
result, these conventional methods often produce high false-negative rates, especially when faced with
new or dynamic forms of malware that have not been previously encountered [4].</p>
      <p>Over the past decade, machine learning algorithms have become the preferred threat detection
approach [5], enabling systems to identify patterns and anomalies in data more efficiently than traditional
methods [6]. By processing vast amounts of data, these systems can detect sophisticated attack patterns
faster and more accurately [7]. Moreover, machine learning models are easier to adapt to emerging
threats [8]. Despite their success, machine learning models frequently face a decline in performance
after their deployment. This performance degradation is due to a phenomenon known as concept
drift [9]. Concept drift arises when the statistical characteristics of the input data change over time,
leading to discrepancies between the data the model was trained on and the data it encounters. Concept
drift is a particularly serious issue in the cybersecurity domain, where the threat landscape is constantly
evolving, and new malware variants are continually being developed [10]. This change can lead to
a decrease in the model’s accuracy, making it ineffective in detecting new threats [11]. Hence, it is
essential to update the model periodically to ensure it remains effective in detecting new threats [12, 13].</p>
      <p>Furthermore, most existing malware detection systems that use machine learning rely on a centralized
collection and annotation of data [14]. The process of labelling data is both expensive and time-intensive,
creating a bottleneck in the development pipeline [15]. To reduce such overheads, crowdsourcing has
been proposed as a viable solution for malware detection [16]. Users can collect and annotate data,
which is then sent to a central server that processes the data to train the model. However, despite its
potential, crowdsourcing introduces significant challenges regarding user privacy. Users may hesitate to
share details with a central server about the applications installed on their devices since even a simple
snapshot of installed apps can reveal sensitive personal information [17].</p>
      <p>Decentralized alternatives, like Federated Learning, offer a solution by enabling collaborative model
training without aggregating sensitive data on a central server [18]. In Federated Learning, each client
retains its local training dataset, which is never shared with the server. Instead, clients compute updates
to the global model maintained by the server, communicating only these updates rather than the raw
data [19]. The effectiveness of such a learning model depends on the quality of updates provided by
users, although it is not a given that users perceive the value of actively contributing to the functioning
of the system [20]. By aligning clients’ interests with the broader goal of collaborative model training,
incentive mechanisms can help overcome the challenges of client engagement in Federated Learning
systems [21].</p>
      <p>Various studies have employed game theory to design incentive mechanisms that encourage clients
to allocate their computational resources for Federated Learning [22]. For instance, Donahue et al. [23]
analyzed the conditions under which rational clients would be willing to participate in Federated Learning
over multiple interactions with the server. While most works focus on ensuring that clients offer their
computational capabilities to train a shared model [24, 25], this work leverages game theory to address
the specific challenges of obtaining up-to-date training data to overcome concept drift in malware
detection models. By adopting a Stackelberg-based approach to designing an incentive mechanism
for Federated Learning, this work aims to ensure sustained client participation while adapting to concept
drift. This approach enhances model robustness and accuracy in dynamic environments by aligning
client incentives with the learning objectives of the system.</p>
      <p>The main contributions of this work are:
• A Stackelberg game model for Federated Learning (FL) to prevent model ageing due to concept
drift in malware detection.
• A novel game formulation that allows clients to tune their contribution based on the quantity of
data they can provide and the effort associated with labelling it. Allowing clients to compensate
for the potentially poor quality of labels they can supply with quantity ensures that even clients
with limited labelling expertise can participate in the learning process.
• A novel strategy adaptation mechanism that allows the server to tune the reward in order to
encourage client participation even in high-cost scenarios.
• A formal proof that the proposed game model converges to a unique Nash equilibrium, ensuring
system stability over time.
• Extensive experimental evaluations showing that, through the proposed incentive mechanism, it
is possible to engage clients in the FL process for multiple years, maintaining the effectiveness of
the model.</p>
      <p>The remainder of this work is structured as follows: Section 2 describes the system model. Section 3
presents the optimal client strategy. Section 4 discusses the server reward policy. Section 5 provides the
experimental evaluation. Finally, Section 6 concludes the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>2. System model</title>
      <p>The system consists of a single central server and a set of n clients. The clients are responsible for
training the local models using their own data and sending the local models to the server. The server is
responsible for collecting the local models from the clients, aggregating them, and sending the global
model back to the clients for further training according to the Federated Learning (FL) paradigm [26].
To incentivize the clients to collect high-quality data for training, the server offers them a monetary
reward based on their contribution.</p>
      <p>The interaction between the server and clients is modeled as a Stackelberg game, a game theory
model useful in situations of sequential competition between two or more players, in which one player,
named leader, moves first and the other players, named followers, respond to the first player’s move [27].
The server acts as the leader and, in its opening move, announces a monetary reward for clients that
cooperate by providing high-quality data to train their local models. The clients act as followers and respond
by determining a suitable contribution to maximize their utility. The clients are fully rational agents, and
engage in a non-cooperative subgame with the other clients to determine their optimal contribution.</p>
      <p>The server and clients interact periodically, when the model needs to be updated on new data. For
each of these updates, the server sets a reward and the clients decide how much data to contribute. The
reward is given based on the quality of the data provided by the clients.</p>
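      <p>The leader-follower sequencing described above can be illustrated with a minimal Python sketch (not part of the original paper; the reward and contribution policies below are hypothetical placeholders for those derived in Sections 3 and 4):</p>

```python
# One periodic server-client interaction, sketched as a Stackelberg round:
# the server (leader) announces a reward first, then each client (follower)
# responds with a contribution. Policies here are hypothetical placeholders.

def stackelberg_round(server_reward_policy, client_policies):
    R = server_reward_policy()  # leader moves first
    contributions = [policy(R) for policy in client_policies]  # followers respond
    total = sum(contributions)
    # Each client is paid in proportion to its share of the total contribution.
    payouts = [R * q / total if total > 0 else 0.0 for q in contributions]
    return contributions, payouts

qs, pays = stackelberg_round(lambda: 10.0, [lambda R: R / 4, lambda R: R / 5])
print(qs)  # [2.5, 2.0]
```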
      <p>In the proposed model, the contribution q_i of a client, referred to as their quality index, is linearly
dependent on the size of the collected dataset |D_i| and the quality of the labelling a_i, as shown in Eq. (1).
This proposal is consistent with existing literature [28] showing that the generalization error introduced
by label noise can be offset by increasing the dataset size. Experimental observations show that the
accuracy of the global model is logarithmically dependent on the quality index, as shown in Fig. 1,
which is in line with the results of previous studies [25]. Initially, increasing the quality index leads to
significant improvements in the accuracy of the model, but these improvements become progressively
less significant as the quality index increases.</p>
      <p>
        q_i = |D_i| (2 a_i − 1).
(
        <xref ref-type="bibr" rid="ref1">1</xref>
        )
      </p>
      <p>[Figure 1: Accuracy of the global model versus the total quality index (data points and fitted logarithmic curve); accuracy rises from about 0.82 to 0.92 as the quality index grows from 0 to 500.]</p>
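      <p>A small illustrative sketch in Python (not the authors’ code; the numbers are hypothetical) of the quality index in Eq. (1):</p>

```python
# Quality index of Eq. (1): q_i = |D_i| * (2 * a_i - 1).
# Random binary labels (a_i = 0.5) contribute nothing; perfect labels
# (a_i = 1.0) make the contribution equal to the dataset size.

def quality_index(dataset_size: float, label_accuracy: float) -> float:
    assert 0.5 <= label_accuracy <= 1.0
    return dataset_size * (2 * label_accuracy - 1)

# A small, well-labeled dataset can match a larger, noisier one:
print(quality_index(100, 1.0))    # 100.0
print(quality_index(400, 0.625))  # 100.0
```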
    </sec>
    <sec id="sec-3">
      <title>3. Optimal client strategy</title>
      <p>Each client contributes to the learning process by acquiring and labeling data, and receives a reward
based on the quality and quantity of the data provided. The clients aim to maximize their utility, defined
as the difference between the reward obtained and the cost incurred. In their decision-making process,
the clients can leverage knowledge from previous interactions.</p>
      <sec id="sec-3-1">
        <title>3.1. Client policy</title>
        <p>Client contribution cost model: The cost that client i incurs is given by the cost of acquiring data
and the cost of labeling them. The cost of acquiring data scales linearly with the number of samples
according to a parameter α_i. Since not all samples are equally difficult to label, the cost of labeling
an entire dataset is not linear in the target accuracy level [30]. The cost of labeling each sample is
modeled as proportional to its difficulty. Assuming a maximum entropy distribution of the difficulty of
the samples bounded in the [δ_min, δ_max] range (i.e., a uniform distribution), the cost of labeling a fraction
f of the samples, starting from the easiest, is proportional to the definite integral of the difficulty
distribution’s quantile function up to the f-th quantile:
∫₀^f (δ_min + t (δ_max − δ_min)) dt = δ_min f + ((δ_max − δ_min)/2) f².</p>
        <p>
          (
          <xref ref-type="bibr" rid="ref2">2</xref>
          )
        </p>
        <p>
          Since randomly assigning labels for a binary classification task yields 50% label accuracy, to ensure
that a fraction a_i of samples have the correct label, the client will have to provide labels for a fraction
of the samples equal to at least 2 a_i − 1, incurring a cost proportional to 2(δ_max − δ_min)(a_i − 1/2)².
The resulting behavior can be interpreted as the client determining an upper bound on the difficulty
of the samples they are willing to analyze. In practice, this means that at least a 2 a_i − 1 fraction of
the samples undergoes costly manual annotation, while the remaining samples are labeled with an
unreliable automatic procedure with negligible cost. The final cost of labeling a dataset of size |D_i| up
to a target accuracy of a_i is thus given by |D_i| γ_i (a_i − 1/2)², with γ_i being a proportionality constant
that depends on the maximum difficulty of the samples and the client’s labeling capabilities. Thus, the
overall participation cost for client i is given by Eq. (
          <xref ref-type="bibr" rid="ref3">3</xref>
          ):
        </p>
        <p>c_i(|D_i|, a_i) = |D_i| (α_i + γ_i (a_i − 1/2)²).</p>
        <p>
          Since the clients aim to maximize their utility, they will strive to choose the dataset size and the
target accuracy that minimize the cost for any given target quality index q_i. By fixing q_i, Eq. (
          <xref ref-type="bibr" rid="ref1">1</xref>
          ) can be
used to express the cost only in terms of the dataset size |D_i|:
c(|D_i|)|_{q_i} = |D_i| α_i + γ_i q_i² / (4 |D_i|).
(4)
        </p>
        <p>
          The cost function is minimized for |D_i| = q_i √(γ_i / (4 α_i)), where the first derivative of c is equal to zero
and the second derivative is positive. The corresponding fraction of accurately labeled samples can
be obtained by substituting the optimal |D_i| in Eq. (
          <xref ref-type="bibr" rid="ref1">1</xref>
          ) and solving for a_i given a target quality index,
resulting in a_i = 1/2 + √(α_i / γ_i). In the following, it is assumed that 4 α_i ≤ γ_i so that √(α_i / γ_i) ≤ 1/2, as
a labeling accuracy a_i above 100% is meaningless. In scenarios where this condition is not met, i.e.,
labeling one sample is cheaper than collecting four, the label accuracy is set as the maximum possible
value a_i = 1. By substituting this value of a_i in Eq. (
          <xref ref-type="bibr" rid="ref1">1</xref>
          ) the dataset size to achieve a target quality index
q_i is |D_i| = q_i. Fig. 2 shows the cost of a unitary increase in the quality index as the labeling cost
increases compared to the collection cost. If all clients were required to achieve a quality index of 1, the
cost would increase linearly. Instead, through the proposed dynamic tuning of the quality parameters,
the same quality index can be achieved with a sublinear dependency on the labeling cost.
        </p>
        <p>[Figure 2: Cost c(q) of a unitary quality index versus the labeling-to-collection cost ratio γ/α (0 to 100), for the proposed dynamic tuning and for fixed a = 1; the dynamic strategy grows sublinearly, from roughly 2×10⁻⁵ to 1.2×10⁻⁴.]</p>
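        <p>The cost-minimizing trade-off between dataset size and label accuracy can be sketched as follows (illustrative Python with hypothetical parameter values; alpha stands for the per-sample collection cost and gamma for the labeling-cost coefficient of Eq. (3)):</p>

```python
import math

def optimal_strategy(q: float, alpha: float, gamma: float):
    """Cost-minimizing (dataset size, label accuracy) for a target quality q."""
    if 4 * alpha <= gamma:
        size = q * math.sqrt(gamma / (4 * alpha))
        accuracy = 0.5 + math.sqrt(alpha / gamma)
    else:
        # Labeling a sample is cheaper than collecting four: label everything.
        size, accuracy = q, 1.0
    return size, accuracy

def cost(size: float, accuracy: float, alpha: float, gamma: float) -> float:
    """Participation cost of Eq. (3)."""
    return size * (alpha + gamma * (accuracy - 0.5) ** 2)

q, alpha, gamma = 100.0, 1.0, 16.0
size, acc = optimal_strategy(q, alpha, gamma)
print(size, acc)                      # 200.0 0.75
print(cost(size, acc, alpha, gamma))  # 400.0 == q * sqrt(alpha * gamma)
```

        <p>Any other (size, accuracy) pair achieving the same quality index, e.g. 250 samples labeled at 70% accuracy, costs more (410 in this example), illustrating the advantage of the dynamic tuning.</p>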
        <p>
          Optimal client contribution: Given the contribution q_i of a client, its utility is defined as the
difference between the reward obtained and the cost incurred, as shown in Eq. (5), with R being the
total reward given by the server and Q = Σ_{j=1}^{n} q_j the total contribution of all clients:
u_i(q_i) = (q_i / Q) R − c_i(q_i).
(5)
        </p>
        <p>
          To find the optimal quality index of the i-th client, the derivative of the utility function with respect
to q_i is set as equal to zero. Since Q can be rewritten as Q = q_i + Q_{−i}, where Q_{−i} is the sum of the
contributions of clients other than i, i.e., Q_{−i} = Σ_{j≠i} q_j, it is possible to isolate the contribution of
individual clients:
R Q_{−i} / (q_i + Q_{−i})² − √(α_i γ_i) = 0, satisfied by
q_i = Q_{−i} ( √( R / ( √(α_i γ_i) Q_{−i} ) ) − 1 )  if  R / √(α_i γ_i) &gt; Q_{−i}.
(6)
Intuitively, the condition in Eq. (6) means that the i-th client should only participate if the current
expected reward per unit of contribution is greater than its unitary contribution cost, otherwise the
expected utility would be negative. This is illustrated in Fig. 3, where the cost for a client to achieve
a unitary quality index is plotted against the average reward per unit of contribution from the other
clients across multiple client interactions. For all the interactions where the unit reward is lower than
the unit cost, the client abstains from participating.
        </p>
        <p>[Figure 3: Threshold cost (about 2.1–2.6×10⁻⁵) versus interaction number (0–25), compared with the client’s unitary contribution cost.]</p>
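        <p>A sketch of this best response in Python (using the notation reconstructed above; all numbers are hypothetical):</p>

```python
import math

def best_response(R: float, Q_minus: float, unit_cost: float) -> float:
    """Best response of Eq. (6); unit_cost stands for sqrt(alpha_i * gamma_i)."""
    if R / unit_cost <= Q_minus:
        return 0.0  # participation condition violated: abstain
    return math.sqrt(R * Q_minus / unit_cost) - Q_minus

def utility(q: float, R: float, Q_minus: float, unit_cost: float) -> float:
    """Client utility of Eq. (5) at the cost-minimizing per-unit cost."""
    return R * q / (q + Q_minus) - unit_cost * q

R, Q_minus, c = 10.0, 4.0, 0.5
q_star = best_response(R, Q_minus, c)
# q_star maximizes the utility: nearby deviations do no better.
print(q_star, utility(q_star, R, Q_minus, c))
print(best_response(1.0, 4.0, 0.5))  # 0.0: abstains when R / c <= Q_minus
```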
        <p>The optimal contribution in Eq. (6) requires knowledge of Q_{−i};
since the clients are not cooperating, the i-th client does not have direct access to this information and
must estimate it. When the system converges, each client can estimate the total contribution of the
other clients by observing the reward received compared to the total reward in the previous interaction.</p>
        <p>For the initial estimate, the client has no information about the other clients, so it assumes that all
clients are homogeneous and adopt the same policy to determine their quality index. In this case, in
each client’s formulation, Q_{−i} = (n − 1) q_i; substituting into the first-order condition of Eq. (6),
the quality index maximizing u_i is given by Eq. (7):
q_i = (n − 1) R / ( n² √(α_i γ_i) ).
(7)</p>
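        <p>Under the conditions established in Section 3.2, repeated best responses converge to the unique equilibrium; the following Python sketch (with hypothetical unit costs) illustrates this:</p>

```python
import math

def best_response(R, Q_minus, unit_cost):
    # Eq. (6); unit_cost stands for sqrt(alpha_i * gamma_i).
    if R / unit_cost <= Q_minus:
        return 0.0
    return math.sqrt(R * Q_minus / unit_cost) - Q_minus

def iterate_to_equilibrium(R, unit_costs, rounds=200):
    n = len(unit_costs)
    # Homogeneous initial guess of Eq. (7): q_i = (n - 1) R / (n^2 c_i).
    q = [(n - 1) * R / (n ** 2 * c) for c in unit_costs]
    for _ in range(rounds):
        total = sum(q)
        q = [best_response(R, total - qi, c) for qi, c in zip(q, unit_costs)]
    return q

q_eq = iterate_to_equilibrium(10.0, [0.5, 0.6, 0.8])
print(q_eq)  # a fixed point: every q_i is a best response to the others
```

        <p>At the fixed point the total contribution matches the closed form Q = R(n − 1) / Σ_j √(α_j γ_j) derived in the equilibrium analysis.</p>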
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Nash Equilibrium</title>
        <p>Theorem 1. At least one Nash Equilibrium exists for the client’s utility function in Eq. (5).</p>
        <p>Proof. For a Nash Equilibrium to exist, the utility function of the client must be concave with respect to
the client’s strategy, i.e., the choice of quality index q_i [31]. The first and second derivatives of the utility
function with respect to q_i are computed to prove the concavity:
∂u_i/∂q_i = R Q_{−i} / (q_i + Q_{−i})² − √(α_i γ_i)
and
∂²u_i/∂q_i² = −2 R Q_{−i} / (q_i + Q_{−i})³.
(8)
Since the second derivative of u_i(q_i) with respect to q_i, given by Eq. (8), is negative for all the admissible
(positive) values of q_i, the utility function is concave.</p>
        <p>In order to ensure that the clients will converge to the Nash equilibrium, there must be a unique
equilibrium point. Let f(Q_{−i}) be the response function representing the optimal quality index of client i
given the quality indexes of the other clients, following Eq. (6). If f(Q_{−i}) is a standard function then the
proposed game has a unique Nash equilibrium [32].</p>
        <p>Definition 1. A function f(q) is standard if, for all q ≥ 0, the following properties are satisfied:
1. Positivity: f(q) &gt; 0.
2. Monotonicity: q ≥ q′ ⇒ f(q) ≥ f(q′); this means that, in reaction to an increase in the
contribution performed by all the other clients, the i-th client also increases its contribution.
3. Scalability: ∀μ &gt; 1, μ f(q) ≥ f(μ q); when all clients uniformly scale up their contribution, the
i-th client’s response function will increase less than linearly.</p>
        <p>Lemma 1. √( R / √(α_i γ_i) ) − 2 √( Q_{−i} ) &gt; 0 always holds if the following condition holds:
Σ_{j=1}^{n} √(α_j γ_j) &gt; 2 (n − 1) √(α_i γ_i).
(9)</p>
        <p>Proof. By setting the first derivative of the client’s utility function in Eq. (5) to zero, the following
relationship between the quality indexes of the clients can be obtained:
R Q_{−i} / (Q_{−i} + q_i)² = √(α_i γ_i).
(10)
Summing up on both sides of Eq. (10) for all clients yields the relationship in Eq. (11):
Σ_{i=1}^{n} R Q_{−i} / (Q_{−i} + q_i)² = R (n − 1) / Q = Σ_{i=1}^{n} √(α_i γ_i).
(11)
Hence, the total contribution in terms of the costs of all the clients is rewritten as Eq. (12):
Q = R (n − 1) / Σ_{j=1}^{n} √(α_j γ_j).
(12)
By substituting Eq. (12) into Eq. (10), it is possible to derive that
R (n − 1)² √(α_i γ_i) = Q_{−i} ( Σ_{j=1}^{n} √(α_j γ_j) )².
(13)
If Eq. (9) holds, by computing the square root of both sides of Eq. (13), it is possible to prove that
√( R √(α_i γ_i) ) (n − 1) &gt; 2 √( Q_{−i} ) (n − 1) √(α_i γ_i), which simplifies to
√( R / √(α_i γ_i) ) − 2 √( Q_{−i} ) &gt; 0.</p>
        <p>Theorem 2. The client response function is standard, and the proposed game has a unique Nash Equilibrium
if the condition in Eq. (9) holds.</p>
        <p>Proof. The positivity is obviously satisfied by the client response function under the condition in Eq. (6).
• The client response function is monotone for Q_{−i} ≥ Q′_{−i} &gt; 0 if Eq. (9) holds, i.e., f(Q_{−i}) − f(Q′_{−i}) ≥ 0:
f(Q_{−i}) − f(Q′_{−i}) = √( R / √(α_i γ_i) ) ( √(Q_{−i}) − √(Q′_{−i}) ) − ( Q_{−i} − Q′_{−i} )
= ( √(Q_{−i}) − √(Q′_{−i}) ) ( √( R / √(α_i γ_i) ) − ( √(Q_{−i}) + √(Q′_{−i}) ) )
≥ ( √(Q_{−i}) − √(Q′_{−i}) ) ( √( R / √(α_i γ_i) ) − 2 √(Q_{−i}) ),
(14)
which is positive according to Lemma 1.
• The client response function is scalable, ∀μ &gt; 1, μ f(Q_{−i}) − f(μ Q_{−i}) ≥ 0:
μ f(Q_{−i}) − f(μ Q_{−i}) = μ √( R Q_{−i} / √(α_i γ_i) ) − μ Q_{−i} − √( μ R Q_{−i} / √(α_i γ_i) ) + μ Q_{−i}
= √μ ( √μ − 1 ) √( R Q_{−i} / √(α_i γ_i) ).
Since μ &gt; 1, both factors are positive, thus μ f(Q_{−i}) ≥ f(μ Q_{−i}) and the response function is scalable.</p>
        <p>Hence the client response function f(Q_{−i}) is a standard function which, as demonstrated by Deligiannis
et al. [32], is a sufficient condition for the existence of a unique Nash Equilibrium.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Server reward policy</title>
      <p>The server aims to maximize its utility by setting an appropriate reward. The utility, denoted as U_s,
is defined as the difference between the satisfaction derived from the aggregated global model and
the total reward R distributed to the followers. The server’s satisfaction depends on the accuracy of
the new global model, which grows concavely as the total contribution of the followers increases. To
represent this dependency, a logarithmic function is employed to describe the relationship between the
model’s accuracy and the followers’ quality indexes:
S(Q) = λ log(1 + Q),  with Q = Σ_{i=1}^{n} q_i and λ a sensitivity parameter.
(15)
The server’s utility is therefore:
U_s(R) = λ log( 1 + Σ_{i=1}^{n} q_i(R) ) − R.
(16)</p>
      <p>The Stackelberg equilibrium can be found by solving the above non-linear optimization problem. Let q*
be the unique Nash equilibrium obtained by the clients when the server offers a reward R. The server has
to maximize Eq. (16) a priori to find its unique optimal reward R* and announce it to the clients. Given
a reward R, each client sets their contribution q_i* according to Eq. (6). The server, however, does not
have direct access to the parameters of the clients’ cost functions, i.e., α_i and γ_i. Instead, assuming that
the costs have not changed significantly, the server can estimate the unitary contribution cost of client
i, that is √(α_i γ_i), from the behavior of the clients in the previous two rounds, denoted with the (t − 1)
and (t − 2) subscripts respectively, through Eq. (17):
√(α_i γ_i) = R_{(t−1)} Q_{−i,(t−2)} / ( q_{i,(t−1)} + Q_{−i,(t−2)} )².
(17)</p>
      <p>Hence, the server can estimate the expected utility given a reward R solely by relying on the previous
interactions by substituting Eq. (6) into Eq. (16), with the cost estimated by Eq. (17):
U_s(R) = λ log( 1 + Σ_{i=1}^{n} ( √( R Q_{−i,(t−1)} / √(α_i γ_i) ) − Q_{−i,(t−1)} ) ) − R.
(18)</p>
      <p>The server can then find the optimal reward R* by maximizing the utility function U_s(R). Since the
utility function is concave (negative second derivative), the optimal reward can be found by calculating
the first derivative, shown in Eq. (19), and setting it equal to zero. For the sake of conciseness, the
following parameters are used: B = Σ_{i=1}^{n} √( Q_{−i,(t−1)} / √(α_i γ_i) ) and
E = 1 − Σ_{i=1}^{n} Q_{−i,(t−1)}. Eq. (19) is thus rewritten as
dU_s/dR = λ B / ( 2 √R ( E + B √R ) ) − 1 = 0,
(19)
which reduces to the quadratic equation 2 B R + 2 E √R − λ B = 0 in √R. This equation has a unique
positive solution, represented in Eq. (20), chosen as the reward by the server to maximize its utility:
R* = ( ( −E + √( E² + 2 λ B² ) ) / ( 2 B ) )².
(20)</p>
      <p>Since in the first interaction the server has no information about the clients, it assumes that all clients
are homogeneous and utilizes an estimate c̃ of the average unitary cost instead. Following Eq. (7), the
utility is thus formulated as U_s(R) = λ log( 1 + Σ_{i=1}^{n} (n − 1) R / ( n² c̃ ) ) − R, which is concave
with respect to R and can easily be maximized analytically.</p>
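      <p>Using the notation reconstructed above (B and E estimated from the previous rounds; all values below are hypothetical), the server’s closed-form reward choice can be sketched as:</p>

```python
import math

def optimal_reward(lam, Q_minus, unit_costs):
    """Closed-form maximizer of the estimated utility, in the notation above."""
    B = sum(math.sqrt(qm / c) for qm, c in zip(Q_minus, unit_costs))
    E = 1.0 - sum(Q_minus)
    sqrt_R = (-E + math.sqrt(E ** 2 + 2 * lam * B ** 2)) / (2 * B)
    return sqrt_R ** 2

def server_utility(R, lam, Q_minus, unit_costs):
    """Estimated server utility: lam * log(1 + total contribution) - R."""
    total = sum(math.sqrt(R * qm / c) - qm for qm, c in zip(Q_minus, unit_costs))
    return lam * math.log(1 + total) - R

lam, Q_minus, costs = 50.0, [5.0, 6.0, 8.0], [0.5, 0.6, 0.8]
R_star = optimal_reward(lam, Q_minus, costs)
# R_star is the stationary point: nearby rewards yield no higher utility.
print(R_star, server_utility(R_star, lam, Q_minus, costs))
```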
    </sec>
    <sec id="sec-5">
      <title>5. Experimental evaluation</title>
      <p>To assess the proposed approach, a multi-layer perceptron was considered as the model to be learned.
The data sampled by clients are derived from the KronoDroid dataset [29], a publicly available
real-world dataset covering the period from 2008 to 2020. The KronoDroid dataset, consisting of 28,745
Android malware samples and 35,256 benign samples, sorted according to their “last modification” date,
provides a rich and dynamic environment for assessing the system’s capabilities in maintaining model
performance over multiple years. The dataset was divided into 26 non-overlapping subsets covering
three months each, from 2012 to 2018. After each three-month period, the server announces a reward
to the ten clients, who then decide their contribution effort, acquiring samples from the current subset
and labeling them. The model accuracy and the server’s utility are computed according to the total
client contribution.</p>
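      <p>The three-month partitioning of the chronologically sorted samples can be sketched as follows (illustrative Python; the pair layout is a hypothetical stand-in, not KronoDroid’s actual schema):</p>

```python
from datetime import date

def quarter_key(d: date) -> tuple:
    """Map a date to its (year, quarter) bucket."""
    return (d.year, (d.month - 1) // 3)

def split_by_quarter(samples):
    """samples: iterable of (last_modification_date, payload) pairs.
    Returns the non-overlapping three-month subsets in chronological order."""
    subsets = {}
    for d, x in samples:
        subsets.setdefault(quarter_key(d), []).append((d, x))
    return [subsets[k] for k in sorted(subsets)]

data = [(date(2012, 2, 1), "a"), (date(2012, 5, 9), "b"), (date(2012, 4, 2), "c")]
print([len(p) for p in split_by_quarter(data)])  # [1, 2]
```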
      <p>Model validation: To verify that the server can correctly estimate the clients’ behavior
and provide fair and consistent rewards, the mathematical model was validated by comparing the
theoretical utility expected by the server with the actual utility (Fig. 4), and the reward obtained by the
clients with their provided quality index (Fig. 5). As a general trend, clients with lower data acquisition
costs achieve higher quality indexes, enabling them to obtain higher rewards. Additionally, if all clients
have high costs, the server will obtain lower utility. The linear relationship in these plots confirms that
the server estimates are consistent with the actual outcomes, and that the clients receive consistent
and fair rewards for their contributions. The difference in reward outcome for heterogeneous clients is
further highlighted in Fig. 6, where it is possible to observe that lower data acquisition costs lead clients
to provide higher quality indexes while incurring lower costs, thus achieving a reward inversely
proportional to their unitary cost.</p>
      <p>Effectiveness over time: The model’s accuracy and the server’s utility were evaluated over multiple
interactions to assess the system’s ability to maintain the model’s performance over time. Each client’s
contribution converges over the interactions to a value that depends on their data acquisition and
labeling costs, as shown in Fig. 7. As can be seen in Fig. 8, the model’s accuracy remains stable over
time, despite the concept drift in the dataset, thanks to the clients’ contributions. If acquiring labeled
data is too expensive the performance of the model will settle on a lower accuracy (∼ 86%), while if
the clients can provide high-quality data at a lower cost, the model will achieve a higher accuracy
(∼ 92%). These values are consistent with state-of-the-art results with this dataset when concept drift is
effectively managed [33], thus showing the feasibility of using an incentive mechanism for performing
data collection through crowdsourcing to maintain the effectiveness of malware detection models over
time.</p>
      <p>[Figure 7: Average client utility and unitary contribution cost (about 1.6–2.2×10⁻⁵) of each client across the interactions.]</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>It has been widely demonstrated that machine learning models are effective at detecting malware,
but lose effectiveness over time due to concept drift. To maintain the effectiveness of the model, it is
necessary to continuously update the model with new data. Distributed computing paradigms such as
Federated Learning can be adopted to directly involve users in the data collection process required to
maintain the model up-to-date. However, without adequate incentives, the clients may not be willing to
provide high-quality data, leading to model degradation. This work proposes a Stackelberg game model
for Federated Learning to incentivize the clients to collect high-quality data, enabling the long-term
feasibility of employing machine learning models for malware detection despite the concept drift issue.
Clients can decide how much data to collect and how much effort to put into labeling it to maximize their
net utility, while the server decides the overall reward to maximize the contribution of the clients. The
Stackelberg game was shown to possess a stable Nash equilibrium, and the model was experimentally
shown to be capable of preserving the effectiveness of the model over multiple years. In the current
version of the system, the server relies on honest clients to provide accurate quality indexes. Future
work will investigate strategies that allow the server to infer the quality indexes of the clients, and
meta-learning strategies to extract more value from poorly-annotated data.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgements</title>
      <p>This work was partially supported by the AMELIS project, within the project FAIR (PE0000013), and by
the ADELE project, within the project SERICS (PE00000014), both under the MUR National Recovery
and Resilience Plan funded by the European Union - NextGenerationEU.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
      <p>[4] J. Singh, J. Singh, A survey on machine learning-based malware detection in executable files,
Journal of Systems Architecture 112 (2021) 101861.
[5] A. H. Salem, S. M. Azzam, O. Emam, A. A. Abohany, Advancing cybersecurity: a comprehensive
review of ai-driven detection techniques, Journal of Big Data 11 (2024) 105.
[6] D. Gibert, C. Mateu, J. Planes, The rise of machine learning for detection and classification of
malware: Research developments, trends and challenges, Journal of Network and Computer
Applications 153 (2020) 102526.
[7] M. Ahsan, K. E. Nygard, R. Gomes, M. M. Chowdhury, N. Rifat, J. F. Connolly, Cybersecurity threats
and their mitigation approaches using machine learning—a review, Journal of Cybersecurity and
Privacy 2 (2022) 527–555.
[8] A. B. Nassif, M. A. Talib, Q. Nasir, F. M. Dakalbab, Machine learning for anomaly detection: A
systematic review, IEEE Access 9 (2021) 78658–78700.
[9] V. Agate, A. De Paola, S. Drago, P. Ferraro, G. L. Re, Enhancing iot network security with
concept drift-aware unsupervised threat detection, in: 2024 IEEE Symposium on Computers and
Communications (ISCC), IEEE, 2024, pp. 1–6.
[10] F. Barbero, F. Pendlebury, F. Pierazzi, L. Cavallaro, Transcending transcend: Revisiting malware
classification in the presence of concept drift, in: 2022 IEEE Symposium on Security and Privacy
(SP), IEEE, 2022, pp. 805–823.
[11] D. W. Fernando, N. Komninos, Fesad ransomware detection framework with machine learning
using adaption to concept drift, Computers &amp; Security 137 (2024) 103629.
[12] F. Pendlebury, F. Pierazzi, R. Jordaney, J. Kinder, L. Cavallaro, {TESSERACT}: Eliminating
experimental bias in malware classification across space and time, in: 28th USENIX security
symposium (USENIX Security 19), 2019, pp. 729–746.
[13] A. Augello, A. De Paola, G. Lo Re, Hybrid multilevel detection of mobile devices malware
under concept drift, Journal of Network and Systems Management 33 (2025) 36. doi:10.1007/
s10922-025-09906-3.
[14] X. Deng, Z. Wang, X. Pei, K. Xue, Transmalde: An efective transformer based hierarchical
framework for iot malware detection, IEEE Transactions on Network Science and Engineering
(2023).
[15] V. Rey, P. M. S. Sánchez, A. H. Celdrán, G. Bovet, Federated learning for malware detection in iot
devices, Computer Networks 204 (2022) 108693.
[16] A. Augello, A. De Paola, G. Lo Re, M2FD: Mobile malware federated detection under concept drift,</p>
      <p>Computers &amp; Security (2025) 104361. doi:10.1016/j.cose.2025.104361.
[17] S. Seneviratne, A. Seneviratne, P. Mohapatra, A. Mahanti, Predicting user traits from a snapshot
of apps installed on a smartphone, ACM SIGMOBILE Mobile Computing and Communications
Review 18 (2014) 1–8.
[18] Y. Tong, Y. Wang, D. Shi, Federated learning in the lens of crowdsourcing., IEEE Data Eng. Bull.</p>
      <p>43 (2020) 26–36.
[19] B. McMahan, E. Moore, D. Ramage, S. Hampson, B. A. y. Arcas, Communication-Eficient Learning
of Deep Networks from Decentralized Data, in: A. Singh, J. Zhu (Eds.), Proceedings of the 20th
International Conference on Artificial Intelligence and Statistics, volume 54 of Proceedings of
Machine Learning Research, PMLR, 2017, pp. 1273–1282. URL: https://proceedings.mlr.press/v54/
mcmahan17a.html.
[20] A. Augello, A. Gupta, G. Lo Re, S. K. Das, Tackling selfish clients in federated learning, in:
27th European Conference on Artificial Intelligence (ECAI 2024), IOS Press, 2024, pp. 1888–1895.
doi:10.3233/FAIA240702.
[21] Y. Zhan, J. Zhang, Z. Hong, L. Wu, P. Li, S. Guo, A survey of incentive mechanism design for
federated learning, IEEE Transactions on Emerging Topics in Computing 10 (2021) 1035–1044.
[22] R. Zeng, C. Zeng, X. Wang, B. Li, X. Chu, Incentive mechanisms in federated learning and a
game-theoretical approach, IEEE Network 36 (2022) 229–235.
[23] K. Donahue, J. Kleinberg, Optimality and stability in federated learning: A game-theoretic
approach, Advances in Neural Information Processing Systems 34 (2021) 1287–1298.
[24] S. Jiang, J. Wu, A reward response game in the federated learning system, in: 2021 IEEE 18th</p>
      <p>International Conference on Mobile Ad Hoc and Smart Systems (MASS), IEEE, 2021, pp. 127–135.
[25] Y. Zhan, P. Li, Z. Qu, D. Zeng, S. Guo, A learning-based incentive mechanism for federated learning,</p>
      <p>IEEE Internet of Things Journal 7 (2020) 6360–6368.
[26] B. McMahan, E. Moore, D. Ramage, S. Hampson, B. A. y Arcas, Communication-eficient learning
of deep networks from decentralized data, in: Artificial intelligence and statistics, PMLR, 2017, pp.
1273–1282.
[27] T. Roughgarden, Stackelberg scheduling strategies, in: Proceedings of the thirty-third annual</p>
      <p>ACM symposium on Theory of computing, 2001, pp. 104–113.
[28] R. Gao, M. Saar-Tsechansky, Cost-accuracy aware adaptive labeling for active learning, in:</p>
      <p>Proceedings of the AAAI conference on artificial intelligence, volume 34, 2020, pp. 2569–2576.
[29] A. Guerra-Manzanares, H. Bahsi, S. Nõmm, Kronodroid: time-based hybrid-featured dataset for
efective android malware detection and characterization, Computers &amp; Security 110 (2021) 102399.
[30] S. Vijayanarasimhan, K. Grauman, What’s it going to cost you?: Predicting efort vs.
informativeness for multi-label image annotations, in: 2009 IEEE conference on computer vision and pattern
recognition, IEEE, 2009, pp. 2262–2269.
[31] J. B. Rosen, Existence and uniqueness of equilibrium points for concave n-person games,
Econometrica: Journal of the Econometric Society (1965) 520–534.
[32] A. Deligiannis, A. Panoui, S. Lambotharan, J. A. Chambers, Game-theoretic power allocation and
the nash equilibrium analysis for a multistatic mimo radar network, IEEE Transactions on Signal
Processing 65 (2017) 6397–6408.
[33] A. Guerra-Manzanares, M. Luckner, H. Bahsi, Android malware concept drift using system calls:
detection, characterization and challenges, Expert Systems with Applications 206 (2022) 117200.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>V.</given-names>
            <surname>Adat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. B.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <article-title>Security in internet of things: issues, challenges, taxonomy, and architecture</article-title>
          ,
          <source>Telecommunication Systems</source>
          <volume>67</volume>
          (
          <year>2018</year>
          )
          <fpage>423</fpage>
          -
          <lpage>441</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Moser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Kruegel</surname>
          </string-name>
          , E. Kirda,
          <article-title>Limits of static analysis for malware detection, in: Twenty-third annual computer security applications conference (ACSAC 2007)</article-title>
          , IEEE,
          <year>2007</year>
          , pp.
          <fpage>421</fpage>
          -
          <lpage>430</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C.</given-names>
            <surname>Catalano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chezzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Angelelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Tommasi</surname>
          </string-name>
          ,
          <article-title>Deceiving ai-based malware detection through polymorphic attacks</article-title>
          ,
          <source>Computers in Industry</source>
          <volume>143</volume>
          (
          <year>2022</year>
          )
          <fpage>103751</fpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>