<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>February</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>A Stackelberg Approach to Federated Learning for Malware Detection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Andrea Augello</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandra De Paola</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marena Jestin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giuseppe Lo Re</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Engineering, University of Palermo</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>0</volume>
      <issue>1</issue>
      <fpage>3</fpage>
      <lpage>8</lpage>
      <abstract>
        <p>The widespread use of smart devices requires effective malware detection tools to ensure user security and privacy. The dynamic nature of the software ecosystem, characterized by data distribution changes, poses significant challenges to the long-term sustainability of machine learning models for malware detection, requiring periodic updates to maintain their effectiveness. Additionally, collecting up-to-date information for training machine learning models in a centralized fashion is costly, time-consuming, and privacy-invasive. To address these shortcomings, this work proposes a Stackelberg game model to incentivize users to contribute to the training of a malware detection model through Federated Learning. The proposed model takes into account the heterogeneous capabilities of the participants, allowing them to tune their contribution based on the quality and quantity of the data they can provide. Experimental results demonstrate that the proposed approach can ensure the effectiveness of the detection model over multiple years.</p>
      </abstract>
      <kwd-group>
        <kwd>Malware detection</kwd>
        <kwd>Federated learning</kwd>
        <kwd>Game theory</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Motivation and Related Works</title>
      <p>Smart devices are becoming increasingly integrated into our daily lives. As technology evolves, the scale
and complexity of cyberattacks have made traditional security measures inadequate, necessitating the
development of innovative strategies to effectively address emerging vulnerabilities [1]. Traditional
malware detection techniques, such as signature-based and heuristic approaches, have become ill-equipped
to address the complexities of modern cyber threats. These methods rely on predefined patterns or
static code analysis, making them vulnerable to sophisticated attacks [2]. Specifically, polymorphic and
metamorphic malware can alter their code or behavior to elude detection by such systems [3]. As a
result, these conventional methods often produce high false-negative rates, especially when faced with
new or dynamic forms of malware that have not been previously encountered [4].</p>
      <p>Over the past decade, machine learning algorithms have become the preferred threat detection
approach [5], enabling systems to identify patterns and anomalies in data more efficiently than traditional
methods [6]. By processing vast amounts of data, these systems can detect sophisticated attack patterns
faster and more accurately [7]. Moreover, machine learning models are easier to adapt to emerging
threats [8]. Despite their success, machine learning models frequently face a decline in performance
after their deployment. This performance degradation is due to a phenomenon known as concept
drift [9]. Concept drift arises when the statistical characteristics of the input data change over time,
leading to discrepancies between the data the model was trained on and the data it encounters. Concept
drift is a particularly serious issue in the cybersecurity domain, where the threat landscape is constantly
evolving, and new malware variants are continually being developed [10]. This change can lead to
a decrease in the model’s accuracy, making it ineffective in detecting new threats [11]. Hence, it is
essential to update the model periodically to ensure it remains effective in detecting new threats [12, 13].</p>
      <p>Furthermore, most existing malware detection systems that use machine learning rely on a centralized
collection and annotation of data [14]. The process of labelling data is both expensive and time-intensive,
creating a bottleneck in the development pipeline [15]. To reduce such overheads, crowdsourcing has
been proposed as a viable solution for malware detection [16]. Users can collect and annotate data,
which is then sent to a central server that processes the data to train the model. However, despite its
potential, crowdsourcing introduces significant challenges regarding user privacy. Users may hesitate to
share details with a central server about the applications installed on their devices since even a simple
snapshot of installed apps can reveal sensitive personal information [17].</p>
      <p>Decentralized alternatives, like Federated Learning, offer a solution by enabling collaborative model
training without aggregating sensitive data on a central server [18]. In Federated Learning, each client
retains its local training dataset, which is never shared with the server. Instead, clients compute updates
to the global model maintained by the server, communicating only these updates rather than the raw
data [19]. The effectiveness of such a learning model depends on the quality of updates provided by
users, although it is not a given that users perceive the value of actively contributing to the functioning
of the system [20]. By aligning clients’ interests with the broader goal of collaborative model training,
incentive mechanisms can help overcome the challenges of client engagement in Federated Learning
systems [21].</p>
      <p>Various studies have employed game theory to design incentive mechanisms that encourage clients
to allocate their computational resources for Federated Learning [22]. For instance, Donahue et al. [23]
analyzed the conditions under which rational clients would be willing to participate in Federated Learning
over multiple interactions with the server. While most works focus on ensuring that clients offer their
computational capabilities to train a shared model [24, 25], this work leverages game theory to address
the specific challenges of obtaining up-to-date training data to overcome concept drift in malware
detection models. By adopting a Stackelberg-based approach to designing an incentive mechanism
for Federated Learning, this work aims to ensure sustained client participation while adapting to concept
drift. This approach enhances model robustness and accuracy in dynamic environments by aligning
client incentives with the learning objectives of the system.</p>
      <p>The main contributions of this work are:
• A Stackelberg game model for Federated Learning (FL) to prevent model ageing due to concept
drift in malware detection.
• A novel game formulation that allows clients to tune their contribution based on the quantity of
data they can provide and the effort associated with labelling it. Allowing clients to compensate
for the potentially poor quality of labels they can supply with quantity ensures that even clients
with limited labelling expertise can participate in the learning process.
• A novel strategy adaptation mechanism that allows the server to tune the reward in order to
encourage client participation even in high-cost scenarios.
• A formal proof that the proposed game model converges to a unique Nash equilibrium, ensuring
system stability over time.
• Extensive experimental evaluations showing that, through the proposed incentive mechanism, it
is possible to engage clients in the FL process for multiple years, maintaining the effectiveness of
the model.</p>
      <p>The remainder of this work is structured as follows: Section 2 describes the system model. Section 3
presents the optimal client strategy. Section 4 discusses the server reward policy. Section 5 provides the
experimental evaluation. Finally, Section 6 concludes the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>2. System model</title>
      <p>The system consists of a single central server and a set of n clients. The clients are responsible for
training the local models using their own data and sending the local models to the server. The server is
responsible for collecting the local models from the clients, aggregating them, and sending the global
model back to the clients for further training according to the Federated Learning (FL) paradigm [26].
To incentivize the clients to collect high-quality data for training, the server offers them a monetary
reward based on their contribution.</p>
      <p>The interaction between the server and clients is modeled as a Stackelberg game, a game theory
model useful in situations of sequential competition between two or more players, in which one player,
named leader, moves first and the other players, named followers, respond to the first player’s move [27].
The server acts as the leader and, in its opening move, announces a monetary reward for clients that
cooperate by providing high-quality data to train their local models. The clients act as followers and respond
by determining a suitable contribution to maximize their utility. The clients are fully rational agents, and
engage in a non-cooperative subgame with the other clients to determine their optimal contribution.</p>
      <p>The server and clients interact periodically, when the model needs to be updated on new data. For
each of these updates, the server sets a reward and the clients decide how much data to contribute. The
reward is given based on the quality of the data provided by the clients.</p>
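      <p>The leader-follower sequencing described above can be illustrated with a minimal Python sketch (not part of the original paper; the reward and contribution policies below are hypothetical placeholders for those derived in Sections 3 and 4):</p>

```python
# One periodic server-client interaction, sketched as a Stackelberg round:
# the server (leader) announces a reward first, then each client (follower)
# responds with a contribution. Policies here are hypothetical placeholders.

def stackelberg_round(server_reward_policy, client_policies):
    R = server_reward_policy()  # leader moves first
    contributions = [policy(R) for policy in client_policies]  # followers respond
    total = sum(contributions)
    # Each client is paid in proportion to its share of the total contribution.
    payouts = [R * q / total if total > 0 else 0.0 for q in contributions]
    return contributions, payouts

qs, pays = stackelberg_round(lambda: 10.0, [lambda R: R / 4, lambda R: R / 5])
print(qs)  # [2.5, 2.0]
```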
      <p>In the proposed model, the contribution q_i of a client, referred to as their quality index, is linearly
dependent on the size of the collected dataset |D_i| and the quality of the labelling a_i, as shown in Eq. (1).
This proposal is consistent with existing literature [28] showing that the generalization error introduced
by label noise can be offset by increasing the dataset size. Experimental observations show that the
accuracy of the global model is logarithmically dependent on the quality index, as shown in Fig. 1,
which is in line with the results of previous studies [25]. Initially, increasing the quality index leads to
significant improvements in the accuracy of the model, but these improvements become progressively
less significant as the quality index increases.</p>
      <p>
        q_i = |D_i| (2 a_i − 1).
(
        <xref ref-type="bibr" rid="ref1">1</xref>
        )
      </p>
      <p>[Figure 1: Accuracy of the global model versus the total quality index (data points and fitted logarithmic curve); accuracy rises from about 0.82 to 0.92 as the quality index grows from 0 to 500.]</p>
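      <p>A small illustrative sketch in Python (not the authors’ code; the numbers are hypothetical) of the quality index in Eq. (1):</p>

```python
# Quality index of Eq. (1): q_i = |D_i| * (2 * a_i - 1).
# Random binary labels (a_i = 0.5) contribute nothing; perfect labels
# (a_i = 1.0) make the contribution equal to the dataset size.

def quality_index(dataset_size: float, label_accuracy: float) -> float:
    assert 0.5 <= label_accuracy <= 1.0
    return dataset_size * (2 * label_accuracy - 1)

# A small, well-labeled dataset can match a larger, noisier one:
print(quality_index(100, 1.0))    # 100.0
print(quality_index(400, 0.625))  # 100.0
```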
    </sec>
    <sec id="sec-3">
      <title>3. Optimal client strategy</title>
      <p>Each client contributes to the learning process by acquiring and labeling data, and receives a reward
based on the quality and quantity of the data provided. The clients aim to maximize their utility, defined
as the difference between the reward obtained and the cost incurred. In their decision-making process,
the clients can leverage knowledge from previous interactions.</p>
      <sec id="sec-3-1">
        <title>3.1. Client policy</title>
        <p>Client contribution cost model: The cost that client i incurs is given by the cost of acquiring data
and the cost of labeling them. The cost of acquiring data scales linearly with the number of samples
according to a parameter α_i. Since not all samples are equally difficult to label, the cost of labeling
an entire dataset is not linear in the target accuracy level [30]. The cost of labeling each sample is
modeled as proportional to its difficulty. Assuming a maximum entropy distribution of the difficulty of
the samples bounded in the [δ_min, δ_max] range (i.e., a uniform distribution), the cost of labeling a fraction
f of the samples, starting from the easiest, is proportional to the definite integral of the difficulty
distribution’s quantile function up to the f-th quantile:
∫₀^f (δ_min + t (δ_max − δ_min)) dt = δ_min f + ((δ_max − δ_min)/2) f².</p>
        <p>
          (
          <xref ref-type="bibr" rid="ref2">2</xref>
          )
        </p>
        <p>
          Since randomly assigning labels for a binary classification task yields 50% label accuracy, to ensure
that a fraction a_i of samples have the correct label, the client will have to provide labels for a fraction
of the samples equal to at least 2 a_i − 1, incurring a cost proportional to 2(δ_max − δ_min)(a_i − 1/2)².
The resulting behavior can be interpreted as the client determining an upper bound on the difficulty
of the samples they are willing to analyze. In practice, this means that at least a 2 a_i − 1 fraction of
the samples undergoes costly manual annotation, while the remaining samples are labeled with an
unreliable automatic procedure with negligible cost. The final cost of labeling a dataset of size |D_i| up
to a target accuracy of a_i is thus given by |D_i| γ_i (a_i − 1/2)², with γ_i being a proportionality constant
that depends on the maximum difficulty of the samples and the client’s labeling capabilities. Thus, the
overall participation cost for client i is given by Eq. (
          <xref ref-type="bibr" rid="ref3">3</xref>
          ):
        </p>
        <p>c_i(|D_i|, a_i) = |D_i| (α_i + γ_i (a_i − 1/2)²).</p>
        <p>
          Since the clients aim to maximize their utility, they will strive to choose the dataset size and the
target accuracy that minimize the cost for any given target quality index q_i. By fixing q_i, Eq. (
          <xref ref-type="bibr" rid="ref1">1</xref>
          ) can be
used to express the cost only in terms of the dataset size |D_i|:
c(|D_i|)|_{q_i} = |D_i| α_i + γ_i q_i² / (4 |D_i|).
(4)
        </p>
        <p>
          The cost function is minimized for |D_i| = q_i √(γ_i / (4 α_i)), where the first derivative of c is equal to zero
and the second derivative is positive. The corresponding fraction of accurately labeled samples can
be obtained by substituting the optimal |D_i| in Eq. (
          <xref ref-type="bibr" rid="ref1">1</xref>
          ) and solving for a_i given a target quality index,
resulting in a_i = 1/2 + √(α_i / γ_i). In the following, it is assumed that 4 α_i ≤ γ_i so that √(α_i / γ_i) ≤ 1/2, as
a labeling accuracy a_i above 100% is meaningless. In scenarios where this condition is not met, i.e.,
labeling one sample is cheaper than collecting four, the label accuracy is set as the maximum possible
value a_i = 1. By substituting this value of a_i in Eq. (
          <xref ref-type="bibr" rid="ref1">1</xref>
          ) the dataset size to achieve a target quality index
q_i is |D_i| = q_i. Fig. 2 shows the cost of a unitary increase in the quality index as the labeling cost
increases compared to the collection cost. If all clients were required to achieve a quality index of 1, the
cost would increase linearly. Instead, through the proposed dynamic tuning of the quality parameters,
the same quality index can be achieved with a sublinear dependency on the labeling cost.
        </p>
        <p>[Figure 2: Cost c(q) of a unitary quality index versus the labeling-to-collection cost ratio γ/α (0 to 100), for the proposed dynamic tuning and for fixed a = 1; the dynamic strategy grows sublinearly, from roughly 2×10⁻⁵ to 1.2×10⁻⁴.]</p>
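        <p>The cost-minimizing trade-off between dataset size and label accuracy can be sketched as follows (illustrative Python with hypothetical parameter values; alpha stands for the per-sample collection cost and gamma for the labeling-cost coefficient of Eq. (3)):</p>

```python
import math

def optimal_strategy(q: float, alpha: float, gamma: float):
    """Cost-minimizing (dataset size, label accuracy) for a target quality q."""
    if 4 * alpha <= gamma:
        size = q * math.sqrt(gamma / (4 * alpha))
        accuracy = 0.5 + math.sqrt(alpha / gamma)
    else:
        # Labeling a sample is cheaper than collecting four: label everything.
        size, accuracy = q, 1.0
    return size, accuracy

def cost(size: float, accuracy: float, alpha: float, gamma: float) -> float:
    """Participation cost of Eq. (3)."""
    return size * (alpha + gamma * (accuracy - 0.5) ** 2)

q, alpha, gamma = 100.0, 1.0, 16.0
size, acc = optimal_strategy(q, alpha, gamma)
print(size, acc)                      # 200.0 0.75
print(cost(size, acc, alpha, gamma))  # 400.0 == q * sqrt(alpha * gamma)
```

        <p>Any other (size, accuracy) pair achieving the same quality index, e.g. 250 samples labeled at 70% accuracy, costs more (410 in this example), illustrating the advantage of the dynamic tuning.</p>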
        <p>
          Optimal client contribution: Given the contribution q_i of a client, its utility is defined as the
difference between the reward obtained and the cost incurred, as shown in Eq. (5), with R being the
total reward given by the server and Q = Σ_{j=1}^{n} q_j the total contribution of all clients:
u_i(q_i) = (q_i / Q) R − c_i(q_i).
(5)
        </p>
        <p>
          To find the optimal quality index of the i-th client, the derivative of the utility function with respect
to q_i is set as equal to zero. Since Q can be rewritten as Q = q_i + Q_{−i}, where Q_{−i} is the sum of the
contributions of clients other than i, i.e., Q_{−i} = Σ_{j≠i} q_j, it is possible to isolate the contribution of
individual clients:
R Q_{−i} / (q_i + Q_{−i})² − √(α_i γ_i) = 0, satisfied by
q_i = Q_{−i} ( √( R / ( √(α_i γ_i) Q_{−i} ) ) − 1 )  if  R / √(α_i γ_i) &gt; Q_{−i}.
(6)
Intuitively, the condition in Eq. (6) means that the i-th client should only participate if the current
expected reward per unit of contribution is greater than its unitary contribution cost, otherwise the
expected utility would be negative. This is illustrated in Fig. 3, where the cost for a client to achieve
a unitary quality index is plotted against the average reward per unit of contribution from the other
clients across multiple client interactions. For all the interactions where the unit reward is lower than
the unit cost, the client abstains from participating.
        </p>
        <p>[Figure 3: Threshold cost (about 2.1–2.6×10⁻⁵) versus interaction number (0–25), compared with the client’s unitary contribution cost.]</p>
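        <p>A sketch of this best response in Python (using the notation reconstructed above; all numbers are hypothetical):</p>

```python
import math

def best_response(R: float, Q_minus: float, unit_cost: float) -> float:
    """Best response of Eq. (6); unit_cost stands for sqrt(alpha_i * gamma_i)."""
    if R / unit_cost <= Q_minus:
        return 0.0  # participation condition violated: abstain
    return math.sqrt(R * Q_minus / unit_cost) - Q_minus

def utility(q: float, R: float, Q_minus: float, unit_cost: float) -> float:
    """Client utility of Eq. (5) at the cost-minimizing per-unit cost."""
    return R * q / (q + Q_minus) - unit_cost * q

R, Q_minus, c = 10.0, 4.0, 0.5
q_star = best_response(R, Q_minus, c)
# q_star maximizes the utility: nearby deviations do no better.
print(q_star, utility(q_star, R, Q_minus, c))
print(best_response(1.0, 4.0, 0.5))  # 0.0: abstains when R / c <= Q_minus
```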
        <p>The optimal contribution in Eq. (6) requires knowledge of Q_{−i};
since the clients are not cooperating, the i-th client does not have direct access to this information and
must estimate it. When the system converges, each client can estimate the total contribution of the
other clients by observing the reward received compared to the total reward in the previous interaction.</p>
        <p>For the initial estimate, the client has no information about the other clients, so it assumes that all
clients are homogeneous and adopt the same policy to determine their quality index. In this case, in
each client’s formulation, Q_{−i} = (n − 1) q_i; substituting into the first-order condition of Eq. (6),
the quality index maximizing u_i is given by Eq. (7):
q_i = (n − 1) R / ( n² √(α_i γ_i) ).
(7)</p>
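        <p>Under the conditions established in Section 3.2, repeated best responses converge to the unique equilibrium; the following Python sketch (with hypothetical unit costs) illustrates this:</p>

```python
import math

def best_response(R, Q_minus, unit_cost):
    # Eq. (6); unit_cost stands for sqrt(alpha_i * gamma_i).
    if R / unit_cost <= Q_minus:
        return 0.0
    return math.sqrt(R * Q_minus / unit_cost) - Q_minus

def iterate_to_equilibrium(R, unit_costs, rounds=200):
    n = len(unit_costs)
    # Homogeneous initial guess of Eq. (7): q_i = (n - 1) R / (n^2 c_i).
    q = [(n - 1) * R / (n ** 2 * c) for c in unit_costs]
    for _ in range(rounds):
        total = sum(q)
        q = [best_response(R, total - qi, c) for qi, c in zip(q, unit_costs)]
    return q

q_eq = iterate_to_equilibrium(10.0, [0.5, 0.6, 0.8])
print(q_eq)  # a fixed point: every q_i is a best response to the others
```

        <p>At the fixed point the total contribution matches the closed form Q = R(n − 1) / Σ_j √(α_j γ_j) derived in the equilibrium analysis.</p>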
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Nash Equilibrium</title>
        <p>Theorem 1. At least one Nash Equilibrium exists for the client’s utility function in Eq. (5).</p>
        <p>Proof. For a Nash Equilibrium to exist, the utility function of the client must be concave with respect to
the client’s strategy, i.e., the choice of quality index q_i [31]. The first and second derivatives of the utility
function with respect to q_i are computed to prove the concavity:
∂u_i/∂q_i = R Q_{−i} / (q_i + Q_{−i})² − √(α_i γ_i)
and
∂²u_i/∂q_i² = −2 R Q_{−i} / (q_i + Q_{−i})³.
(8)
Since the second derivative of u_i(q_i) with respect to q_i, given by Eq. (8), is negative for all the admissible
(positive) values of q_i, the utility function is concave.</p>
        <p>In order to ensure that the clients will converge to the Nash equilibrium, there must be a unique
equilibrium point. Let f(Q_{−i}) be the response function representing the optimal quality index of client i
given the quality indexes of the other clients, following Eq. (6). If f(Q_{−i}) is a standard function then the
proposed game has a unique Nash equilibrium [32].</p>
        <p>Definition 1. A function f(q) is standard if, for all q ≥ 0, the following properties are satisfied:
1. Positivity: f(q) &gt; 0.
2. Monotonicity: q ≥ q′ ⇒ f(q) ≥ f(q′); this means that, in reaction to an increase in the
contribution performed by all the other clients, the i-th client also increases its contribution.
3. Scalability: ∀μ &gt; 1, μ f(q) ≥ f(μ q); when all clients uniformly scale up their contribution, the
i-th client’s response function will increase less than linearly.</p>
        <p>Lemma 1. √( R / √(α_i γ_i) ) − 2 √( Q_{−i} ) &gt; 0 always holds if the following condition holds:
Σ_{j=1}^{n} √(α_j γ_j) &gt; 2 (n − 1) √(α_i γ_i).
(9)</p>
        <p>Proof. By setting the first derivative of the client’s utility function in Eq. (5) to zero, the following
relationship between the quality indexes of the clients can be obtained:
R Q_{−i} / (Q_{−i} + q_i)² = √(α_i γ_i).
(10)
Summing up on both sides of Eq. (10) for all clients yields the relationship in Eq. (11):
Σ_{i=1}^{n} R Q_{−i} / (Q_{−i} + q_i)² = R (n − 1) / Q = Σ_{i=1}^{n} √(α_i γ_i).
(11)
Hence, the total contribution in terms of the costs of all the clients is rewritten as Eq. (12):
Q = R (n − 1) / Σ_{j=1}^{n} √(α_j γ_j).
(12)
By substituting Eq. (12) into Eq. (10), it is possible to derive that
R (n − 1)² √(α_i γ_i) = Q_{−i} ( Σ_{j=1}^{n} √(α_j γ_j) )².
(13)
If Eq. (9) holds, by computing the square root of both sides of Eq. (13), it is possible to prove that
√( R √(α_i γ_i) ) (n − 1) &gt; 2 √( Q_{−i} ) (n − 1) √(α_i γ_i), which simplifies to
√( R / √(α_i γ_i) ) − 2 √( Q_{−i} ) &gt; 0.</p>
        <p>Theorem 2. The client response function is standard, and the proposed game has a unique Nash Equilibrium
if the condition in Eq. (9) holds.</p>
        <p>Proof. The positivity is obviously satisfied by the client response function under the condition in Eq. (6).
• The client response function is monotone for Q_{−i} ≥ Q′_{−i} &gt; 0 if Eq. (9) holds, i.e., f(Q_{−i}) − f(Q′_{−i}) ≥ 0:
f(Q_{−i}) − f(Q′_{−i}) = √( R / √(α_i γ_i) ) ( √(Q_{−i}) − √(Q′_{−i}) ) − ( Q_{−i} − Q′_{−i} )
= ( √(Q_{−i}) − √(Q′_{−i}) ) ( √( R / √(α_i γ_i) ) − ( √(Q_{−i}) + √(Q′_{−i}) ) )
≥ ( √(Q_{−i}) − √(Q′_{−i}) ) ( √( R / √(α_i γ_i) ) − 2 √(Q_{−i}) ),
(14)
which is positive according to Lemma 1.
• The client response function is scalable, ∀μ &gt; 1, μ f(Q_{−i}) − f(μ Q_{−i}) ≥ 0:
μ f(Q_{−i}) − f(μ Q_{−i}) = μ √( R Q_{−i} / √(α_i γ_i) ) − μ Q_{−i} − √( μ R Q_{−i} / √(α_i γ_i) ) + μ Q_{−i}
= √μ ( √μ − 1 ) √( R Q_{−i} / √(α_i γ_i) ).
Since μ &gt; 1, both factors are positive, thus μ f(Q_{−i}) ≥ f(μ Q_{−i}) and the response function is scalable.</p>
        <p>Hence the client response function f(Q_{−i}) is a standard function which, as demonstrated by Deligiannis
et al. [32], is a sufficient condition for the existence of a unique Nash Equilibrium.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Server reward policy</title>
      <p>The server aims to maximize its utility by setting an appropriate reward. The utility, denoted as U_s,
is defined as the difference between the satisfaction derived from the aggregated global model and
the total reward R distributed to the followers. The server’s satisfaction depends on the accuracy of
the new global model, which grows concavely as the total contribution of the followers increases. To
represent this dependency, a logarithmic function is employed to describe the relationship between the
model’s accuracy and the followers’ quality indexes:
S(Q) = λ log(1 + Q),  with Q = Σ_{i=1}^{n} q_i and λ a sensitivity parameter.
(15)
The server’s utility is therefore:
U_s(R) = λ log( 1 + Σ_{i=1}^{n} q_i(R) ) − R.
(16)</p>
      <p>The Stackelberg equilibrium can be found by solving the above non-linear optimization problem. Let q*
be the unique Nash equilibrium obtained by the clients when the server offers a reward R. The server has
to maximize Eq. (16) a priori to find its unique optimal reward R* and announce it to the clients. Given
a reward R, each client sets their contribution q_i* according to Eq. (6). The server, however, does not
have direct access to the parameters of the clients’ cost functions, i.e., α_i and γ_i. Instead, assuming that
the costs have not changed significantly, the server can estimate the unitary contribution cost of client
i, that is √(α_i γ_i), from the behavior of the clients in the previous two rounds, denoted with the (t − 1)
and (t − 2) subscripts respectively, through Eq. (17):
√(α_i γ_i) = R_{(t−1)} Q_{−i,(t−2)} / ( q_{i,(t−1)} + Q_{−i,(t−2)} )².
(17)</p>
      <p>Hence, the server can estimate the expected utility given a reward R solely by relying on the previous
interactions by substituting Eq. (6) into Eq. (16), with the cost estimated by Eq. (17):
U_s(R) = λ log( 1 + Σ_{i=1}^{n} ( √( R Q_{−i,(t−1)} / √(α_i γ_i) ) − Q_{−i,(t−1)} ) ) − R.
(18)</p>
      <p>The server can then find the optimal reward R* by maximizing the utility function U_s(R). Since the
utility function is concave (negative second derivative), the optimal reward can be found by calculating
the first derivative, shown in Eq. (19), and setting it equal to zero. For the sake of conciseness, the
following parameters are used: B = Σ_{i=1}^{n} √( Q_{−i,(t−1)} / √(α_i γ_i) ) and
E = 1 − Σ_{i=1}^{n} Q_{−i,(t−1)}. Eq. (19) is thus rewritten as
dU_s/dR = λ B / ( 2 √R ( E + B √R ) ) − 1 = 0,
(19)
which reduces to the quadratic equation 2 B R + 2 E √R − λ B = 0 in √R. This equation has a unique
positive solution, represented in Eq. (20), chosen as the reward by the server to maximize its utility:
R* = ( ( −E + √( E² + 2 λ B² ) ) / ( 2 B ) )².
(20)</p>
      <p>Since in the first interaction the server has no information about the clients, it assumes that all clients
are homogeneous and utilizes an estimate c̃ of the average unitary cost instead. Following Eq. (7), the
utility is thus formulated as U_s(R) = λ log( 1 + Σ_{i=1}^{n} (n − 1) R / ( n² c̃ ) ) − R, which is concave
with respect to R and can easily be maximized analytically.</p>
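      <p>Using the notation reconstructed above (B and E estimated from the previous rounds; all values below are hypothetical), the server’s closed-form reward choice can be sketched as:</p>

```python
import math

def optimal_reward(lam, Q_minus, unit_costs):
    """Closed-form maximizer of the estimated utility, in the notation above."""
    B = sum(math.sqrt(qm / c) for qm, c in zip(Q_minus, unit_costs))
    E = 1.0 - sum(Q_minus)
    sqrt_R = (-E + math.sqrt(E ** 2 + 2 * lam * B ** 2)) / (2 * B)
    return sqrt_R ** 2

def server_utility(R, lam, Q_minus, unit_costs):
    """Estimated server utility: lam * log(1 + total contribution) - R."""
    total = sum(math.sqrt(R * qm / c) - qm for qm, c in zip(Q_minus, unit_costs))
    return lam * math.log(1 + total) - R

lam, Q_minus, costs = 50.0, [5.0, 6.0, 8.0], [0.5, 0.6, 0.8]
R_star = optimal_reward(lam, Q_minus, costs)
# R_star is the stationary point: nearby rewards yield no higher utility.
print(R_star, server_utility(R_star, lam, Q_minus, costs))
```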
    </sec>
    <sec id="sec-5">
      <title>5. Experimental evaluation</title>
      <p>To assess the proposed approach, a multi-layer perceptron was considered as the model to be learned.
The data sampled by clients are derived from the KronoDroid dataset [29], a publicly available
real-world dataset covering the period from 2008 to 2020. The KronoDroid dataset, consisting of 28,745
Android malware samples and 35,256 benign samples, sorted according to their “last modification” date,
provides a rich and dynamic environment for assessing the system’s capabilities in maintaining model
performance over multiple years. The dataset was divided into 26 non-overlapping subsets covering
three months each, from 2012 to 2018. After each three-month period, the server announces a reward
to the ten clients, who then decide their contribution effort, acquiring samples from the current subset
and labeling them. The model accuracy and the server’s utility are computed according to the total
client contribution.</p>
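      <p>The three-month partitioning of the chronologically sorted samples can be sketched as follows (illustrative Python; the pair layout is a hypothetical stand-in, not KronoDroid’s actual schema):</p>

```python
from datetime import date

def quarter_key(d: date) -> tuple:
    """Map a date to its (year, quarter) bucket."""
    return (d.year, (d.month - 1) // 3)

def split_by_quarter(samples):
    """samples: iterable of (last_modification_date, payload) pairs.
    Returns the non-overlapping three-month subsets in chronological order."""
    subsets = {}
    for d, x in samples:
        subsets.setdefault(quarter_key(d), []).append((d, x))
    return [subsets[k] for k in sorted(subsets)]

data = [(date(2012, 2, 1), "a"), (date(2012, 5, 9), "b"), (date(2012, 4, 2), "c")]
print([len(p) for p in split_by_quarter(data)])  # [1, 2]
```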
      <p>Model validation: To verify that the server can correctly estimate the clients’ behavior
and provide fair and consistent rewards, the mathematical model was validated by comparing the
theoretical utility expected by the server with the actual utility (Fig. 4), and the reward obtained by the
clients with their provided quality index (Fig. 5). As a general trend, clients with lower data acquisition
costs achieve higher quality indexes, enabling them to obtain higher rewards. Additionally, if all clients
have high costs, the server will obtain lower utility. The linear relationship in these plots confirms that
the server estimates are consistent with the actual outcomes, and that the clients receive consistent
and fair rewards for their contributions. The difference in reward outcome for heterogeneous clients is
further highlighted in Fig. 6, where it is possible to observe that lower data acquisition costs lead clients
to provide higher quality indexes while incurring lower costs, thus achieving a reward inversely
proportional to their unitary cost.</p>
      <p>Effectiveness over time: The model’s accuracy and the server’s utility were evaluated over multiple
interactions to assess the system’s ability to maintain the model’s performance over time. Each client’s
contribution converges over the interactions to a value that depends on their data acquisition and
labeling costs, as shown in Fig. 7. As can be seen in Fig. 8, the model’s accuracy remains stable over
time, despite the concept drift in the dataset, thanks to the clients’ contributions. If acquiring labeled
data is too expensive the performance of the model will settle on a lower accuracy (∼ 86%), while if
the clients can provide high-quality data at a lower cost, the model will achieve a higher accuracy
(∼ 92%). These values are consistent with state-of-the-art results with this dataset when concept drift is
effectively managed [33], thus showing the feasibility of using an incentive mechanism for performing
data collection through crowdsourcing to maintain the effectiveness of malware detection models over
time.</p>
      <p>[Figure 7: Average client utility and unitary contribution cost (about 1.6–2.2×10⁻⁵) of each client across the interactions.]</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>It has been widely demonstrated that machine learning models are effective at detecting malware,
but lose effectiveness over time due to concept drift. To maintain the effectiveness of the model, it is
necessary to continuously update the model with new data. Distributed computing paradigms such as
Federated Learning can be adopted to directly involve users in the data collection process required to
maintain the model up-to-date. However, without adequate incentives, the clients may not be willing to
provide high-quality data, leading to model degradation. This work proposes a Stackelberg game model
for Federated Learning to incentivize the clients to collect high-quality data, enabling the long-term
feasibility of employing machine learning models for malware detection despite the concept drift issue.
Clients can decide how much data to collect and how much effort to put into labeling it to maximize their
net utility, while the server decides the overall reward to maximize the contribution of the clients. The
Stackelberg game was shown to possess a stable Nash equilibrium, and the model was experimentally
shown to be capable of preserving the effectiveness of the model over multiple years. In the current
version of the system, the server relies on honest clients to provide accurate quality indexes. Future
work will investigate strategies that allow the server to infer the quality indexes of the clients, and
meta-learning strategies to extract more value from poorly-annotated data.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgements</title>
      <p>This work was partially supported by the AMELIS project, within the project FAIR (PE0000013), and by
the ADELE project, within the project SERICS (PE00000014), both under the MUR National Recovery
and Resilience Plan funded by the European Union - NextGenerationEU.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
      <p>[4] J. Singh, J. Singh, A survey on machine learning-based malware detection in executable files,
Journal of Systems Architecture 112 (2021) 101861.
[5] A. H. Salem, S. M. Azzam, O. Emam, A. A. Abohany, Advancing cybersecurity: a comprehensive
review of ai-driven detection techniques, Journal of Big Data 11 (2024) 105.
[6] D. Gibert, C. Mateu, J. Planes, The rise of machine learning for detection and classification of
malware: Research developments, trends and challenges, Journal of Network and Computer
Applications 153 (2020) 102526.
[7] M. Ahsan, K. E. Nygard, R. Gomes, M. M. Chowdhury, N. Rifat, J. F. Connolly, Cybersecurity threats
and their mitigation approaches using machine learning—a review, Journal of Cybersecurity and
Privacy 2 (2022) 527–555.
[8] A. B. Nassif, M. A. Talib, Q. Nasir, F. M. Dakalbab, Machine learning for anomaly detection: A
systematic review, IEEE Access 9 (2021) 78658–78700.
[9] V. Agate, A. De Paola, S. Drago, P. Ferraro, G. L. Re, Enhancing iot network security with
concept drift-aware unsupervised threat detection, in: 2024 IEEE Symposium on Computers and
Communications (ISCC), IEEE, 2024, pp. 1–6.
[10] F. Barbero, F. Pendlebury, F. Pierazzi, L. Cavallaro, Transcending transcend: Revisiting malware
classification in the presence of concept drift, in: 2022 IEEE Symposium on Security and Privacy
(SP), IEEE, 2022, pp. 805–823.
[11] D. W. Fernando, N. Komninos, Fesad ransomware detection framework with machine learning
using adaption to concept drift, Computers &amp; Security 137 (2024) 103629.
[12] F. Pendlebury, F. Pierazzi, R. Jordaney, J. Kinder, L. Cavallaro, {TESSERACT}: Eliminating
experimental bias in malware classification across space and time, in: 28th USENIX security
symposium (USENIX Security 19), 2019, pp. 729–746.
[13] A. Augello, A. De Paola, G. Lo Re, Hybrid multilevel detection of mobile devices malware
under concept drift, Journal of Network and Systems Management 33 (2025) 36. doi:10.1007/
s10922-025-09906-3.
[14] X. Deng, Z. Wang, X. Pei, K. Xue, Transmalde: An efective transformer based hierarchical
framework for iot malware detection, IEEE Transactions on Network Science and Engineering
(2023).
[15] V. Rey, P. M. S. Sánchez, A. H. Celdrán, G. Bovet, Federated learning for malware detection in iot
devices, Computer Networks 204 (2022) 108693.
[16] A. Augello, A. De Paola, G. Lo Re, M2FD: Mobile malware federated detection under concept drift,</p>
      <p>Computers &amp; Security (2025) 104361. doi:10.1016/j.cose.2025.104361.
[17] S. Seneviratne, A. Seneviratne, P. Mohapatra, A. Mahanti, Predicting user traits from a snapshot
of apps installed on a smartphone, ACM SIGMOBILE Mobile Computing and Communications
Review 18 (2014) 1–8.
[18] Y. Tong, Y. Wang, D. Shi, Federated learning in the lens of crowdsourcing., IEEE Data Eng. Bull.</p>
      <p>43 (2020) 26–36.
[19] B. McMahan, E. Moore, D. Ramage, S. Hampson, B. A. y. Arcas, Communication-Eficient Learning
of Deep Networks from Decentralized Data, in: A. Singh, J. Zhu (Eds.), Proceedings of the 20th
International Conference on Artificial Intelligence and Statistics, volume 54 of Proceedings of
Machine Learning Research, PMLR, 2017, pp. 1273–1282. URL: https://proceedings.mlr.press/v54/
mcmahan17a.html.
[20] A. Augello, A. Gupta, G. Lo Re, S. K. Das, Tackling selfish clients in federated learning, in:
27th European Conference on Artificial Intelligence (ECAI 2024), IOS Press, 2024, pp. 1888–1895.
doi:10.3233/FAIA240702.
[21] Y. Zhan, J. Zhang, Z. Hong, L. Wu, P. Li, S. Guo, A survey of incentive mechanism design for
federated learning, IEEE Transactions on Emerging Topics in Computing 10 (2021) 1035–1044.
[22] R. Zeng, C. Zeng, X. Wang, B. Li, X. Chu, Incentive mechanisms in federated learning and a
game-theoretical approach, IEEE Network 36 (2022) 229–235.
[23] K. Donahue, J. Kleinberg, Optimality and stability in federated learning: A game-theoretic
approach, Advances in Neural Information Processing Systems 34 (2021) 1287–1298.
[24] S. Jiang, J. Wu, A reward response game in the federated learning system, in: 2021 IEEE 18th</p>
      <p>International Conference on Mobile Ad Hoc and Smart Systems (MASS), IEEE, 2021, pp. 127–135.
[25] Y. Zhan, P. Li, Z. Qu, D. Zeng, S. Guo, A learning-based incentive mechanism for federated learning,</p>
      <p>IEEE Internet of Things Journal 7 (2020) 6360–6368.
[26] B. McMahan, E. Moore, D. Ramage, S. Hampson, B. A. y Arcas, Communication-eficient learning
of deep networks from decentralized data, in: Artificial intelligence and statistics, PMLR, 2017, pp.
1273–1282.
[27] T. Roughgarden, Stackelberg scheduling strategies, in: Proceedings of the thirty-third annual</p>
      <p>ACM symposium on Theory of computing, 2001, pp. 104–113.
[28] R. Gao, M. Saar-Tsechansky, Cost-accuracy aware adaptive labeling for active learning, in:</p>
      <p>Proceedings of the AAAI conference on artificial intelligence, volume 34, 2020, pp. 2569–2576.
[29] A. Guerra-Manzanares, H. Bahsi, S. Nõmm, Kronodroid: time-based hybrid-featured dataset for
efective android malware detection and characterization, Computers &amp; Security 110 (2021) 102399.
[30] S. Vijayanarasimhan, K. Grauman, What’s it going to cost you?: Predicting efort vs.
informativeness for multi-label image annotations, in: 2009 IEEE conference on computer vision and pattern
recognition, IEEE, 2009, pp. 2262–2269.
[31] J. B. Rosen, Existence and uniqueness of equilibrium points for concave n-person games,
Econometrica: Journal of the Econometric Society (1965) 520–534.
[32] A. Deligiannis, A. Panoui, S. Lambotharan, J. A. Chambers, Game-theoretic power allocation and
the nash equilibrium analysis for a multistatic mimo radar network, IEEE Transactions on Signal
Processing 65 (2017) 6397–6408.
[33] A. Guerra-Manzanares, M. Luckner, H. Bahsi, Android malware concept drift using system calls:
detection, characterization and challenges, Expert Systems with Applications 206 (2022) 117200.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>V.</given-names>
            <surname>Adat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. B.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <article-title>Security in internet of things: issues, challenges, taxonomy, and architecture</article-title>
          ,
          <source>Telecommunication Systems</source>
          <volume>67</volume>
          (
          <year>2018</year>
          )
          <fpage>423</fpage>
          -
          <lpage>441</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Moser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Kruegel</surname>
          </string-name>
          , E. Kirda,
          <article-title>Limits of static analysis for malware detection, in: Twenty-third annual computer security applications conference (ACSAC 2007)</article-title>
          , IEEE,
          <year>2007</year>
          , pp.
          <fpage>421</fpage>
          -
          <lpage>430</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C.</given-names>
            <surname>Catalano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chezzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Angelelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Tommasi</surname>
          </string-name>
          ,
          <article-title>Deceiving ai-based malware detection through polymorphic attacks</article-title>
          ,
          <source>Computers in Industry</source>
          <volume>143</volume>
          (
          <year>2022</year>
          )
          <fpage>103751</fpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>