1. Introduction

February

HDDroid: Federated Hyperdimensional Computing for Mobile Malware Detection

Andrea Augello

Alessandra De Paola

Giuseppe Lo Re

Antoine Menager

1 0 Department of Engineering, University of Palermo , Italy 1 Polytech Dijon , France

2025

0 3 8

Mobile malware poses significant security and privacy risks, hence efective detection methods are crucial. Graphbased representations of mobile applications have been shown to be well-suited for this task. However, traditional graph-based machine learning techniques are computationally expensive and unsuitable for on-device analysis. Nevertheless, of-device analysis raises privacy concerns, making on-device analysis combined with decentralized learning approaches like Federated Learning (FL) an attractive alternative. Hyperdimensional Computing (HDC) ofers eficient graph classification on resource-constrained mobile devices. This work introduces HDDroid, an FL framework leveraging HDC to detect malicious software via function call graph analysis. HDDroid's novel online encoding strategy reduces memory usage, enabling large graph analysis on mobile devices. Additionally, HDDroid's improved model aggregation strategy enhances model robustness and classification accuracy, achieving state-of-the-art performance in distributed learning scenarios.

eol>Federated Learning Malware detection Hyperdimensional Computing

1. Introduction

The widespread difusion of mobile devices, and the consequent increase in the number of applications available on those devices, has led mobile malware to become a significant threat to the security and privacy of users [ 1 ], thus making efective detection methods crucial to limit these risks. Raw features like bytecodes, opcodes, strings, permissions, and APIs are easy to extract and can be used to identify malware [ 2 ]. However, these low-level features are vulnerable to code obfuscation, considered that malware developers alter code features while maintaining functionality [ 3 ]. Higher-level structured features, such as multimodal features, API call reachability analysis, and function call graphs, are preferred as they are harder to modify while preserving the program’s semantics [ 4 ]. Graph-based representations in particular efectively capture the semantics of the code and can be extracted by lightweight static analysis [ 4 ].

An additional challenge in mobile malware detection is the privacy of the users: a simple snapshot of the installed applications on a device is suficient to infer sensitive information such as the users’ location, interests, religious beliefs, marital status, medical conditions, and habits [ 5 ]. Hence, to preserve the privacy of the users, data should never leave their devices. Although graph-based representations can be obtained easily without requiring of-device dynamic analysis, traditional machine learning techniques for graph classification are computationally expensive and require multiple passes through the data. Thus, mobile devices might struggle to perform the necessary computations for on-device analysis, and real-time detection would not be feasible [ 6 ]. Hyperdimensional Computing (HDC) is a brain-inspired computing paradigm that is recently gaining popularity in the machine learning community [7]. This paradigm is well suited for devices with limited energy and computational resources such as mobile phones and represents a promising alternative for on-device training and inference [7].

This work introduces HDDroid, a federated learning framework for mobile malware detection to eficiently and efectively detect malicious applications through the analysis of function call graphs. The main contributions of this work can be summarized as follows: • HDDroid enables on-device training and analysis leveraging Hyperdimensional Computing; • through online encoding, HDDroid reduces the memory requirements for function call graphs encoding, enabling the analysis of large graphs on resource-constrained devices; • to integrate the contributions of multiple users, HDDroid employs a novel aggregation strategy for distributed learning that filters out irrelevant information and increases the classification accuracy.

Experimental results on the comprehensive real-world publicly available MalNet [8] dataset show that HDDroid achieves state-of-the-art performance in mobile malware detection, and maintains a high accuracy in distributed learning scenarios. The remainder of this paper is organized as follows. Section 2 reviews related works. Section 3 provides preliminary information on the adopted computing paradigms. Section 4 details the HDDroid architecture. Section 5 presents the experimental evaluation. Finally, Section 6 concludes the paper.

2. Related Works

The field of malware detection has been the subject of extensive research. The adoption of Machine Learning methods promises to be the best approach to enable the design of solutions that can automatically learn how to detect these types of threats [9]. Early automated malware detection systems detected malware by computing a signature summarizing the characteristics of the software and comparing it to known malware signatures. The signature of a software is constituted by a unique string of bytes generated through an analysis of the code. Several approaches to generate efective software signatures for malware detection have been proposed [10]. For instance, early approaches used a digest of the ifle to check if an executable is a known malware. More fine-grained approaches statically analyze the execution flow of the program trying to match sequences of operation in the code with known malware patterns. Signature-based approaches are fast and efective in detecting known malware but they are not reliable for detecting unknown malware types and can be eluded by polymorphic malware. Additionally, signatures need to be crafted through careful feature engineering, limiting versatility. A more flexible approach compared to signature-based detection is behavior-based classification. These techniques analyze the behavior of the software during its execution by monitoring system calls, file changes or network activity and build a feature vector that is then used to classify the software. This approach requires sandbox execution, which is computationally expensive for on-device analysis [11]. Thus, static analysis is more suitable for on-device feature extraction.

However, detection through low-level static features such as APIs, strings, permissions, and opcodes can be eluded through code obfuscation [ 4 ] since malware developers can alter the code to change its features while maintaining its functionality. Therefore, higher-level structured features are preferred. For instance, graph representations of software are robust to obfuscation [ 4 ] while being relatively easy to acquire from the code with a static analysis without requiring computationally expensive dynamic analysis [12]. Among the most used graph representations for malware analysis are: a) Control Flow Graphs [13] that model paths between blocks of code during the program execution; b) Function call and API dependencies graphs [14], which are Control Flow Graphs with the functions/API calls granularity; c) System call graphs [15] that model interactions between the program and the underlying operating system; d) System entity graphs [16] that model interactions between the program and the system entities such as files, network connections, and processes. Function call graphs extracted through a static analysis of the API calls are the most popular choice for Android malware detection [14].

Traditional approaches to represent graphs for learning tasks often rely on hand-crafted summary statistics to encode the graph structure into feature vectors. For example, LDP [17] uses histograms of node degrees and their 1-hop neighbors, while NoG [18] focuses on node and edge attributes, ignoring topology. Other methods use random walks to describe node neighbors [19]. Graph kernels, such as Slaq-LSD and Slaq-VNGE [20], measure approximate distances between spectral graph representations. Recent works have focused on learning graph representations directly from raw graph data using deep learning models. Graph Neural Networks (GNNs) propagate information between neighboring nodes [21], learn the optimal aggregation function [22], and extending convolution operations to graph data [23]. While efective, these methods are computationally expensive and unsuitable for on-device analysis in constrained devices like mobile phones. A recent alternative for graph embedding is Hyperdimensional Computing, specifically the GraphHD algorithm [ 24]. Rather than learning a representation of the nodes, GraphHD assigns random high-dimensional vectors to nodes and combines them to represent edges and the entire graph. This approach is suitable for on-device analysis, as it is more computationally eficient with shorter training times compared to traditional GNNs.

3. Preliminaries 3.1. Federated Learning

Federated Learning (FL) is a decentralized machine learning approach that lets multiple independent data owners to contribute to a shared model without disclosing their data [25]. In FL, each participating client (data owner) has a local, private, dataset. A central server distributes a copy of a classification model to all clients. Each client trains the model on its local data for a set number of iterations. The clients then send the updated models to the server, which aggregates them into a global model through a weighted average. The new model is then redistributed to the clients for further refining, repeating this process for multiple rounds until model convergence. FL preserves the privacy of the clients’ data, sharing only model parameters and keeping the data private, making it an attractive alternative to centralized learning for malware detection tasks [26].

3.2. Hyperdimensional Computing

Hyperdimensional computing (HDC), described in-depth in [27], is a brain-inspired computing paradigm that leverages high-dimensional vectors to perform computations. HDC performs classification tasks by representing data as hypervectors (HVs), whose dimensionality is typically in the order of tens of thousands, and associating these with class labels. An hypervector ℋ is defined as ⟨ℎ1, ℎ2, . . . , ℎ⟩ where ℎ is the value of the -th dimension of the vector. The size of the vectors is consistent throughout a HDC system: operations act on HVs of homogeneous length and do not modify their dimension. HDC is scalable, parallelizable, energy-eficient, and well-suited for few-shot learning tasks, requiring less training and inference time than neural networks of comparable accuracy [7]. Operations A useful property of random vectors in high-dimensional spaces is their quasiorthogonality [27] (their cosine similarity is close to 0), which allows multiple HVs to be combined through simple mathematical operations while maintaining the information encoded in each HV. Three major operations are supported by HVs and used in HDC: • Permutation (Π) is an unary operation which rearranges the elements of the vector into a HV quasiorthogonal to the original one. This operation is often used together with the binding operations to assign an order between HVs, and it typically implemented through a circular shift, moving all the coordinates of the HV clockwise:

Π(ℋ) = ⟨ℎ, ℎ1, . . . , ℎ− 1⟩. • Binding (⊛) two HVs yields an HV dissimilar to both. Binding associates information from two diferent HVs (e.g., assigning values to variables) through element-wise multiplication: ℋ1 ⊛ ℋ2 = ⟨ℎ1,1 × ℎ2,1, . . . , ℎ1, × ℎ2,⟩. • Bundling (⊕ ) combines two HVs into a new hypervector through element-wise addition, obtaining a HV similar to both the input HVs:

ℋ1 ⊕ ℋ 2 = ⟨ℎ1,1 + ℎ2,1, . . . , ℎ1, + ℎ2,⟩.

The high dimensionality allows these operations produce distinct and non-conflicting HVs. The operators act on HV elements independently and do not require loading the entire HV into memory, making them highly parallelizable and suitable for analysis on both resource-constrained devices such as low-end mobile phones and devices capable of parallel processing.

Training and classification In HDC, a classification task with classes is performed by computing a class prototype HV for each class 1, . . . , , storing them into an associative memory ℳ, and then comparing query HVs with the class prototypes to determine the predicted class.

Class HVs can be computed by bundling the HVs in the training set belonging to the corresponding class. Single-pass training is simple and eficient, but often leads to low accuracy, thus iterative training algorithms such as RefineHD [ 28] and OnlineHD [29] have been proposed to improve the classification performance. These algorithms bundle incorrectly classified query HVs to the correct class HV and subtract them from the wrongly predicted class HV.

At inference time, the similarities of a query hypervector ℋ and the class hypervectors in the associative memory ℳ are computed. The class whose HV has the highest cosine similarity with the query is considered to be the predicted label: (ℳ, ℋ) = arg max (, ℋ) = arg max

·ℋ ||||×||ℋ || .

4. HDDroid architecture

HDDroid leverages a collaborative learning framework to allow multiple independent data owners to contribute to a shared mobile malware detection model without disclosing their data. The system includes clients, each with its own private dataset, and a central server that coordinates training. Each client uses its local data to train a hyperdimensional associative memory ℳ() for classification and sends it to the central server. The server aggregates the classifiers into a global classifier ℳ() and sends it back to the clients for further refinement over several iterations. A schematic representation of the HDDroid architecture is shown in Figure 1.

Local executable dataset Local executable dataset Local executable dataset

Client 1 M(1) s i s y l a n A c i t a t S s i s y l itcaanA ilssayn

FuntSction cAall graphs

c i t a FuntSction call graphs . .ign. ⊕ M(...) ed Cliehn0th1nh2 h3 h4 h5 ... hD d raphbEGm ireaghpbddnEGmhh0h00hh1Hh11hyh2iregaddbnphEGmh0p2h02hhe0h30rh13hv13Hhh1eh41h2cy4h24thh0p2hho052heh305rh305r.hs13hhv..13Hhh.4.1.eh4.1.hh2cy4.hh2D4thph52Dho52Deh35r.h35r..hs3hv...3h.4..ehh4..hhcD4.hD4th5Dho5Dh5r.5..s......h..hD.hDhDD ⊕ M(n)

Function call graphs

M(G) Σ

Server Aggregation weights

4.1. Mobile application encoding

HDDroid analyzes mobile applications using function call graphs obtained through static code analysis. Each function is a graph node with a unique identifier, and directed edges represent function calls, indicating possible execution paths. Hyperdimensional Computing directly encodes function call graphs of mobile applications into HVs for classification, without the need for feature engineering. The encoding process is hierarchical: vertices are encoded first, then they are combined into edges HVs, and edges are bundled to encode the entire graph.

0 1 2

Graph A

3 4

Node encodings B

Node encoding The first step in the encoding process is to assign a unique hypervector to each graph node. Since the function identifiers in an application are arbitrary, they cannot be encoded directly into HVs. To ensure consistent encoding across graphs, each node is labeled with its PageRank centrality [30], as proposed in [24]. PageRank measures a node’s importance based on the stationary distribution of a random walk on the graph. Once the nodes receive a meaningful identifier consistent across diferent graphs, each identifier is mapped to an HV.

A common approach to encode discrete values into HVs is to precompute a set of random HVs, one for each possible value. However, this requires an HV for each value, causing the random basis size to grow linearly with the number of values, which can be impractical. For application call graphs, the number of possible function calls is unbounded, with some graphs containing almost 600 thousand nodes [8]. Encoding all nodes in HVs with dimensionality 10000 would require over 40 GB of memory, making it infeasible for resource-constrained mobile devices. Additionally, a fixed random base requires knowing the range of values to be encoded, which is impractical for function call graphs due to the arbitrary number of function calls in mobile applications. Thus, using a random basis is not viable in this target scenario.

Other approaches, such as Flocet encoding [28] and random projections [31], embed discrete values into HVs without precomputing a random basis. However, these methods preserve similarity between original values, which is undesirable here as consecutively ranked nodes correspond to diferent functions. To address these issues, HDDroid uses an online encoding strategy based on the Philox algorithm [32], a counter-based pseudo-random number generator. A digest is computed from the node label to seed the key and counter for Philox, which generates an arbitrary-length sequence of pseudorandom values as the HV encoding. This deterministic procedure ensures the same label is always encoded as the same hypervector, without storing a lookup table, and guarantees quasi-orthogonal HVs. Additionally, Philox can generate the values in parallel, making it suitable for encoding large graphs on multi-core devices.

Hypervector-based graph representation Given the HVs for all the nodes, they can be combined to compute the HV for the whole graph [24] as illustrated in Figure 2. Each edge (, ) in the graph is encoded by binding (element-wise multiplication) the hypervectors ℋ and ℋ of the two nodes and , with the destination node HV permuted to indicate the direction of the edge. All the edges are then bundled (element-wise addition) into the graph hypervector ℋgraph as per Eq. 1. Each element in the graph HV is thus given by the sum, for each pair of connected nodes, of the products of the corresponding elements in the node HVs, with the elements in the arrival node HV rotated by one position: ℋgraph = ⨁︁ ℋ ⊛ Π(ℋ) = ⟨ ∑︁ ℎ,1 × ℎ,, ∑︁ ℎ,2 × ℎ,1, . . . , ∑︁ ℎ, × ℎ,− 1⟩. (,)∈edges (,)∈edges (,)∈edges (,)∈edges (1) Since all node HVs have the same dimensionality, the graph HV will have a fixed size regardless of the number of function calls. However, graphs with more edges will sum more quasi-orthogonal edge HVs, potentially increasing the vector magnitude. To ensure fair influence on the final classifier, graph hypervectors are normalized to have the same magnitude, regardless of the application size. The complete encoding algorithm is reported in Algorithm 1. At any given step of the algorithm, the server only needs to record at most three hypervectors: ℋℎ, ℋ and either ℋ or ℋ,. Excluding the PageRank computation, the memory requirement of the graph encoding algorithm is (), independent of the graph size, while the time complexity is (| | · ). The graph information is spread across the HV, no HV element stores specific information about a node or an edge, granting robustness to noise and errors.

Algorithm 1 Application Function Call Graph Encoding Algorithm Require: Application, Hypervector size Ensure: Graph hypervector ℋgraph 1: Graph = (, ) = call_graph(Application) 2: Initialize ℋgraph as an empty hypervector 3: r ← { pagerank(, ) | ∈ } 4: for ∈ do 5: ← hash(r) 6: , ← lower_bits(), upper_bits() 7: ℋ ← Philox(, , ) 8: for ∈ | (, ) ∈ do 9: ← hash(r) 10: , ← lower_bits(), upper_bits() 11: ℋ ← Philox(, , ) 12: ℋ, ← ℋ ⊛ Π(ℋ) 13: ℋgraph ← ℋ graph ⊕ ℋ , 14: return |ℋℋggrraapphh| {Compute unique ID for each node}

{Generate start node HV } {Generate destination node HV } {Update graph HV with new edge}

4.2. Few-shot collaborative learning

After encoding the function call graphs into HVs, each client computes a local associative memory ℳ() for classification. Through federated learning, clients share their local classifiers with the central server, which aggregates them into a global classifier ℳ(). This process is repeated for several rounds, refining the global classifier. The HDC paradigm enables eficient distributed learning with minimal training iterations and communication cycles.

Local training As discussed in Section 3, the local training of the classifier is often performed by computing the class hypervectors through one-pass training. However, bundling all HVs of the same class can lead to model saturation, as common patterns overshadow specific characteristics crucial for classification. This is particularly problematic in mobile application classification, where many apps share functionalities, and malware can mimic benign apps to elude detection. Instead of bundling all HVs of the same class, HDDroid adopts the weighted aggregation strategy in [28]. Each sample ℋ is compared to the local associative memory. If the class hypervector most similar to ℋ is not the one for its class , a weighted portion of ℋ is added to and subtracted from the most similar class hypervector to increase and decrease their similarity to ℋ, respectively. The weights depend on the similarity of the HV with the predicted and true label classes, and , as per Eq. (2): = + (1 − )(1 − − )ℋ = − (1 − )(1 − − )ℋ. (2) When the sample has a high similarity with the correct class, only a small portion of the hypervector is added to the class hypervector. When the diference is significant, a larger portion of the sample hypervector is added to the correct class hypervector and subtracted from the incorrect one to better incorporate the sample’s information. HVs with correct, but low confidence, predictions are still added to the correct class hypervector. In this case, the weight of the hypervector is computed as = + (1 − − )ℋ. A prediction is considered to have low confidence if the similarity with the correct class hypervector is below the average similarity of the wrongly classified samples with the correct class hypervector.

Model aggregation Once clients update their associative memories, they send the class HVs to the central server for aggregation into a global classifier [ 25]. This process leverages information from all clients, creating a more accurate model while reducing communication overhead and preserving client privacy compared to transmitting the raw HVs. Typically, HDC aggregates local classifiers by averaging class HVs from each client, similar to traditional federated learning. A representative example is Fed-HD [33], which computes the global classifier as ℳ() = ℳ()+∑+︀1=1 ℳ() .

However, this aggregation process can lead to model saturation by repeatedly adding similar information from all clients. HDDroid weights class HVs from each client by the distinctiveness of their information, avoiding redundancy and improving learning outcomes. Given the class hypervector from client , (), the server evaluates the intra-class redundancy (, ), which measures the average () with the hypervectors of other clients for the same class: similarity of (, ) = 1 ∑︀=1 ((), ()).

() is similar to other clients’ hypervectors for the same class, High intra-class redundancy means adding little new information. The server also evaluates the inter-class confusion (, ) of client for class by comparing () to hypervectors from other classes:

(, ) = 1 ∑︀=1 max ((), ℳ() ∖ ()), where ℳ() ∖ () is the set of class HVs in the associative memory of client excluding (). High () contains information not specific to class . inter-class confusion indicates that

The computation of and for all class hypervectors has a complexity of (2 · 2 · ), where is the number of clients, is the number of classes, and is the hypervector dimensionality. Since and are hyperparameters, the aggregation process scales quadratically with the number of clients. These computations are independent and can be parallelized for eficiency.

The normalized aggregation weight () is computed using the intra-class () for the hypervector redundancy and inter-class confusion as per Eq. (3): () = ∑︀=1 ∑︀=1(1− (,))(1− (,)− (,)) (1− (,))(1− (,)− (,)) .

(3) A hypervector similar to others in the same class or to those of other classes is considered redundant or uninformative, respectively, and its weight is decreased. The class HVs’ weights are used by the server to compute the updated global memory as:

ℳ() = ︁( ∑︀=1 1()1(), . . . , ∑︀=1 ()())︁ .

Unlike traditional FL on neural networks, which requires many communication rounds to achieve good performance [34], HDDroid is experimentally shown to converge in few rounds.

5. Experimental evaluation

To evaluate HDDroid, experiments were conducted on the MalNet tiny 5K [8] and MalNet 61K [35] datasets, containing function call graphs of Android applications labeled as benign or malware. The MalNet tiny 5K dataset contains 5000 applications, each with up to 5000 nodes per graph. The larger 61K dataset includes 87430 applications, with graphs containing up to 599234 nodes. The 61K dataset’s size and complexity make it a better testbed for evaluating HDDroid’s scalability and encoding efectiveness. Most experiments were conducted on the 61K dataset, with the 5K dataset used for comparison with existing methods. In order to simulate a large number of devices, the experiments were conducted on a server with 32 Intel(R) Xeon(R) Gold 6242 CPU cores, 32 GB of RAM, and an NVIDIA A100 GPU. The HDDroid model was implemented using the TorchHD library [36].

Comparison with state-of-the-art approaches: First HDDroid is compared with other non-HDC

existing approaches for graph classification in a centralized setting. Table 1 and Table 2 show the performance of HDDroid alongside state-of-the-art methods reported in the literature on the MalNet tiny 5K and MalNet 61K datasets, respectively.

Method SIR-GN [35] Feather [19] LDP [17] GIN [22] GCN [23] Slaq-LSD [20] NoG [18] Slaq-VNGE [20] HDDroid Accuracy 0.92 0.86 0.86 0.90 0.81 0.76 0.77 0.53 0.90

On the MalNet tiny 5K dataset, HDDroid achieves an accuracy of 0.90, closely matching the best method, SIR-GN, which has an accuracy of 0.92, demonstrating the viability of the HDC paradigm for malware detection. On the larger 61K dataset, HDDroid outperforms the SIR-GN algorithm, demonstrating its scalability and the efectiveness of the HDC paradigm in capturing relevant information in large function call graphs.

Other HDC methods: HDDroid’s training and aggregation strategies are assessed through a comparison with traditional HDC methods. The first compared approach utilizes OnlineHD [ 28] for local training, maintaining HDDroid’s aggregation strategy. For the second, Fed-HD [33] is used for aggregation, keeping HDDroid’s local training strategy. Figure 3 shows the accuracy of HDDroid on HDDroid OnlineHD

Fed-HD 8 0.96 0.94 0.92 0.90 0.88 e0.96 ro0.94 c –S0.92 1F0.90 0.88 0.96 0.94 0.92 0.90 0.88 D I I 1 . 0 = α 0.2 0.4 0.6 0.8 False Positive Rate 1 the MalNet 61K dataset across training rounds compared with OnlineHD and Fed-HD. OnlineHD starts with high initial accuracy but drops over time, likely due to local training saturation. Fed-HD shows initial improvement but quickly plateaus, possibly due to redundant information in the global classifier. HDDroid, on the other hand, starts with a lower initial accuracy, as the inclusion of new HVs is conservative, but quickly surpasses the other methods, capturing more of the relevant information in the HVs.

To better assess how well the obtained class hypervectors capture the information relative to their classes, the Receiver Operating Characteristic (ROC) curve for HDDroid, OnlineHD, and Fed-HD are compared, plotting the true positive rate against the false positive rate varying the classification threshold. In this analysis, a model that does not rely on the PageRank node ordering is also considered to gauge the impact of this procedure on the quality of the encoding. Figure 4 shows the ROC curve for the tested methods. Removing PageRank ordering significantly reduces model performance, highlighting its importance in encoding. As in the previous analysis, HDDroid outperforms the other approaches, confirming the efectiveness of the training and aggregation strategies.

Robustness to non-IID data distributions: In federated learning settings, clients often have heterogeneous data distributions, which can lead to a decrease in model performance [37]. To evaluate HDDroid’s robustness to non-IID data distributions, the dataset labels are distributed among clients using a Dirichlet distribution. An IID setting and two non-IID settings are tested: a moderate non-IID setting ( = 1), and a highly non-IID setting ( = 0.1), where malware classes are almost partitioned between clients. As shown in Figure 5, larger hypervector sizes generally improve performance, with diminishing returns at higher sizes. With at least 50 clients, non-IID data distribution does not significantly impact performance. Reducing the number of clients negatively afects the F1-Score, especially with smaller hypervectors. Larger hypervectors (e.g., 50000 dimensions) are more robust, maintaining accuracy above 96% across all settings.

6. Conclusions

Malware detection is a critical task to ensure the security of mobile devices, however collecting and sharing data for training models can raise privacy concerns, thus decentralized on-device analysis and training are desirable. This work presents HDDroid, a novel approach to mobile malware detection that leverages the Hyperdimensional Computing paradigm to allow lightweight and eficient federated training and local classification of mobile applications on resource-constrained mobile devices. The on-device analysis preserves the privacy of the participating users and allows for eficient and robust malware detection. The experimental evaluation shows that HDDroid achieves state-of-the-art graphbased performance for malware detection, remaining efective in non-IID scenarios, making it suitable for real-world applications where data distributions among devices can be highly heterogeneous. Future works will focus on improving the flexibility and adaptability of the model in handling variations over time in the data distribution of the clients and on enabling collaboration between devices with heterogeneous hardware capabilities.

Acknowledgements

This work was partially funded by the European Union Next-Generation EU (PIANO NAZIONALE DI RIPRESA E RESILIENZA (PNRR) – MISSIONE 4 COMPONENTE 2, INVESTIMENTO 1.3) – Progetto “Future Artificial Intelligence - FAIR” PE00000013, CUP J73C24000060007

Declaration on Generative AI

The authors have not employed any Generative AI tools. [7] L. Ge, K. K. Parhi, Classification using hyperdimensional computing: A review., IEEE Circuits and

Systems Magazine (2020). [8] S. Freitas, Y. Dong, J. Neil, D. Horng (Polo) Chau, A large-scale database for graph representation learning, in: NeurIPS Datasets and Benchmarks Track, 2021. URL: https://www.microsoft.com/ en-us/research/publication/a-large-scale-database-for-graph-representation-learning/. [9] Z. Liu, R. Wang, N. Japkowicz, D. Tang, W. Zhang, J. Zhao, Research on unsupervised feature learning for android malware detection based on restricted boltzmann machines, Future Generation Computer Systems 120 (2021) 91–108. [10] P. Faruki, V. Ganmoor, V. Laxmi, M. S. Gaur, A. Bharmal, Androsimilar: robust statistical feature signature for android malware detection, in: Proceedings of the 6th International Conference on Security of Information and Networks, 2013, pp. 152–159. [11] Y. Cui, Y. Sun, Z. Lin, DroidHook: a novel API-hook based android malware dynamic analysis sandbox, Automated Software Engineering 30 (2023) 10. [12] B. Anderson, C. Storlie, T. Lane, Improving malware classification: bridging the static/dynamic gap, in: Proc. of the 5th ACM workshop on Security and artificial intelligence, 2012. [13] A. Narayanan, G. Meng, L. Yang, J. Liu, L. Chen, Contextual Weisfeiler-Lehman graph kernel for malware detection, in: 2016 International Joint Conference on Neural Networks (IJCNN), 2016, pp. 4701–4708. [14] C. Li, Z. Cheng, H. Zhu, L. Wang, Q. Lv, Y. Wang, N. Li, D. Sun, DMalNet: Dynamic malware analysis based on API feature engineering and graph learning, Computers & Security 122 (2022) 102872. [15] T. S. John, T. Thomas, S. Emmanuel, Graph convolutional networks for android malware detection with system call graphs, in: 2020 Third ISEA Conference on Security and Privacy (ISEA-ISAP), IEEE, 2020, pp. 162–170. [16] N. V. Hung, P. N. Dung, T. N. Ngoc, V. D. Phai, Q. Shi, Malware detection based on directed multi-edge dataflow graph representation and convolutional neural network, in: 2019 11th Int.

Conf. on Knowledge and Systems Engineering, IEEE, 2019, pp. 1–5. [17] C. Cai, Y. Wang, A simple yet efective baseline for non-attributed graph classification, arXiv preprint arXiv:1811.03508 (2018). [18] T. Schulz, P. Welke, On the necessity of graph kernel baselines, in: ECML-PKDD, GEM workshop, volume 1, 2019, p. 6. [19] B. Rozemberczki, R. Sarkar, Characteristic functions on graphs: Birds of a feather, from statistical descriptors to parametric models, in: Proceedings of the 29th ACM International Conference on Information & Knowledge Management, 2020, pp. 1325–1334. [20] A. Tsitsulin, M. Munkhoeva, B. Perozzi, Just slaq when you approximate: Accurate spectral distances for web-scale graphs, in: Proc. of the Web Conf. 2020, 2020, pp. 2697–2703. [21] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, G. Monfardini, The graph neural network model, IEEE Transactions on Neural Networks 20 (2008) 61–80. [22] K. Xu, W. Hu, J. Leskovec, S. Jegelka, How powerful are graph neural networks?, in: International Conference on Learning Representations, 2019. URL: https://openreview.net/forum?id= ryGs6iA5Km. [23] T. N. Kipf, M. Welling, Semi-supervised classification with graph convolutional networks, in: International Conference on Learning Representations, 2017. URL: https://openreview.net/forum? id=SJU4ayYgl. [24] I. Nunes, M. Heddes, T. Givargis, A. Nicolau, A. Veidenbaum, GraphHD: Eficient graph classification using hyperdimensional computing, in: 2022 Design, Automation & Test in Europe Conference & Exhibition (DATE), IEEE, 2022, pp. 1485–1490. [25] B. McMahan, E. Moore, D. Ramage, S. Hampson, B. A. y Arcas, Communication-eficient learning of deep networks from decentralized data, in: Artificial intelligence and statistics, PMLR, 2017, pp. 1273–1282. [26] A. Augello, A. De Paola, G. Lo Re, M2FD: Mobile malware federated detection under concept drift,

Computers & Security (2025) 104361. doi:10.1016/j.cose.2025.104361. [27] P. Kanerva, Hyperdimensional computing: An introduction to computing in distributed representation with high-dimensional random vectors, Cognitive computation 1 (2009) 139–159. [28] P. Vergés, T. Givargis, A. Nicolau, Refinehd: Accurate and eficient single-pass adaptive learning using hyperdimensional computing, in: 2023 IEEE Int. Conf. on Rebooting Computing (ICRC), IEEE, 2023, pp. 1–8. [29] A. Hernández-Cano, N. Matsumoto, E. Ping, M. Imani, OnlineHD: Robust, eficient, and single-pass online learning using hyperdimensional system, in: 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE), IEEE, 2021, pp. 56–61. [30] S. Brin, L. Page, The anatomy of a large-scale hypertextual web search engine, Computer networks and ISDN systems 30 (1998) 107–117. [31] J. Morris, R. Fernando, Y. Hao, M. Imani, B. Aksanli, T. Rosing, Locality-based encoder and model quantization for eficient hyper-dimensional computing, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 41 (2021) 897–907. [32] J. K. Salmon, M. A. Moraes, R. O. Dror, D. E. Shaw, Parallel random numbers: as easy as 1, 2, 3, in: Proc. of 2011 int. conf. for high performance computing, networking, storage and analysis, 2011, pp. 1–12. [33] Q. Zhao, K. Lee, J. Liu, M. Huzaifa, X. Yu, T. Rosing, FedHD: Federated learning with hyperdimensional computing, in: Proceedings of the 28th Annual International Conference on Mobile Computing And Networking, 2022, pp. 791–793. [34] Y. Park, D.-J. Han, D.-Y. Kim, J. Seo, J. Moon, Few-round learning for federated learning, Advances in Neural Information Processing Systems 34 (2021) 28612–28622. [35] M. Quebrado, E. Serra, A. Cuzzocrea, Android malware identification and polymorphic evolution via graph representation learning, in: 2021 IEEE International Conference on Big Data (Big Data), IEEE, 2021, pp. 5441–5449. [36] M. Heddes, I. Nunes, P. Vergés, D. Kleyko, D. Abraham, T. Givargis, A. Nicolau, A. Veidenbaum, Torchhd: An open source python library to support research on hyperdimensional computing and vector symbolic architectures, Journal of Machine Learning Research 24 (2023) 1–10. URL: http://jmlr.org/papers/v24/23-0300.html. [37] A. Augello, G. Falzone, G. Lo Re, DCFL: Dynamic clustered federated learning under diferential privacy settings, in: 2023 IEEE Int. Conf. on Pervasive Computing and Communications Workshops and other Afiliated Events, 2023, pp. 614–619. doi: 10.1109/PerComWorkshops56833.2023. 10150285.

[1]

Qamar ,

Karim ,

Chang , Mobile malware attacks: Review, taxonomy & future directions , Future Generation Computer Systems 97 ( 2019 ) 887 - 909 .

[2]

Pan ,

Ge ,

Fang ,

Fan , A systematic literature review of android malware detection using static analysis , IEEE Access 8 ( 2020 ) 116363 - 116379 .

[3]

Chua ,

Balachandran , Efectiveness of android obfuscation on evading anti-malware, in: Proceedings of the eighth ACM conference on data and application security and privacy, 2018 , pp. 143 - 145 .

[4]

Bilot ,

N. El

Madhoun ,

Al Agha ,

Zouaoui , A survey on malware detection with graph representation learning , ACM Computing Surveys 56 ( 2024 ) 1 - 36 .

[5]

Seneviratne ,

Mohapatra ,

Mahanti , Predicting user traits from a snapshot of apps installed on a smartphone , ACM SIGMOBILE Mobile Computing and Communications Review 18 ( 2014 ) 1 - 8 .

[6]

Augello , A. De Paola , G. Lo Re, Hybrid multilevel detection of mobile devices malware under concept drift , Journal of Network and Systems Management 33 ( 2025 ) 36 . doi: 10 .1007/ s10922-025-09906-3.