<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>February</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>HDDroid: Federated Hyperdimensional Computing for Mobile Malware Detection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Andrea Augello</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandra De Paola</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giuseppe Lo Re</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Antoine Menager</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Engineering, University of Palermo</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Polytech Dijon</institution>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>0</volume>
      <fpage>3</fpage>
      <lpage>8</lpage>
      <abstract>
        <p>Mobile malware poses significant security and privacy risks, hence efective detection methods are crucial. Graphbased representations of mobile applications have been shown to be well-suited for this task. However, traditional graph-based machine learning techniques are computationally expensive and unsuitable for on-device analysis. Nevertheless, of-device analysis raises privacy concerns, making on-device analysis combined with decentralized learning approaches like Federated Learning (FL) an attractive alternative. Hyperdimensional Computing (HDC) ofers eficient graph classification on resource-constrained mobile devices. This work introduces HDDroid, an FL framework leveraging HDC to detect malicious software via function call graph analysis. HDDroid's novel online encoding strategy reduces memory usage, enabling large graph analysis on mobile devices. Additionally, HDDroid's improved model aggregation strategy enhances model robustness and classification accuracy, achieving state-of-the-art performance in distributed learning scenarios.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;Federated Learning</kwd>
        <kwd>Malware detection</kwd>
        <kwd>Hyperdimensional Computing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The widespread difusion of mobile devices, and the consequent increase in the number of applications
available on those devices, has led mobile malware to become a significant threat to the security and
privacy of users [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], thus making efective detection methods crucial to limit these risks. Raw features
like bytecodes, opcodes, strings, permissions, and APIs are easy to extract and can be used to identify
malware [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. However, these low-level features are vulnerable to code obfuscation, considered that
malware developers alter code features while maintaining functionality [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Higher-level structured
features, such as multimodal features, API call reachability analysis, and function call graphs, are
preferred as they are harder to modify while preserving the program’s semantics [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Graph-based
representations in particular efectively capture the semantics of the code and can be extracted by
lightweight static analysis [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        An additional challenge in mobile malware detection is the privacy of the users: a simple snapshot of
the installed applications on a device is suficient to infer sensitive information such as the users’ location,
interests, religious beliefs, marital status, medical conditions, and habits [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Hence, to preserve the
privacy of the users, data should never leave their devices. Although graph-based representations can be
obtained easily without requiring of-device dynamic analysis, traditional machine learning techniques
for graph classification are computationally expensive and require multiple passes through the data.
Thus, mobile devices might struggle to perform the necessary computations for on-device analysis, and
real-time detection would not be feasible [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Hyperdimensional Computing (HDC) is a brain-inspired
computing paradigm that is recently gaining popularity in the machine learning community [7]. This
paradigm is well suited for devices with limited energy and computational resources such as mobile
phones and represents a promising alternative for on-device training and inference [7].
      </p>
      <p>This work introduces HDDroid, a federated learning framework for mobile malware detection to
eficiently and efectively detect malicious applications through the analysis of function call graphs.
The main contributions of this work can be summarized as follows:
• HDDroid enables on-device training and analysis leveraging Hyperdimensional Computing;
• through online encoding, HDDroid reduces the memory requirements for function call graphs
encoding, enabling the analysis of large graphs on resource-constrained devices;
• to integrate the contributions of multiple users, HDDroid employs a novel aggregation strategy
for distributed learning that filters out irrelevant information and increases the classification
accuracy.</p>
      <p>Experimental results on the comprehensive real-world publicly available MalNet [8] dataset show
that HDDroid achieves state-of-the-art performance in mobile malware detection, and maintains a
high accuracy in distributed learning scenarios. The remainder of this paper is organized as follows.
Section 2 reviews related works. Section 3 provides preliminary information on the adopted computing
paradigms. Section 4 details the HDDroid architecture. Section 5 presents the experimental evaluation.
Finally, Section 6 concludes the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Works</title>
      <p>The field of malware detection has been the subject of extensive research. The adoption of Machine
Learning methods promises to be the best approach to enable the design of solutions that can
automatically learn how to detect these types of threats [9]. Early automated malware detection systems detected
malware by computing a signature summarizing the characteristics of the software and comparing it
to known malware signatures. The signature of a software is constituted by a unique string of bytes
generated through an analysis of the code. Several approaches to generate efective software signatures
for malware detection have been proposed [10]. For instance, early approaches used a digest of the
ifle to check if an executable is a known malware. More fine-grained approaches statically analyze
the execution flow of the program trying to match sequences of operation in the code with known
malware patterns. Signature-based approaches are fast and efective in detecting known malware but
they are not reliable for detecting unknown malware types and can be eluded by polymorphic malware.
Additionally, signatures need to be crafted through careful feature engineering, limiting versatility. A
more flexible approach compared to signature-based detection is behavior-based classification. These
techniques analyze the behavior of the software during its execution by monitoring system calls, file
changes or network activity and build a feature vector that is then used to classify the software. This
approach requires sandbox execution, which is computationally expensive for on-device analysis [11].
Thus, static analysis is more suitable for on-device feature extraction.</p>
      <p>
        However, detection through low-level static features such as APIs, strings, permissions, and opcodes
can be eluded through code obfuscation [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] since malware developers can alter the code to change its
features while maintaining its functionality. Therefore, higher-level structured features are preferred.
For instance, graph representations of software are robust to obfuscation [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] while being relatively easy
to acquire from the code with a static analysis without requiring computationally expensive dynamic
analysis [12]. Among the most used graph representations for malware analysis are: a) Control Flow
Graphs [13] that model paths between blocks of code during the program execution; b) Function call and
API dependencies graphs [14], which are Control Flow Graphs with the functions/API calls granularity;
c) System call graphs [15] that model interactions between the program and the underlying operating
system; d) System entity graphs [16] that model interactions between the program and the system
entities such as files, network connections, and processes. Function call graphs extracted through a
static analysis of the API calls are the most popular choice for Android malware detection [14].
      </p>
      <p>Traditional approaches to represent graphs for learning tasks often rely on hand-crafted summary
statistics to encode the graph structure into feature vectors. For example, LDP [17] uses histograms of
node degrees and their 1-hop neighbors, while NoG [18] focuses on node and edge attributes, ignoring
topology. Other methods use random walks to describe node neighbors [19]. Graph kernels, such as
Slaq-LSD and Slaq-VNGE [20], measure approximate distances between spectral graph representations.
Recent works have focused on learning graph representations directly from raw graph data using
deep learning models. Graph Neural Networks (GNNs) propagate information between neighboring
nodes [21], learn the optimal aggregation function [22], and extending convolution operations to
graph data [23]. While efective, these methods are computationally expensive and unsuitable for
on-device analysis in constrained devices like mobile phones. A recent alternative for graph embedding
is Hyperdimensional Computing, specifically the GraphHD algorithm [ 24]. Rather than learning a
representation of the nodes, GraphHD assigns random high-dimensional vectors to nodes and combines
them to represent edges and the entire graph. This approach is suitable for on-device analysis, as it is
more computationally eficient with shorter training times compared to traditional GNNs.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Preliminaries</title>
      <sec id="sec-3-1">
        <title>3.1. Federated Learning</title>
        <p>Federated Learning (FL) is a decentralized machine learning approach that lets multiple independent
data owners to contribute to a shared model without disclosing their data [25]. In FL, each participating
client (data owner) has a local, private, dataset. A central server distributes a copy of a classification
model to all clients. Each client trains the model on its local data for a set number of iterations. The
clients then send the updated models to the server, which aggregates them into a global model through
a weighted average. The new model is then redistributed to the clients for further refining, repeating
this process for multiple rounds until model convergence. FL preserves the privacy of the clients’ data,
sharing only model parameters and keeping the data private, making it an attractive alternative to
centralized learning for malware detection tasks [26].</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Hyperdimensional Computing</title>
        <p>Hyperdimensional computing (HDC), described in-depth in [27], is a brain-inspired computing paradigm
that leverages high-dimensional vectors to perform computations. HDC performs classification tasks
by representing data as hypervectors (HVs), whose dimensionality is typically in the order of tens of
thousands, and associating these with class labels. An hypervector ℋ is defined as ⟨ℎ1, ℎ2, . . . , ℎ⟩
where ℎ is the value of the -th dimension of the vector. The size of the vectors  is consistent
throughout a HDC system: operations act on HVs of homogeneous length and do not modify their
dimension. HDC is scalable, parallelizable, energy-eficient, and well-suited for few-shot learning tasks,
requiring less training and inference time than neural networks of comparable accuracy [7].
Operations A useful property of random vectors in high-dimensional spaces is their
quasiorthogonality [27] (their cosine similarity is close to 0), which allows multiple HVs to be combined
through simple mathematical operations while maintaining the information encoded in each HV. Three
major operations are supported by HVs and used in HDC:
• Permutation (Π) is an unary operation which rearranges the elements of the vector into a HV
quasiorthogonal to the original one. This operation is often used together with the binding operations to
assign an order between HVs, and it typically implemented through a circular shift, moving all the
coordinates of the HV clockwise:</p>
        <p>Π(ℋ) = ⟨ℎ, ℎ1, . . . , ℎ− 1⟩.
• Binding (⊛) two HVs yields an HV dissimilar to both. Binding associates information from two
diferent HVs (e.g., assigning values to variables) through element-wise multiplication:
ℋ1 ⊛ ℋ2 = ⟨ℎ1,1 × ℎ2,1, . . . , ℎ1, × ℎ2,⟩.
• Bundling (⊕ ) combines two HVs into a new hypervector through element-wise addition, obtaining
a HV similar to both the input HVs:</p>
        <p>ℋ1 ⊕ ℋ 2 = ⟨ℎ1,1 + ℎ2,1, . . . , ℎ1, + ℎ2,⟩.</p>
        <p>The high dimensionality allows these operations produce distinct and non-conflicting HVs. The
operators act on HV elements independently and do not require loading the entire HV into memory,
making them highly parallelizable and suitable for analysis on both resource-constrained devices such
as low-end mobile phones and devices capable of parallel processing.</p>
        <p>Training and classification In HDC, a classification task with  classes is performed by computing a
class prototype HV for each class 1, . . . , , storing them into an associative memory ℳ, and then
comparing query HVs with the class prototypes to determine the predicted class.</p>
        <p>Class HVs can be computed by bundling the HVs in the training set belonging to the corresponding
class. Single-pass training is simple and eficient, but often leads to low accuracy, thus iterative training
algorithms such as RefineHD [ 28] and OnlineHD [29] have been proposed to improve the classification
performance. These algorithms bundle incorrectly classified query HVs to the correct class HV and
subtract them from the wrongly predicted class HV.</p>
        <p>At inference time, the similarities of a query hypervector ℋ and the class hypervectors in the
associative memory ℳ are computed. The class whose HV has the highest cosine similarity with the
query is considered to be the predicted label:
(ℳ, ℋ) = arg max  (, ℋ) = arg max</p>
        <p>·ℋ 
||||×||ℋ ||
.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. HDDroid architecture</title>
      <p>HDDroid leverages a collaborative learning framework to allow multiple independent data owners
to contribute to a shared mobile malware detection model without disclosing their data. The system
includes  clients, each with its own private dataset, and a central server that coordinates training.
Each client uses its local data to train a hyperdimensional associative memory ℳ() for classification
and sends it to the central server. The server aggregates the classifiers into a global classifier ℳ() and
sends it back to the clients for further refinement over several iterations. A schematic representation of
the HDDroid architecture is shown in Figure 1.</p>
      <sec id="sec-4-1">
        <title>Local executable dataset</title>
      </sec>
      <sec id="sec-4-2">
        <title>Local executable dataset</title>
      </sec>
      <sec id="sec-4-3">
        <title>Local executable dataset</title>
        <p>Client 1
M(1)
s
i
s
y
l
a
n
A
c
i
t
a
t
S
s
i
s
y
l
itcaanA ilssayn</p>
      </sec>
      <sec id="sec-4-4">
        <title>FuntSction cAall graphs</title>
        <p>c
i
t
a
FuntSction call graphs
. .ign. ⊕ M(...)
ed Cliehn0th1nh2 h3 h4 h5 ... hD
d
raphbEGm ireaghpbddnEGmhh0h00hh1Hh11hyh2iregaddbnphEGmh0p2h02hhe0h30rh13hv13Hhh1eh41h2cy4h24thh0p2hho052heh305rh305r.hs13hhv..13Hhh.4.1.eh4.1.hh2cy4.hh2D4thph52Dho52Deh35r.h35r..hs3hv...3h.4..ehh4..hhcD4.hD4th5Dho5Dh5r.5..s......h..hD.hDhDD
⊕ M(n)</p>
      </sec>
      <sec id="sec-4-5">
        <title>Function call graphs</title>
        <p>M(G)
Σ</p>
        <p>Server
Aggregation weights</p>
        <sec id="sec-4-5-1">
          <title>4.1. Mobile application encoding</title>
          <p>HDDroid analyzes mobile applications using function call graphs obtained through static code analysis.
Each function is a graph node with a unique identifier, and directed edges represent function calls,
indicating possible execution paths. Hyperdimensional Computing directly encodes function call
graphs of mobile applications into HVs for classification, without the need for feature engineering. The
encoding process is hierarchical: vertices are encoded first, then they are combined into edges HVs, and
edges are bundled to encode the entire graph.</p>
          <p>0
1
2</p>
        </sec>
      </sec>
      <sec id="sec-4-6">
        <title>Graph A</title>
        <p>3
4</p>
      </sec>
      <sec id="sec-4-7">
        <title>Node encodings B</title>
        <p>Node encoding The first step in the encoding process is to assign a unique hypervector to each
graph node. Since the function identifiers in an application are arbitrary, they cannot be encoded
directly into HVs. To ensure consistent encoding across graphs, each node is labeled with its PageRank
centrality [30], as proposed in [24]. PageRank measures a node’s importance based on the stationary
distribution of a random walk on the graph. Once the nodes receive a meaningful identifier consistent
across diferent graphs, each identifier is mapped to an HV.</p>
        <p>A common approach to encode discrete values into HVs is to precompute a set of random HVs, one
for each possible value. However, this requires an HV for each value, causing the random basis size to
grow linearly with the number of values, which can be impractical. For application call graphs, the
number of possible function calls is unbounded, with some graphs containing almost 600 thousand
nodes [8]. Encoding all nodes in HVs with dimensionality 10000 would require over 40 GB of memory,
making it infeasible for resource-constrained mobile devices. Additionally, a fixed random base requires
knowing the range of values to be encoded, which is impractical for function call graphs due to the
arbitrary number of function calls in mobile applications. Thus, using a random basis is not viable in
this target scenario.</p>
        <p>Other approaches, such as Flocet encoding [28] and random projections [31], embed discrete values
into HVs without precomputing a random basis. However, these methods preserve similarity between
original values, which is undesirable here as consecutively ranked nodes correspond to diferent
functions. To address these issues, HDDroid uses an online encoding strategy based on the Philox
algorithm [32], a counter-based pseudo-random number generator. A digest is computed from the node
label to seed the key and counter for Philox, which generates an arbitrary-length sequence of
pseudorandom values as the HV encoding. This deterministic procedure ensures the same label is always
encoded as the same hypervector, without storing a lookup table, and guarantees quasi-orthogonal HVs.
Additionally, Philox can generate the values in parallel, making it suitable for encoding large graphs on
multi-core devices.</p>
        <p>Hypervector-based graph representation Given the HVs for all the nodes, they can be combined
to compute the HV for the whole graph [24] as illustrated in Figure 2. Each edge (, ) in the graph
is encoded by binding (element-wise multiplication) the hypervectors ℋ and ℋ of the two nodes 
and , with the destination node HV permuted to indicate the direction of the edge. All the edges are
then bundled (element-wise addition) into the graph hypervector ℋgraph as per Eq. 1. Each element
in the graph HV is thus given by the sum, for each pair of connected nodes, of the products of the
corresponding elements in the node HVs, with the elements in the arrival node HV rotated by one
position:
ℋgraph = ⨁︁ ℋ ⊛ Π(ℋ) = ⟨ ∑︁ ℎ,1 × ℎ,, ∑︁ ℎ,2 × ℎ,1, . . . , ∑︁ ℎ, × ℎ,− 1⟩.
(,)∈edges (,)∈edges (,)∈edges (,)∈edges
(1)
Since all node HVs have the same dimensionality, the graph HV will have a fixed size regardless of
the number of function calls. However, graphs with more edges will sum more quasi-orthogonal
edge HVs, potentially increasing the vector magnitude. To ensure fair influence on the final classifier,
graph hypervectors are normalized to have the same magnitude, regardless of the application size. The
complete encoding algorithm is reported in Algorithm 1. At any given step of the algorithm, the server
only needs to record at most three hypervectors: ℋℎ, ℋ and either ℋ or ℋ,. Excluding the
PageRank computation, the memory requirement of the graph encoding algorithm is (), independent
of the graph size, while the time complexity is (| | · ). The graph information is spread across the
HV, no HV element stores specific information about a node or an edge, granting robustness to noise
and errors.</p>
        <p>Algorithm 1 Application Function Call Graph Encoding Algorithm
Require: Application, Hypervector size 
Ensure: Graph hypervector ℋgraph
1: Graph  = (, ) = call_graph(Application)
2: Initialize ℋgraph as an empty hypervector
3: r ← { pagerank(, ) |  ∈  }
4: for  ∈  do
5:  ← hash(r)
6: ,  ← lower_bits(), upper_bits()
7: ℋ ← Philox(, , )
8: for  ∈  | (, ) ∈  do
9:  ← hash(r)
10: ,  ← lower_bits(), upper_bits()
11: ℋ ← Philox(, , )
12: ℋ, ← ℋ  ⊛ Π(ℋ)
13: ℋgraph ← ℋ graph ⊕ ℋ ,
14: return |ℋℋggrraapphh|
{Compute unique ID for each node}</p>
        <p>{Generate start node HV }
{Generate destination node HV }
{Update graph HV with new edge}</p>
        <sec id="sec-4-7-1">
          <title>4.2. Few-shot collaborative learning</title>
          <p>After encoding the function call graphs into HVs, each client computes a local associative memory
ℳ() for classification. Through federated learning, clients share their local classifiers with the central
server, which aggregates them into a global classifier ℳ(). This process is repeated for several rounds,
refining the global classifier. The HDC paradigm enables eficient distributed learning with minimal
training iterations and communication cycles.</p>
          <p>Local training As discussed in Section 3, the local training of the classifier is often performed by
computing the class hypervectors through one-pass training. However, bundling all HVs of the same
class can lead to model saturation, as common patterns overshadow specific characteristics crucial for
classification. This is particularly problematic in mobile application classification, where many apps
share functionalities, and malware can mimic benign apps to elude detection. Instead of bundling all
HVs of the same class, HDDroid adopts the weighted aggregation strategy in [28]. Each sample ℋ is
compared to the local associative memory. If the class hypervector most similar to ℋ is not the one for
its class , a weighted portion of ℋ is added to  and subtracted from the most similar class
hypervector  to increase and decrease their similarity to ℋ, respectively. The weights depend on
the similarity of the HV with the predicted and true label classes,   and  , as per Eq. (2):
 =  + (1 −  )(1 −   −  )ℋ
 =  − (1 −  )(1 −   −  )ℋ.
(2)
When the sample has a high similarity with the correct class, only a small portion of the hypervector
is added to the class hypervector. When the diference is significant, a larger portion of the sample
hypervector is added to the correct class hypervector and subtracted from the incorrect one to better
incorporate the sample’s information. HVs with correct, but low confidence, predictions are still
added to the correct class hypervector. In this case, the weight of the hypervector is computed as
 =  + (1 −   −  )ℋ. A prediction is considered to have low confidence if the similarity
with the correct class hypervector is below the average similarity of the wrongly classified samples
with the correct class hypervector.</p>
          <p>Model aggregation Once clients update their associative memories, they send the class HVs to the
central server for aggregation into a global classifier [ 25]. This process leverages information from
all clients, creating a more accurate model while reducing communication overhead and preserving
client privacy compared to transmitting the raw HVs. Typically, HDC aggregates local classifiers by
averaging class HVs from each client, similar to traditional federated learning. A representative example
is Fed-HD [33], which computes the global classifier as ℳ() = ℳ()+∑+︀1=1 ℳ() .</p>
          <p>However, this aggregation process can lead to model saturation by repeatedly adding similar
information from all clients. HDDroid weights class HVs from each client by the distinctiveness of their
information, avoiding redundancy and improving learning outcomes. Given the class  hypervector
from client , (), the server evaluates the intra-class redundancy (, ), which measures the average
() with the hypervectors of other clients for the same class:
similarity of 
(, ) = 1 ∑︀=1  ((), ()).</p>
          <p>() is similar to other clients’ hypervectors for the same class,
High intra-class redundancy means 
adding little new information. The server also evaluates the inter-class confusion  (, ) of client  for
class  by comparing () to hypervectors from other classes:</p>
          <p>(, ) = 1 ∑︀=1 max  ((), ℳ() ∖ ()),
where ℳ() ∖ () is the set of class HVs in the associative memory of client  excluding (). High
() contains information not specific to class .
inter-class confusion indicates that</p>
          <p>The computation of  and  for all class hypervectors has a complexity of (2 · 2 · ), where  is
the number of clients,  is the number of classes, and  is the hypervector dimensionality. Since  and
 are hyperparameters, the aggregation process scales quadratically with the number of clients. These
computations are independent and can be parallelized for eficiency.</p>
          <p>The normalized aggregation weight  () is computed using the intra-class
() for the hypervector 
redundancy  and inter-class confusion  as per Eq. (3):
() = ∑︀=1 ∑︀=1(1− (,))(1− (,)− (,))
(1− (,))(1− (,)− (,))
.</p>
          <p>(3)
A hypervector similar to others in the same class or to those of other classes is considered redundant or
uninformative, respectively, and its weight is decreased. The class HVs’ weights are used by the server
to compute the updated global memory as:</p>
          <p>ℳ() = ︁( ∑︀=1 1()1(), . . . , ∑︀=1 ()())︁ .</p>
          <p>Unlike traditional FL on neural networks, which requires many communication rounds to achieve good
performance [34], HDDroid is experimentally shown to converge in few rounds.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Experimental evaluation</title>
      <p>To evaluate HDDroid, experiments were conducted on the MalNet tiny 5K [8] and MalNet 61K [35]
datasets, containing function call graphs of Android applications labeled as benign or malware. The
MalNet tiny 5K dataset contains 5000 applications, each with up to 5000 nodes per graph. The larger 61K
dataset includes 87430 applications, with graphs containing up to 599234 nodes. The 61K dataset’s size
and complexity make it a better testbed for evaluating HDDroid’s scalability and encoding efectiveness.
Most experiments were conducted on the 61K dataset, with the 5K dataset used for comparison with
existing methods. In order to simulate a large number of devices, the experiments were conducted on a
server with 32 Intel(R) Xeon(R) Gold 6242 CPU cores, 32 GB of RAM, and an NVIDIA A100 GPU. The
HDDroid model was implemented using the TorchHD library [36].</p>
      <sec id="sec-5-1">
        <title>Comparison with state-of-the-art approaches: First HDDroid is compared with other non-HDC</title>
        <p>existing approaches for graph classification in a centralized setting. Table 1 and Table 2 show the
performance of HDDroid alongside state-of-the-art methods reported in the literature on the MalNet
tiny 5K and MalNet 61K datasets, respectively.</p>
        <p>Method SIR-GN [35] Feather [19] LDP [17] GIN [22] GCN [23] Slaq-LSD [20] NoG [18] Slaq-VNGE [20] HDDroid
Accuracy 0.92 0.86 0.86 0.90 0.81 0.76 0.77 0.53 0.90</p>
        <p>On the MalNet tiny 5K dataset, HDDroid achieves an accuracy of 0.90, closely matching the best
method, SIR-GN, which has an accuracy of 0.92, demonstrating the viability of the HDC paradigm for
malware detection. On the larger 61K dataset, HDDroid outperforms the SIR-GN algorithm,
demonstrating its scalability and the efectiveness of the HDC paradigm in capturing relevant information in
large function call graphs.</p>
        <p>Other HDC methods: HDDroid’s training and aggregation strategies are assessed through a
comparison with traditional HDC methods. The first compared approach utilizes OnlineHD [ 28] for local
training, maintaining HDDroid’s aggregation strategy. For the second, Fed-HD [33] is used for
aggregation, keeping HDDroid’s local training strategy. Figure 3 shows the accuracy of HDDroid on
HDDroid
OnlineHD</p>
        <p>Fed-HD
8
0.96
0.94
0.92
0.90
0.88
e0.96
ro0.94
c
–S0.92
1F0.90
0.88
0.96
0.94
0.92
0.90
0.88
D
I
I
1
.
0
=
α
0.2 0.4 0.6 0.8
False Positive Rate
1
the MalNet 61K dataset across training rounds compared with OnlineHD and Fed-HD. OnlineHD
starts with high initial accuracy but drops over time, likely due to local training saturation. Fed-HD
shows initial improvement but quickly plateaus, possibly due to redundant information in the global
classifier. HDDroid, on the other hand, starts with a lower initial accuracy, as the inclusion of new HVs
is conservative, but quickly surpasses the other methods, capturing more of the relevant information in
the HVs.</p>
        <p>To better assess how well the obtained class hypervectors capture the information relative to their
classes, the Receiver Operating Characteristic (ROC) curve for HDDroid, OnlineHD, and Fed-HD
are compared, plotting the true positive rate against the false positive rate varying the classification
threshold. In this analysis, a model that does not rely on the PageRank node ordering is also considered to
gauge the impact of this procedure on the quality of the encoding. Figure 4 shows the ROC curve for the
tested methods. Removing PageRank ordering significantly reduces model performance, highlighting
its importance in encoding. As in the previous analysis, HDDroid outperforms the other approaches,
confirming the efectiveness of the training and aggregation strategies.</p>
        <p>Robustness to non-IID data distributions: In federated learning settings, clients often have
heterogeneous data distributions, which can lead to a decrease in model performance [37]. To evaluate
HDDroid’s robustness to non-IID data distributions, the dataset labels are distributed among clients
using a Dirichlet distribution. An IID setting and two non-IID settings are tested: a moderate non-IID
setting ( = 1), and a highly non-IID setting ( = 0.1), where malware classes are almost partitioned
between clients. As shown in Figure 5, larger hypervector sizes generally improve performance, with
diminishing returns at higher sizes. With at least 50 clients, non-IID data distribution does not
significantly impact performance. Reducing the number of clients negatively afects the F1-Score, especially
with smaller hypervectors. Larger hypervectors (e.g., 50000 dimensions) are more robust, maintaining
accuracy above 96% across all settings.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>Malware detection is a critical task to ensure the security of mobile devices, however collecting and
sharing data for training models can raise privacy concerns, thus decentralized on-device analysis and
training are desirable. This work presents HDDroid, a novel approach to mobile malware detection
that leverages the Hyperdimensional Computing paradigm to allow lightweight and eficient federated
training and local classification of mobile applications on resource-constrained mobile devices. The
on-device analysis preserves the privacy of the participating users and allows for eficient and robust
malware detection. The experimental evaluation shows that HDDroid achieves state-of-the-art
graphbased performance for malware detection, remaining efective in non-IID scenarios, making it suitable
for real-world applications where data distributions among devices can be highly heterogeneous. Future
works will focus on improving the flexibility and adaptability of the model in handling variations
over time in the data distribution of the clients and on enabling collaboration between devices with
heterogeneous hardware capabilities.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgements</title>
      <p>This work was partially funded by the European Union Next-Generation EU (PIANO NAZIONALE DI
RIPRESA E RESILIENZA (PNRR) – MISSIONE 4 COMPONENTE 2, INVESTIMENTO 1.3) – Progetto
“Future Artificial Intelligence - FAIR” PE00000013, CUP J73C24000060007</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.
[7] L. Ge, K. K. Parhi, Classification using hyperdimensional computing: A review., IEEE Circuits and</p>
      <p>Systems Magazine (2020).
[8] S. Freitas, Y. Dong, J. Neil, D. Horng (Polo) Chau, A large-scale database for graph representation
learning, in: NeurIPS Datasets and Benchmarks Track, 2021. URL: https://www.microsoft.com/
en-us/research/publication/a-large-scale-database-for-graph-representation-learning/.
[9] Z. Liu, R. Wang, N. Japkowicz, D. Tang, W. Zhang, J. Zhao, Research on unsupervised feature
learning for android malware detection based on restricted boltzmann machines, Future Generation
Computer Systems 120 (2021) 91–108.
[10] P. Faruki, V. Ganmoor, V. Laxmi, M. S. Gaur, A. Bharmal, Androsimilar: robust statistical feature
signature for android malware detection, in: Proceedings of the 6th International Conference on
Security of Information and Networks, 2013, pp. 152–159.
[11] Y. Cui, Y. Sun, Z. Lin, DroidHook: a novel API-hook based android malware dynamic analysis
sandbox, Automated Software Engineering 30 (2023) 10.
[12] B. Anderson, C. Storlie, T. Lane, Improving malware classification: bridging the static/dynamic
gap, in: Proc. of the 5th ACM workshop on Security and artificial intelligence, 2012.
[13] A. Narayanan, G. Meng, L. Yang, J. Liu, L. Chen, Contextual Weisfeiler-Lehman graph kernel for
malware detection, in: 2016 International Joint Conference on Neural Networks (IJCNN), 2016, pp.
4701–4708.
[14] C. Li, Z. Cheng, H. Zhu, L. Wang, Q. Lv, Y. Wang, N. Li, D. Sun, DMalNet: Dynamic malware
analysis based on API feature engineering and graph learning, Computers &amp; Security 122 (2022)
102872.
[15] T. S. John, T. Thomas, S. Emmanuel, Graph convolutional networks for android malware detection
with system call graphs, in: 2020 Third ISEA Conference on Security and Privacy (ISEA-ISAP),
IEEE, 2020, pp. 162–170.
[16] N. V. Hung, P. N. Dung, T. N. Ngoc, V. D. Phai, Q. Shi, Malware detection based on directed
multi-edge dataflow graph representation and convolutional neural network, in: 2019 11th Int.</p>
      <p>Conf. on Knowledge and Systems Engineering, IEEE, 2019, pp. 1–5.
[17] C. Cai, Y. Wang, A simple yet efective baseline for non-attributed graph classification, arXiv
preprint arXiv:1811.03508 (2018).
[18] T. Schulz, P. Welke, On the necessity of graph kernel baselines, in: ECML-PKDD, GEM workshop,
volume 1, 2019, p. 6.
[19] B. Rozemberczki, R. Sarkar, Characteristic functions on graphs: Birds of a feather, from statistical
descriptors to parametric models, in: Proceedings of the 29th ACM International Conference on
Information &amp; Knowledge Management, 2020, pp. 1325–1334.
[20] A. Tsitsulin, M. Munkhoeva, B. Perozzi, Just slaq when you approximate: Accurate spectral
distances for web-scale graphs, in: Proc. of the Web Conf. 2020, 2020, pp. 2697–2703.
[21] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, G. Monfardini, The graph neural network
model, IEEE Transactions on Neural Networks 20 (2008) 61–80.
[22] K. Xu, W. Hu, J. Leskovec, S. Jegelka, How powerful are graph neural networks?, in:
International Conference on Learning Representations, 2019. URL: https://openreview.net/forum?id=
ryGs6iA5Km.
[23] T. N. Kipf, M. Welling, Semi-supervised classification with graph convolutional networks, in:
International Conference on Learning Representations, 2017. URL: https://openreview.net/forum?
id=SJU4ayYgl.
[24] I. Nunes, M. Heddes, T. Givargis, A. Nicolau, A. Veidenbaum, GraphHD: Eficient graph
classification using hyperdimensional computing, in: 2022 Design, Automation &amp; Test in Europe
Conference &amp; Exhibition (DATE), IEEE, 2022, pp. 1485–1490.
[25] B. McMahan, E. Moore, D. Ramage, S. Hampson, B. A. y Arcas, Communication-eficient learning
of deep networks from decentralized data, in: Artificial intelligence and statistics, PMLR, 2017, pp.
1273–1282.
[26] A. Augello, A. De Paola, G. Lo Re, M2FD: Mobile malware federated detection under concept drift,</p>
      <p>Computers &amp; Security (2025) 104361. doi:10.1016/j.cose.2025.104361.
[27] P. Kanerva, Hyperdimensional computing: An introduction to computing in distributed
representation with high-dimensional random vectors, Cognitive computation 1 (2009) 139–159.
[28] P. Vergés, T. Givargis, A. Nicolau, Refinehd: Accurate and eficient single-pass adaptive learning
using hyperdimensional computing, in: 2023 IEEE Int. Conf. on Rebooting Computing (ICRC),
IEEE, 2023, pp. 1–8.
[29] A. Hernández-Cano, N. Matsumoto, E. Ping, M. Imani, OnlineHD: Robust, eficient, and single-pass
online learning using hyperdimensional system, in: 2021 Design, Automation &amp; Test in Europe
Conference &amp; Exhibition (DATE), IEEE, 2021, pp. 56–61.
[30] S. Brin, L. Page, The anatomy of a large-scale hypertextual web search engine, Computer networks
and ISDN systems 30 (1998) 107–117.
[31] J. Morris, R. Fernando, Y. Hao, M. Imani, B. Aksanli, T. Rosing, Locality-based encoder and model
quantization for eficient hyper-dimensional computing, IEEE Transactions on Computer-Aided
Design of Integrated Circuits and Systems 41 (2021) 897–907.
[32] J. K. Salmon, M. A. Moraes, R. O. Dror, D. E. Shaw, Parallel random numbers: as easy as 1, 2, 3, in:
Proc. of 2011 int. conf. for high performance computing, networking, storage and analysis, 2011,
pp. 1–12.
[33] Q. Zhao, K. Lee, J. Liu, M. Huzaifa, X. Yu, T. Rosing, FedHD: Federated learning with
hyperdimensional computing, in: Proceedings of the 28th Annual International Conference on Mobile
Computing And Networking, 2022, pp. 791–793.
[34] Y. Park, D.-J. Han, D.-Y. Kim, J. Seo, J. Moon, Few-round learning for federated learning, Advances
in Neural Information Processing Systems 34 (2021) 28612–28622.
[35] M. Quebrado, E. Serra, A. Cuzzocrea, Android malware identification and polymorphic evolution
via graph representation learning, in: 2021 IEEE International Conference on Big Data (Big Data),
IEEE, 2021, pp. 5441–5449.
[36] M. Heddes, I. Nunes, P. Vergés, D. Kleyko, D. Abraham, T. Givargis, A. Nicolau, A. Veidenbaum,
Torchhd: An open source python library to support research on hyperdimensional computing
and vector symbolic architectures, Journal of Machine Learning Research 24 (2023) 1–10. URL:
http://jmlr.org/papers/v24/23-0300.html.
[37] A. Augello, G. Falzone, G. Lo Re, DCFL: Dynamic clustered federated learning under diferential
privacy settings, in: 2023 IEEE Int. Conf. on Pervasive Computing and Communications Workshops
and other Afiliated Events, 2023, pp. 614–619. doi: 10.1109/PerComWorkshops56833.2023.
10150285.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Qamar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Karim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <article-title>Mobile malware attacks: Review, taxonomy &amp; future directions</article-title>
          ,
          <source>Future Generation Computer Systems</source>
          <volume>97</volume>
          (
          <year>2019</year>
          )
          <fpage>887</fpage>
          -
          <lpage>909</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Ge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Fang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <article-title>A systematic literature review of android malware detection using static analysis</article-title>
          ,
          <source>IEEE Access 8</source>
          (
          <year>2020</year>
          )
          <fpage>116363</fpage>
          -
          <lpage>116379</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Chua</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Balachandran</surname>
          </string-name>
          ,
          <article-title>Efectiveness of android obfuscation on evading anti-malware, in: Proceedings of the eighth ACM conference on data and application security and privacy,</article-title>
          <year>2018</year>
          , pp.
          <fpage>143</fpage>
          -
          <lpage>145</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>T.</given-names>
            <surname>Bilot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. El</given-names>
            <surname>Madhoun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Al Agha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zouaoui</surname>
          </string-name>
          ,
          <article-title>A survey on malware detection with graph representation learning</article-title>
          ,
          <source>ACM Computing Surveys</source>
          <volume>56</volume>
          (
          <year>2024</year>
          )
          <fpage>1</fpage>
          -
          <lpage>36</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Seneviratne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Seneviratne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mohapatra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mahanti</surname>
          </string-name>
          ,
          <article-title>Predicting user traits from a snapshot of apps installed on a smartphone</article-title>
          ,
          <source>ACM SIGMOBILE Mobile Computing and Communications Review</source>
          <volume>18</volume>
          (
          <year>2014</year>
          )
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Augello</surname>
          </string-name>
          ,
          <string-name>
            <surname>A. De Paola</surname>
          </string-name>
          , G. Lo Re,
          <article-title>Hybrid multilevel detection of mobile devices malware under concept drift</article-title>
          ,
          <source>Journal of Network and Systems Management</source>
          <volume>33</volume>
          (
          <year>2025</year>
          )
          <article-title>36</article-title>
          . doi:
          <volume>10</volume>
          .1007/ s10922-025-09906-3.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>