<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Robust Machine Learning for Malware Detection over Time</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Daniele Angioni</string-name>
          <email>daniele.angioni@unica.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luca Demetrio</string-name>
          <email>luca.demetrio93@unica.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maura Pintor</string-name>
          <email>maura.pintor@unica.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Battista Biggio</string-name>
          <email>battista.biggio@unica.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Pluribus One S.r.l.</institution>
          ,
          <addr-line>Cagliari</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Cagliari</institution>
          ,
          <addr-line>Cagliari</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
<p>The presence and persistence of Android malware is an ongoing threat that plagues this information era, and machine learning technologies are now extensively used to deploy more effective detectors that can block the majority of these malicious programs. However, these algorithms have not been developed to pursue the natural evolution of malware, and their performance significantly degrades over time because of such concept drift. Currently, state-of-the-art techniques only focus on detecting the presence of such drift, or they address it by relying on frequent updates of models. Hence, there is a lack of knowledge regarding the causes of the concept drift, and ad-hoc solutions that can counter the passing of time are still under-investigated. In this work, we commence to address these issues as we propose (i) a drift-analysis framework to identify which characteristics of the data are causing the drift, and (ii) SVM-CB, a time-aware classifier that leverages the drift-analysis information to slow down the performance drop. We highlight the efficacy of our contribution by comparing its degradation over time with that of a state-of-the-art classifier, and we show that SVM-CB better withstands the distribution changes that naturally characterize the malware domain. We conclude by discussing the limitations of our approach and how our contribution can be taken as a first step towards more time-resistant classifiers that not only tackle, but also understand, the concept drift that affects data.</p>
      </abstract>
      <kwd-group>
        <kwd>android malware</kwd>
        <kwd>machine learning</kwd>
        <kwd>concept drift</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
In this information era, we are experiencing tremendous growth in mobile technology, both
in its efficacy and pervasiveness. One of the most common operating systems for mobile
devices is Android and, because of its popularity, it has become particularly attractive to
cyberattackers, who exploit Android vulnerabilities by creating malicious applications, also known
as malware, targeted specifically at these systems. Luckily, the technological development
of this era brings enough power to machine learning algorithms, considered the standard for
many domains, including cyber-security and, specifically, malware detection, where they have shown
to be very effective also against never-seen malware families [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4 ref5 ref6">1, 2, 3, 4, 5, 6</xref>
        ].
      </p>
      <p>However, real-world data experience a phenomenon known as concept drift, i.e. their temporal
evolution [7]. In particular, Android applications naturally change over time, since attackers keep
adjusting malware to bypass detection, and legitimate applications embrace new frameworks
and programming patterns while abandoning deprecated technologies. Recent work highlighted
how concept drift worryingly affects the performance of state-of-the-art Android malware
detectors, revealing how much it drops over time and contradicting the results achieved by their
original analyses, which were inflated by wrong evaluation settings [8]. On top of this
issue, the only proposals to counter the concept drift rely on continuous updates or retraining of
machine learning models [9, 10, 11, 12, 13], instead of tracking which characteristics of the
data mainly change over time.</p>
      <p>Hence, we start bridging the gaps left in the state of the art by proposing novel techniques
that understand the concept drift and take advantage of it. The contributions of this work are
summarized as follows: (i) we propose a drift-analysis framework that investigates the reasons
causing the concept drift inside data, highlighting which features are more prone to have a
negative contribution to the performance decay; and (ii) we propose SVM-CB, a novel classifier
that leverages our drift-analysis information to bound the selected unstable features, reducing
the overall performance drop.</p>
      <p>
        We show the effectiveness of SVM-CB by comparing its performance over time with
Drebin [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], a state-of-the-art linear classifier. To obtain a fair comparison, we train both
classifiers on the same dataset, and we show how SVM-CB better withstands the passing of time,
thanks to the domain knowledge acquired through the results of our drift-analysis framework,
which allows SVM-CB to be updated less often than Drebin.
      </p>
      <p>We conclude by discussing future directions of this work considering fewer heuristic rules to
tune SVM-CB, and extensions of our methodology to non-linear classifiers.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Android Malware Detection over Time</title>
      <p>Before delving into the details of the proposed methods, we first describe the structure of
Android applications, to lay a foundation for understanding the classifier that we consider in
this work, and we then discuss the concept drift problem and the solutions proposed so far.
Android Applications. These are programs that run on the Android Operating System. They
are distributed as an Android Application Package (APK), an archive file with the .apk extension.
An APK contains different files: (i) the AndroidManifest.xml, which stores all the
information needed by the operating system to correctly handle the application at run-time;
(ii) the classes.dex, which stores the compiled source code and all user-implemented methods
and classes; and (iii) additional .xml and resource files that are used to define the application
user interface, along with additional functionality or multimedia content.</p>
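As a concrete illustration of this three-part layout, the grouping above can be sketched with Python's standard zipfile module (an APK is a ZIP archive). The helper name and categories are ours, used purely for illustration:

```python
import io
import zipfile

def summarize_apk(apk_bytes: bytes) -> dict:
    """Group the entries of an APK (a ZIP archive) into the three
    categories described above: manifest, bytecode, and resources."""
    summary = {"manifest": [], "bytecode": [], "resources": []}
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as apk:
        for name in apk.namelist():
            if name == "AndroidManifest.xml":
                summary["manifest"].append(name)
            elif name.endswith(".dex"):  # classes.dex, classes2.dex, ...
                summary["bytecode"].append(name)
            else:  # .xml layouts, assets, media, ...
                summary["resources"].append(name)
    return summary

# Build a toy in-memory "APK" to exercise the function.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("AndroidManifest.xml", "<manifest/>")
    z.writestr("classes.dex", b"dex\n035")
    z.writestr("res/layout/main.xml", "<LinearLayout/>")

print(summarize_apk(buf.getvalue()))
```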
      <p>
        Malware Detection with Machine Learning. We select a popular binary detector named
Drebin [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] as a baseline for our proposals, whose
architecture is described in Fig. 1. This classifier relies on a Support Vector Machine (SVM) [14]
trained on top of hand-crafted features extracted from the APKs provided at training time,
which consider: (i) features extracted from the AndroidManifest.xml
(see https://developer.android.com/guide/topics/manifest/manifest-intro), like hardware
components, requested permissions, app components, and filtered intents; and (ii) features extracted from
classes.dex, including restricted API calls, used permissions, suspicious API calls, and
network addresses. All this knowledge is encoded inside $d$-dimensional feature vectors, whose
entries are 0 or 1 depending on the absence or presence of a particular characteristic. Since
Drebin relies on a linear SVM, its decision-making process can be investigated, since each
feature is correlated with a weight that describes its orientation toward one of the two
prediction classes, namely legitimate and malicious.
      </p>
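The 0/1 encoding described above can be sketched as follows; the toy three-entry vocabulary is ours (the real Drebin vocabulary is extracted from the training set and has hundreds of thousands of entries):

```python
import numpy as np

# Hypothetical feature vocabulary: in a Drebin-style embedding, each dimension
# corresponds to one string (a permission, API call, URL, ...) seen in training.
vocabulary = {
    "app_permissions::android.permission.INTERNET": 0,
    "api_calls::android/telephony/TelephonyManager;->getDeviceId": 1,
    "urls::http://www.google.com": 2,
}

def embed(app_features, vocab):
    """Map the set of strings extracted from an APK to a d-dimensional
    0/1 vector: 1 if the feature is present, 0 otherwise."""
    x = np.zeros(len(vocab), dtype=np.int8)
    for f in app_features:
        idx = vocab.get(f)
        if idx is not None:  # features unseen at training time are ignored
            x[idx] = 1
    return x

x = embed({"urls::http://www.google.com", "some::unseen_feature"}, vocabulary)
print(x)  # -> [0 0 1]
```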
      <p>Performance over Time. Even though Drebin registered impressive performance in detecting
malware, it was not properly tested inside a time-aware environment. Its training relies on the
independent and identically distributed (i.i.d.) assumption, which takes for granted that both
training and testing samples are drawn from the same distribution. While this property might
hold for the image classification domain, it cannot be satisfied for the rapidly-growing domain
of programs, where training samples differ from future test data as new updates, frameworks,
and techniques are introduced while others are deprecated. The classic evaluation setting injects
artifacts inside the learning process, like the presence of samples coming from mixed periods,
allowing the classifier to rely on future knowledge at training time. This has been demonstrated
by Pendlebury et al. [8], who show how selected state-of-the-art detectors are characterized by
worrying performance drops when evaluated with a more realistic time-aware approach.</p>
    </sec>
    <sec id="sec-2b">
      <title>3. Analysing and Improving Robustness to Time</title>
      <p>We now introduce the two contributions of our work: (i) the drift-analysis framework, which aims to
understand the causes of the concept drift by inspecting the features extracted from data at
different time intervals and by quantifying their contribution to the overall performance drop;
and (ii) the time-aware learning algorithm SVM-CB (i.e. SVM with Custom Bounds), which uses
the drift-analysis information to select and bound the weights of a chosen number of features
considered unstable, reducing their contribution to the performance decay caused by time.</p>
      <p>Drift-analysis framework. Our first contribution tackles the open problem of explaining the
concept drift: we propose the temporal feature stability (T-stability), a novel metric, designed for
linear classifiers, that measures the contribution of each single feature to the performance decay. This
metric captures two distinct characteristics of each feature when dealing with time: its
relevance in the classifier prediction and its temporal evolution. These are quantified by the
product between (i) the weight $w_j$ corresponding to the $j$-th feature, learned at training time
by the classifier; and (ii) the slope $\gamma_j$ that approximates the temporal evolution of the values of
the feature.</p>
      <p>To compute our metric, we start from the hypothesis that a decrement in the detection rate of
malware is strictly related to a decaying score assigned to malware samples as time passes. Such
behavior corresponds to a shift of the malware class distribution towards the decision boundary
learned at training time, thus increasing the number of misclassified samples. To quantify
this intuition, we analyze the variation of the malware score over time, and we compute the
conditional expectation of the score over all malware samples (identified with the label $y = 1$)
at time $t$ as:</p>
      <p>$\mathbb{E}\left[\mathbf{w}^\top\mathbf{x} + b \,\middle|\, y = 1, t\right] = \sum_{j=1}^{d} w_j\,\mathbb{E}\left[x_j \mid y = 1, t\right] + b \qquad (1)$
where the score is computed as the scalar product between $\mathbf{w}$, the vector containing the weights
of the linear classifier with bias $b$, and $\mathbf{x}$, the $d$-dimensional feature vector representation of an
input Android application.</p>
      <p>Since we want to quantify how the features contribute to the evolution of the score expectation, we
consider the derivative of Eq. 1, which is the summation over the products between the weights and
the derivatives of the feature expectations w.r.t. time:</p>
      <p>$\frac{\partial}{\partial t}\,\mathbb{E}\left[\mathbf{w}^\top\mathbf{x} + b \,\middle|\, y = 1, t\right] = \sum_{j=1}^{d} w_j\,\frac{\partial}{\partial t}\,\mathbb{E}\left[x_j \mid y = 1, t\right] \approx \sum_{j=1}^{d} w_j\gamma_j = \sum_{j=1}^{d} t_j \qquad (2)$
Since we are interested in capturing the overall trend of the score decay, we approximate the
derivative of the $j$-th feature expectation with the slope $\gamma_j$ of the regression line that best fits the
feature expectation over time. We then compress the product $w_j \cdot \gamma_j$ into the single value $t_j$,
which is how we compute the T-stability of feature $j$. Intuitively, the larger and more negative
the T-stability of a feature, the more that feature accelerates the degradation of the
classifier.</p>
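The slope $\gamma_j$ of Eq. 2 can be obtained with an ordinary least-squares line fit per feature. A minimal sketch (function name and toy data are ours):

```python
import numpy as np

def feature_slopes(mean_over_time):
    """Given a (d, N) matrix whose j-th row holds the mean value of feature j
    in each of N time slots, return the slope of the best-fitting regression
    line for every feature (the gamma_j of Eq. 2)."""
    d, N = mean_over_time.shape
    t = np.arange(N)
    # polyfit with deg=1 fits a line a*t + c per column of y; row [0] holds
    # the slopes a, one per feature.
    return np.polyfit(t, mean_over_time.T, deg=1)[0]

# One feature climbing linearly over 9 slots, one that stays flat.
M = np.vstack([np.linspace(0.1, 0.9, 9), np.full(9, 0.5)])
print(feature_slopes(M))  # first slope is 0.1, second is (close to) 0
```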
      <p>Since expectations are not computable for a specific time instant $t$, we quantize the time
variable considering time slots of length $\Delta t$, where the $k$-th slot identifies the subset $D_k$ of
malware samples registered at time $\tau \in [k\Delta t, (k+1)\Delta t]$, with $k$ an integer. Thus,
we use Alg. 1 to obtain the vector $\mathbf{t}$ containing the T-stability of each feature. After having
computed the number of available time slots $N$ based on the timestamps in $D$ and the chosen
time window $\Delta t$ (line 1), we initialize a utility matrix $M$ that will contain the mean feature
values (line 2). Then we iterate through the time slots (line 3) and select, for each one, the subset
$D_k$ (line 4) needed to compute the mean feature value at time $k\Delta t$, storing it in the $k$-th column
of $M$ (line 5). After this step, we loop over the number of features (line 7) to compute the
slope $\gamma_j$ of the $j$-th feature over time from the $j$-th row of $M$ (line 8), to eventually return the
Hadamard product between the classifier's trained weights $\mathbf{w}$ and the feature slopes $\boldsymbol{\gamma}$ (line 9),
i.e. the T-stability vector $\mathbf{t}$.</p>
      <p>Algorithm 1: Drift Analysis.
Input: the timestamped and labeled dataset $D = \{\mathbf{x}_i, y_i, \tau_i\}_{i=1}^{n}$; the time window $\Delta t$;
the weights $\mathbf{w}$ of the reference classifier.
Output: the T-stability vector $\mathbf{t}$.
1: $N \leftarrow \lceil (\tau_{\max} - \tau_{\min})/\Delta t \rceil$ ◁ compute number of time slots
2: $M \leftarrow zeros(d, N)$ ◁ initialize utility matrix
3: for $k \in [0, N-1]$ do
4: &#160;&#160; $D_k \leftarrow \{(\mathbf{x}, y, \tau) \in D : y = 1, \tau \in [k\Delta t, (k+1)\Delta t]\}$ ◁ obtain data in time slot $k$
5: &#160;&#160; $M_{*,k} \leftarrow \frac{1}{|D_k|}\sum_{\mathbf{x} \in D_k} \mathbf{x}$ ◁ compute mean feature value in time slot $k$
6: $\boldsymbol{\gamma} \leftarrow zeros(d)$
7: for $j \in [0, d-1]$ do
8: &#160;&#160; $\gamma_j \leftarrow slope(M_{j,*})$ ◁ compute slope of the regression line
9: $\mathbf{t} \leftarrow \mathbf{w} \circ \boldsymbol{\gamma}$ ◁ compute the T-stability vector
10: return $\mathbf{t}$ ◁ return the T-stability vector</p>
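Alg. 1 can be sketched in a few lines of NumPy. This is a minimal sketch under our reconstruction of the algorithm; the function name and the synthetic demo data are ours, not from the paper:

```python
import numpy as np

def drift_analysis(X, y, tau, w, delta):
    """Minimal sketch of Alg. 1: return the T-stability vector t = w * gamma.

    X: (n, d) binary feature matrix; y: (n,) labels (1 = malware);
    tau: (n,) timestamps; w: (d,) weights of the reference linear classifier;
    delta: length of a time slot (same unit as tau)."""
    n_slots = int(np.ceil((tau.max() - tau.min()) / delta))
    d = X.shape[1]
    M = np.zeros((d, n_slots))  # mean feature value per (feature, time slot)
    start = tau.min()
    for k in range(n_slots):
        in_slot = (y == 1) & (tau >= start + k * delta) & (tau < start + (k + 1) * delta)
        if in_slot.any():  # empty slots keep their zero column in this sketch
            M[:, k] = X[in_slot].mean(axis=0)
    # Slope of the regression line fitted to each feature's mean over time.
    gamma = np.polyfit(np.arange(n_slots), M.T, deg=1)[0]
    return w * gamma  # Hadamard product (line 9 of Alg. 1)

# Synthetic demo: feature 0 appears more often in malware as time passes,
# feature 1 fades away, feature 2 stays stable.
rng = np.random.default_rng(0)
n, d = 400, 3
tau = rng.uniform(0, 9.9, n)
y = np.ones(n)
p = np.stack([tau / 10, 1 - tau / 10, np.full(n, 0.5)], axis=1)
X = (rng.uniform(size=(n, d)) < p).astype(int)
w = np.array([-1.0, 1.0, 1.0])  # goodware-oriented, malware-, malware-oriented
t_stab = drift_analysis(X, y, tau, w, delta=2.0)
print(t_stab)  # features 0 and 1 get negative T-stability, feature 2 near zero
```

Note how the two unstable patterns discussed later both yield negative T-stability: a goodware-oriented weight ($w_j < 0$) with a rising feature, and a malware-oriented weight ($w_j > 0$) with a fading one.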
      <p>Robustness to Future Changes. As our second contribution, we show how to exploit the
information obtained with the drift analysis inside the optimization process to train SVM-CB, an
SVM classifier hardened against the passing of time. To train SVM-CB, we consider a reference,
temporally unstable classifier to compute the T-stability of each feature. Then, we select the
unstable features, i.e. the $k$ features with the most negative T-stability values $t_j$. Our goal is to
train a new classifier that relies less on these unstable features; thus we bound the absolute
value of the corresponding weights to directly reduce their contribution in Eq. 2. This can be
formalized as the constrained optimization problem in Eq. 3, where the hinge loss is minimized
subject to a constraint on the subset of weights $\mathbf{w}_K$, i.e. the $k$ weights corresponding to the
unstable features, which are forced to be lower than a specific bound $r$ in their absolute value:</p>
      <p>$\arg\min_{\mathbf{w}, b} \; \sum_{i=1}^{n} \max\left(0,\; 1 - y_i\,(\mathbf{w}^\top\mathbf{x}_i + b)\right), \qquad (3)$
$\text{s.t.} \quad |w_j| &lt; r, \quad \forall j \in K. \qquad (4)$</p>
      <p>We show in Alg. 2 the time-aware training algorithm for SVM-CB, which minimizes this objective
through a gradient descent procedure. The algorithm is initialized by first identifying the
subset $\mathbf{w}_K$ of weights corresponding to the $k$ unstable features (lines 1-3). Then, at each
iteration, we first modulate the learning rate with the function $g(t)$ to improve convergence
(line 6), we update the parameters of the classifier by applying gradient descent (lines
7-8), and we eventually clip the weights contained in $\mathbf{w}_K$ to the bound $r$ if their absolute value
exceeds it (line 9), as described in Eq. 4. After $N$ iterations, the algorithm returns the learned
parameters $\mathbf{w}$ and $b$ (line 10).</p>
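The projected-gradient procedure just described can be sketched in Python. This is a minimal sketch under our reconstruction of Alg. 2: the hyper-parameter defaults mirror the values reported later in Sect. 4, while the function name and the synthetic demo data are ours:

```python
import numpy as np

def train_svm_cb(X, y, t_stab, k, r, n_iter=2000, lr0=7e-5):
    """Minimal sketch of Alg. 2: hinge-loss subgradient descent where the k
    most unstable weights (most negative T-stability) are clipped to [-r, r]
    after every step, enforcing the constraint of Eq. 4."""
    n, d = X.shape
    unstable = np.argsort(t_stab)[:k]  # indexes of the k most negative values
    w, b = np.zeros(d), 0.0
    for it in range(1, n_iter + 1):
        lr = lr0 * 0.5 * (1 + np.cos(np.pi * it / n_iter))  # cosine annealing
        margin = y * (X @ w + b)
        active = margin < 1  # samples inside the hinge
        grad_w = -(y[active, None] * X[active]).sum(axis=0)
        grad_b = -y[active].sum()
        w -= lr * grad_w
        b -= lr * grad_b
        w[unstable] = np.clip(w[unstable], -r, r)  # project onto |w_j| <= r
    return w, b

# Synthetic demo: feature 0 and feature 3 are both informative, but we pretend
# drift analysis flagged feature 3 as unstable, so its weight gets bounded.
rng = np.random.default_rng(1)
y = np.where(rng.uniform(size=200) < 0.5, 1.0, -1.0)
X = rng.normal(scale=0.3, size=(200, 4))
X[:, 0] += y
X[:, 3] += y
t_stab = np.array([0.0, 0.0, 0.0, -1.0])
w, b = train_svm_cb(X, y, t_stab, k=1, r=0.1)
acc = np.mean(np.sign(X @ w + b) == y)
print(w, acc)  # |w[3]| stays within 0.1; accuracy stays high via feature 0
```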
    </sec>
    <sec id="sec-3">
      <title>4. Experiments</title>
      <p>Algorithm 2: SVM-CB learning algorithm.
Input: the training data $D = \{\mathbf{x}_i, y_i\}_{i=1}^{n}$; the absolute value $r$ of the bound that must be
applied to the weights; the T-stability vector $\mathbf{t}$; the number $k$ of weights that must
be bounded; the number of iterations $N$; the initial gradient step size $\eta(0)$; a
decaying function $g(t)$.
Output: $\mathbf{w}, b$, the trained classifier's parameters.
1: $idx \leftarrow argsort(\mathbf{t})$ ◁ initialize feature indexes ordered w.r.t. $\mathbf{t}$
2: $K \leftarrow \{idx_j : j = 0, \ldots, k\}$ ◁ select first $k$ indexes
3: initialize $\mathbf{w}_K = \{w_j : j \in K\}$ ◁ select corresponding $k$ weights
4: $(\mathbf{w}(0), b(0)) \leftarrow (\mathbf{0}, 0)$ ◁ initialize parameters
5: for $t \in [1, N]$ do
6: &#160;&#160; $\eta(t) \leftarrow \eta(0)\,g(t)$ ◁ update learning rate
7: &#160;&#160; $\mathbf{w}(t) \leftarrow \mathbf{w}(t-1) - \eta(t)\nabla_{\mathbf{w}}\mathcal{L}$ ◁ update weights
8: &#160;&#160; $b(t) \leftarrow b(t-1) - \eta(t)\nabla_{b}\mathcal{L}$ ◁ update bias
9: &#160;&#160; $\mathbf{w}_K(t) \leftarrow clip(\mathbf{w}_K(t); -r, r)$ ◁ clip weights based on Eq. 4
10: return $\mathbf{w}(N), b(N)$ ◁ return the learned parameters</p>
      <p>
We now apply our methodology to quantify how it explains and hardens a classifier against the
performance decay compared with the time-agnostic classifier Drebin [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
Dataset. We leverage the dataset provided by Pendlebury et al. [8], composed of 116,993
legitimate and 12,735 malicious Android applications sampled from the AndroZoo dataset [15],
spanning from January 2014 to December 2016. We replicate their temporal train-test split as
shown in Fig. 2, dividing the samples between December 2014 and January 2015, and we set the
time slot $\Delta t$ equal to 1 month to ensure sufficient statistics for each. We hence extract 465,608
features from the training set to match the original formulation of Drebin [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
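The time-aware split described above can be sketched as follows; the tuple layout, helper name, and toy dates are hypothetical, used only to illustrate that no post-split sample leaks into training:

```python
from datetime import datetime

def temporal_split(samples, split_date):
    """Time-aware evaluation sketch: everything observed before the split
    date forms the training set; later samples, kept in chronological order,
    form the monthly test periods. `samples` is a list of
    (timestamp, features, label) tuples (hypothetical layout)."""
    train = [s for s in samples if s[0] < split_date]
    test = sorted((s for s in samples if s[0] >= split_date), key=lambda s: s[0])
    return train, test

samples = [
    (datetime(2014, 3, 1), "x1", 0),
    (datetime(2014, 11, 5), "x2", 1),
    (datetime(2015, 2, 20), "x3", 1),
    (datetime(2016, 7, 9), "x4", 0),
]
train, test = temporal_split(samples, datetime(2015, 1, 1))
print(len(train), len(test))  # -> 2 2
```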
      <p>Models. We consider Drebin as the baseline classifier, trained with the $C$ parameter set to 1, and
we compare it with two versions of SVM-CB, obtained by considering different bounds on the unstable
features detected by the drift-analysis framework. We will refer to the baseline classifier as
SVM, since the underlying feature extractor and the feature embedding module are the same for
all the classifiers under analysis.</p>
      <p>Drift Analysis Results. To identify the features responsible for the performance decay over
time in our baseline SVM, we first show in Fig. 3 the trend of the mean score assigned
respectively to malicious (Fig. 3a) and benign samples (Fig. 3b) over all the testing periods.
While the classifier assigns, on average, an almost constant negative score to the goodware
class, the mean score assigned to malware gradually approaches zero, eventually becoming
negative after 10 months, thus validating the hypothesis stated in Sect. 3.</p>
      <p>We compute the T-stability vector $\mathbf{t}$ through Alg. 1 for the learned weights of the SVM
w.r.t. the timestamped training set, and we show the first 104 T-stability values in increasing
order along with the corresponding features in Fig. 3c. The latter highlights that most of the
contribution to the performance decay is caused by roughly 100 features among the whole feature
set, while all the remaining ones do not substantially compromise the detection rate over time,
since their T-stability is very close to zero.</p>
      <p>We report a subset of the selected unstable features (i.e. features presenting large negative
T-stability values) in Table 1. The first 10 rows show features that the SVM associates with the
goodware class and that are becoming more likely to be found in malware ($w_j &lt; 0$, $\gamma_j &gt; 0$), while
the last 10 rows show features that the SVM associates with the malware class but that are
disappearing from the data ($w_j &gt; 0$, $\gamma_j &lt; 0$). For simplicity, we will refer to the features in the
first and second half of the table, respectively, as the first and the second group of features.</p>
      <p>We can recognize in the first group features mostly related to commonly-used URLs. For
instance, among them, we find "www.google.com", "www.youtube.com", and websites under
the "facebook.com" domain, which are all legitimate URLs to browse, and the classifier links
them to the goodware class by assigning them a positive weight.</p>
      <p>Table 1: A subset of the selected unstable features, with their T-stability values $w_j \gamma_j$.
urls::https://graph.facebook.com/%1$s?...&amp;accessToken=%2$s : -0.008753
intents::android_intent_action_VIEW : -0.010168
urls::http://www.google.com : -0.021320
activities::com_revmob_ads_fullscreen_FullscreenActivity : -0.006204
activities::com_feiwo_view_IA : -0.004435
urls::http://i.ytimg.com/vi/ : -0.005245
api_calls::android/content/ContentResolver;→openInputStream : -0.003749
urls::https://m.facebook.com/dialog/ : -0.004955
urls::http://market.android.com/details?id= : -0.004041
urls::http://www.youtube.com/embed/ : -0.004289
api_calls::android/net/wifi/WifiManager;→getConnectionInfo : -0.003469
app_permissions::name='android_permission_MOUNT_UNMOUNT_FILESYSTEMS' : -0.004508
urls::http://e.admob.com/clk?... : -0.006713
activities::com_feiwothree_coverscreen_SA : -0.003564
interesting_calls::Cipher(DES) : -0.008910
intents::android_intent_action_PACKAGE_ADDED : -0.022435
activities::com_fivefeiwo_coverscreen_SA : -0.003813
intents::android_intent_action_CREATE_SHORTCUT : -0.012456
intents::android_intent_action_USER_PRESENT : -0.021155
activities::com_feiwoone_coverscreen_SA : -0.010022</p>
      <p>The second group is mostly characterized by features related to intents and activities. For
instance, we find the presence of a cipher algorithm ("interesting_calls::Cipher(DES)"), reported
to be used to obfuscate and encrypt part of malicious applications (see
https://www.virusbulletin.com/virusbulletin/2014/07/obfuscation-android-malware-and-how-fight-back).
However, this feature has a decreasing trend ($\gamma_j &lt; 0$),
meaning that malware relies less on this method as time passes, probably because it would ease
the detection of the malware under manual inspection.</p>
      <p>From this analysis, we can deduce that the unstable features can be grouped into two types:
(i) goodware-related features that malware creators are starting to inject into their
malicious code to increase the probability of it being recognized as goodware, and (ii)
malware-related features that malware creators are starting to deprecate to reduce the probability of it
being recognized as malware.</p>
      <p>Improving Robustness. We now leverage the results of our drift-analysis framework
performed on the SVM by training SVM-CB using Alg. 2, running it for 2000 iterations with
the initial learning rate $\eta(0)$ set to $7 \cdot 10^{-5}$, and we use the cosine annealing function as $g(t)$
to modulate it over the iterations. We heuristically choose the number of features to bound,
$k = 102$, since these are the ones that contribute the most to the performance decay (Fig. 3c). We
train two versions of SVM-CB, referred to as (i) SVM-CB(H), the classifier with $r = 0.8$, and (ii)
SVM-CB(L), the classifier with $r = 0.2$. These two different bounds allow us to better understand
how the robustness against the concept drift changes when we apply softer ($r = 0.8$) or harder
($r = 0.2$) constraints to the corresponding weights.</p>
      <p>We report the performance analysis of these classifiers in Fig. 4, where we show the evolution over the testing periods of the recall
(red) and the precision (blue) for the SVM (Fig. 4a) and SVM-CB (L and H) (Fig. 4c and 4b). We will
focus mainly on the discussion of the recall curves, as our primary concern is the detection
rate of the malware samples over time, which is exactly what the recall measures. Also, we will not
discuss the results concerning the last two months, as the number of samples is not sufficiently
large for a proper evaluation (as highlighted by Fig. 2).</p>
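The per-period evaluation just described can be sketched as follows; the helper name and the synthetic decaying scores are ours, used only to illustrate how a monthly recall curve is computed from a linear classifier's scores:

```python
import numpy as np

def monthly_recall(scores, y, month):
    """Sketch of the evaluation in Fig. 4: recall (malware detection rate)
    computed separately for each monthly test period. A positive SVM score
    means the sample is predicted as malware."""
    out = {}
    for m in np.unique(month):
        sel = (month == m) & (y == 1)
        if sel.any():
            out[int(m)] = float(np.mean(scores[sel] > 0))
    return out

# Synthetic scores whose mean decays month after month, mimicking the drift
# of the malware class towards (and past) the decision boundary.
rng = np.random.default_rng(2)
month = np.repeat(np.arange(3), 100)
y = np.ones(300)
scores = rng.normal(loc=1.5 - month, scale=1.0)
print(monthly_recall(scores, y, month))  # recall drops month after month
```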
      <p>We correctly replicated the results obtained by Pendlebury et al. [8] for the SVM, which
presents the highest recall among the tested classifiers in the first testing periods, starting from
76.4%, dropping fast towards a 28.8% recall at the 16th month, to eventually rise to 45.3% at the
21st month. Although the initial detection rate of SVM-CB(L) is lower than 70%, it fluctuates less
w.r.t. the baseline, maintaining the performance around 50-60% with a final drop to 35.8%
at the third-to-last month. SVM-CB(H) presents an initial recall of 69.4%, which decays to
43.2% once it reaches the 22nd month. Coherently with the results obtained by Pendlebury et
al. [8], we observe that the baseline SVM is characterized by the fastest performance decay,
while the other classifiers start between 60% and 70% recall. The peak of temporal robustness is
reached by SVM-CB(L), whose recall curve is almost flattened, while SVM-CB(H)
has a slower decay w.r.t. the SVM but a faster one than SVM-CB(L). Lastly, Fig. 4d shows the
Area Under the Receiver Operating Characteristic (ROC) curve for each testing period, computed up
to 5% FPR. Here we indirectly discuss the correlation between precision and recall by considering
the performance when we fix a constant percentage of goodware misclassified as malware for
each month, in order to better measure and compare the data separation capabilities of the
three classifiers. The AUC curves reflect what we have discussed for the recall: the SVM starts
with the highest AUC and decays rapidly below all the other AUC curves, while the
other classifiers start with a lower AUC that reveals to be higher than the SVM's when approaching
the 10th month. SVM-CB(L) is confirmed to be the most stable classifier even in this
constrained evaluation setting with low FPR.</p>
    </sec>
    <sec id="sec-4">
      <title>5. Related Work</title>
      <p>We now offer an overview of state-of-the-art techniques similar to our proposal. Pendlebury
et al. [8] propose Tesseract, a test-time evaluation framework to determine the faultiness of
classifiers in the presence of concept drift. The authors show that evaluations are affected by
misleading biases that inject artifacts inside the trained machine learning model, thus causing a
performance decay once the model faces real-world data. Tesseract highlights how different
proposed models do not cope with the concept drift of Android applications and that faulty
training settings inflated their original evaluations. While Tesseract is a consistent method to
include concept drift in the evaluation, it is not designed to either fix or mitigate its presence.</p>
      <p>Jordaney et al. [10] propose Transcend, a framework that signals the premature aging of
classifiers before their performance starts to degrade consistently, by analyzing the difference
between samples observed at training and at test time. On top of this methodology, Barbero et
al. [11] propose Transcendent, which improves Transcend to include the rejection of
out-of-distribution samples that cause the performance drops. However, they do not propose methods
to harden a classifier against concept drift; rather, they focus on protection systems exploiting
samples encountered during deployment, such as a notification when data start differing from
the training ones [10], or directly rejecting a sample coming from a drifted data distribution [11].</p>
      <p>In contrast to previous work, we consider the presence of faulty evaluations, and we extend
it with a methodology that quantifies which features of the data distributions are changing and
how. Such a contribution not only explains the performance decay, but also helps understand
the reasons behind the concept drift. Instead of rejecting samples or just signaling the worsening
of the performance of a model, we build a time-aware classifier that takes into account the
acquired knowledge of the data distribution changes, and we show how our methodology can
better withstand the passing of time.</p>
    </sec>
    <sec id="sec-5">
      <title>6. Conclusions and Future Work</title>
      <p>In this work, we propose a preliminary methodology that understands and provides an initial
hardening against the concept drift that plagues the performance of Android malware detectors.
In particular, we develop a drift-analysis framework that highlights which features contribute
more to the performance decay of a classifier over time, and we leverage these results to propose
SVM-CB, a linear classifier hardened against the passing of time.</p>
      <p>We show the efficacy of our proposals by applying our drift-analysis framework to Drebin,
a linear Android malware detector, and we compare its performance over time against that of its
hardened version computed through our proposed methodology. From our experimental
analysis, we can precisely detect which features worsen the detection rate of Drebin and how the
trained SVM-CB better withstands the passing of time. In particular, we highlight the efficacy of
bounding these unstable features, reducing the performance drop of SVM-CB w.r.t. the
baseline Drebin.</p>
      <p>
Although the obtained results are promising, this work presents the following limitations.
First, the experimental setup does not guarantee that the provided solution against performance
decay can be applied to other types of detectors, as this work addresses the problem of analyzing
the effect of the concept drift only for linear classifiers that work on static features [
        <xref ref-type="bibr" rid="ref1">1, 16</xref>
        ].
Also, the T-stability might not reflect the actual concept drift that affects Android applications,
as it is computed on a classifier trained on a specific dataset, which only approximates the real
data distribution. Hence, we should also study the Android malware domain further to provide
sufficient and reliable evidence of why the features chosen by the drift-analysis framework are
actually causing the decay. Lastly, we heuristically tuned the bounds for the selected weights of
SVM-CB, but these choices could be improved with an automatic algorithm that computes the
values leading to better robustness against time.
      </p>
      <p>We nonetheless believe that our work suggests a promising research direction
that will provide more insight on the usage of each contribution. We first intend to explore
more advanced methods based on the drift-analysis framework, including an automatic bound
selection for the weights inside the learning algorithm, by adopting a regularization term tailored
specifically for temporal performance stability. Secondly, we intend to generalize this method
to address deep learning algorithms, where the feature extractor and the feature representation
of the last linear layer evolve during training.</p>
      <p>Moreover, we will explore other research directions, such as (i) quantifying and preventing
the forgetting of old threats by machine learning malware detectors when they are updated with new
data, and (ii) the inclusion of research fields such as Continual Learning, which models data
as a continuous stream, thus enabling the development of techniques for updating classifiers
constantly and effortlessly.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work has been partly supported by the PRIN 2017 project RexLearn, funded by the Italian
Ministry of Education, University and Research (grant no. 2017TWNMH2); and by the project
TESTABLE (grant no. 101019206), under the EU’s H2020 research and innovation programme.</p>
      <p>[7] G. I. Webb, R. Hyde, H. Cao, H. L. Nguyen, F. Petitjean, Characterizing concept drift, Data
Mining and Knowledge Discovery 30 (2016) 964–994.
[8] F. Pendlebury, F. Pierazzi, R. Jordaney, J. Kinder, L. Cavallaro, TESSERACT: Eliminating
experimental bias in malware classification across space and time, in: 28th USENIX
Security Symposium (USENIX Sec. 19), 2019, pp. 729–746.
[9] A. Singh, A. Walenstein, A. Lakhotia, Tracking concept drift in malware families, in:
Proceedings of the 5th ACM workshop on Security and artificial intelligence, 2012, pp.
81–92.
[10] R. Jordaney, K. Sharad, S. K. Dash, Z. Wang, D. Papini, I. Nouretdinov, L. Cavallaro,
Transcend: Detecting concept drift in malware classification models, in: 26th USENIX
Security Symposium (USENIX Sec. 17), 2017, pp. 625–642.
[11] F. Barbero, F. Pendlebury, F. Pierazzi, L. Cavallaro, Transcending transcend: Revisiting
malware classification in the presence of concept drift, arXiv preprint arXiv:2010.03856
(2020).
[12] D. Hu, Z. Ma, X. Zhang, P. Li, D. Ye, B. Ling, The concept drift problem in android malware
detection and its solution, Security and Communication Networks 2017 (2017).
[13] A. Narayanan, L. Yang, L. Chen, L. Jinliang, Adaptive and scalable android malware
detection through online learning, in: 2016 International Joint Conference on Neural
Networks (IJCNN), IEEE, 2016, pp. 2484–2491.
[14] C. Cortes, V. Vapnik, Support-vector networks, Machine learning 20 (1995) 273–297.
[15] K. Allix, T. F. Bissyandé, J. Klein, Y. Le Traon, Androzoo: Collecting millions of android
apps for the research community, in: 2016 IEEE/ACM 13th Working Conference on Mining
Software Repositories (MSR), IEEE, 2016, pp. 468–471.
[16] A. Demontis, M. Melis, B. Biggio, D. Maiorca, D. Arp, K. Rieck, I. Corona, G. Giacinto,
F. Roli, Yes, machine learning can be more secure! a case study on android malware
detection, IEEE Transactions on Dependable and Secure Computing 16 (2017) 711–724.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D.</given-names>
            <surname>Arp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Spreitzenbarth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hubner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Gascon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Rieck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Siemens</surname>
          </string-name>
          ,
          <article-title>Drebin: Effective and explainable detection of android malware in your pocket</article-title>
          ,
          <source>in: NDSS</source>
          , volume
          <volume>14</volume>
          ,
          <year>2014</year>
          , pp.
          <fpage>23</fpage>
          -
          <lpage>26</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>E.</given-names>
            <surname>Mariconti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Onwuzurike</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Andriotis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. D.</given-names>
            <surname>Cristofaro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Ross</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Stringhini</surname>
          </string-name>
          ,
          <article-title>Mamadroid: Detecting android malware by building markov chains of behavioral models</article-title>
          ,
          <year>2017</year>
          . arXiv:1612.04433.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>K.</given-names>
            <surname>Grosse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Papernot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Manoharan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Backes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>McDaniel</surname>
          </string-name>
          ,
          <article-title>Adversarial examples for malware detection</article-title>
          ,
          <source>in: European symposium on research in computer security</source>
          , Springer,
          <year>2017</year>
          , pp.
          <fpage>62</fpage>
          -
          <lpage>79</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Ahvanooey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rabbani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. R.</given-names>
            <surname>Rajput</surname>
          </string-name>
          ,
          <article-title>A survey on smartphones security: software vulnerabilities, malware, and attacks</article-title>
          , arXiv preprint arXiv:2001.09406 (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Souri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Hosseini</surname>
          </string-name>
          ,
          <article-title>A state-of-the-art survey of malware detection approaches using data mining techniques</article-title>
          ,
          <source>Human-centric Computing and Information Sciences</source>
          <volume>8</volume>
          (
          <year>2018</year>
          )
          <fpage>1</fpage>
          -
          <lpage>22</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Amamra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Talhi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-M.</given-names>
            <surname>Robert</surname>
          </string-name>
          ,
          <article-title>Smartphone malware detection: From a survey towards taxonomy</article-title>
          ,
          <source>in: 2012 7th International Conference on Malicious and Unwanted Software</source>
          , IEEE,
          <year>2012</year>
          , pp.
          <fpage>79</fpage>
          -
          <lpage>86</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>