A study on malware detection and classification using
the analysis of API calls sequences through shallow
learning and recurrent neural networks
Angelo Cannarile2 , Francesco Carrera2 , Stefano Galantucci1,* , Andrea Iannacone2 and
Giuseppe Pirlo1
1
    University of Bari Aldo Moro, Department of Computer Science, 70125 Bari, Italy
2
    BVTech SpA, 20123 Milano, Italy


                                         Abstract
                                         Malware detection and classification is a critical issue in cybersecurity. Systems acting through signatures
                                         suffer the problem of not being able to detect attacks via zero-day malware. Among the approaches that
                                         can detect unknown attacks are the possibilities offered by analyzing the sequence of API calls performed
                                         by the executable. Such information can be extracted through static and dynamic analysis methods in a
                                         sandbox environment. This work proposes an analysis of different techniques to detect malware and
                                         subsequently classify them by identifying the family of belonging. Machine Learning algorithms based
                                         on trees are compared with Deep Learning algorithms based on Recurrent Neural Networks. The results
                                         obtained lead to choosing an algorithm based on RNNs for malware detection and an algorithm based on
                                         trees for malware classification.

                                         Keywords
                                         Malware, Classification, Detection, Trees, Neural network, malware-analysis-datasets-api-calls-
                                         sequences, APIMDS, Catak, API Calls


1. Introduction
The constant growth of the development and diffusion of new technologies expose systems
to potential risks. Among the main risks is malware, malicious software created to steal, spy,
or, more in general, damage infected systems. In order to mitigate this risk, it is necessary to
adopt tools that can detect, classify and block these threats on time. The literature analysis
shows that most malware detection systems are based on static analysis techniques such as
signature verification, which are very good for all known malicious software, but at the same
time, ineffective for new malware since the signatures of these applications are not yet available.
An alternative approach is using sandbox information (such as CAPEv2), which provides
dynamic application analysis. The sandbox software encapsulates the information extracted
in reports for each file analyzed. The reports contain the ordered sequence of API calls called

ITASEC’22: Italian Conference on Cybersecurity, June 20–23, 2022, Rome, Italy
*
 Corresponding author.
$ a.cannarile@bv-tech.it (A. Cannarile); f.carrera@bv-tech.it (F. Carrera); stefano.galantucci@uniba.it
(S. Galantucci); a.iannacone@bv-tech.it (A. Iannacone); giuseppe.pirlo@uniba.it (G. Pirlo)
 0000-0002-3955-0478 (S. Galantucci); 0000-0002-7305-2210 (G. Pirlo)
                                       © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
    CEUR
    Workshop
    Proceedings
                  http://ceur-ws.org
                  ISSN 1613-0073
                                       CEUR Workshop Proceedings (CEUR-WS.org)
from the analyzed file, which approximates the behavior of the software. The analysis of API
calls sequences through machine learning models to detect both known malware and as-yet-
unknown (zero-day) malware is of great interest to researchers. Once the threat is detected,
using machine learning models, it is possible to determine the malware’s type based on the
information extracted from the dynamic analysis. Labeling the detected malware with the
appropriate family allows experts in the field to optimize the analysis time and thus minimize
the response time for each identified threat. This work is proposed as an analysis of several
known techniques in order to detect and classify malware.
The goals of this paper are:

    • Compare the performance of tree-based Machine Learning algorithms and Recurrent
      Neural Networks (RNN)-based Deep Learning algorithms for the analysis of API calls
      sequences;
    • Evaluate and identify which basic models perform well for malware detection based on
      API call sequence analysis;
    • Compare the performance of API calls-based models for malware classification in their
      category.

This paper is organized as follows. In section 2 the most interesting approaches related to
malware detection and classification models based on the analysis of API calls sequences are
reported. Section 3 describes the algorithms that are used in this work. Section 4 details the
performed experiments and their notable results. Conclusions and possible future developments
of the proposed work are presented in section 5.


2. Related work
Malware analysis systems can leverage machine learning algorithms to extract meaningful
patterns from data to detect the presence of malware and recognize the category to which the
malware belongs. An innovative approach involves using the sequence of API calls invoked by
the malware itself as a "trace" of its behavior within the system. The following section analyzes
the works that represent state of the art for Malware Detection and Malware Classification
systems based on API Call sequences analysis.
Ki et al. [1] propose a new approach to dynamic malware analysis. Through DNA sequence
alignment algorithms, the authors can verify that malware belonging to the same category
share many API calls sub-sequences. Based on these, the classification of malware into different
categories is performed. The dataset created by the authors, used in the manuscript and later
made public, is also used in the present work. More details about the description of the dataset
can be found in the experiments section. The authors also propose APIMDS (API Malware
Detection System), a malware detection system based on the analysis of API calls sequences.
Once the API calls of the software to be analyzed are extracted, they are compared with those
included in the dataset. In case of correspondence, APIMDS notifies the presence of the threat
to the system administrator.
In a previous work [2], a study was carried out regarding the main algorithms used for Malware
Detection. In detail, different shallow and deep learning techniques have been compared to
recognize if the analyzed file is malware or goodware. The analysis is based on the API Call
sequences invoked by the analyzed file. These APIs are first collected by a dynamic file analysis
in a sandbox, i.e., CAPEv2. Performance on two different datasets was compared using six
different techniques. The results show that CatBoost performs better on the F1 Score and AUC
ROC. The method developed in [3] was also used for balancing the datasets.
State-of-the-art Malware Classification systems use Deep Learning algorithms to learn patterns
that can detect and classify Malware families [4]. Several Deep Learning algorithms used to
classify textual sentences are used to analyze API call sequences and classify Malware families
[5, 6].
Several supervised Machine Learning algorithms have been used in the literature to detect and
classify malware [7]. Considering the characteristics of the API call sequences to be analyzed,
traditional machine learning models may not be sufficient since the order of the calls needs to be
preserved. Considering the relationships and order between API calls could improve traditional
Malware Detection and Classification systems.
In [8] the classification of malware based on API calls sequences is done after n-gram extraction,
a Natural Language Processing technique. Based on the formed n-grams, an unsupervised
algorithm named Voting Expert extracts the patterns, which are subsequently used for training.
Tran et al. [9] propose a combined approach between Memory Augmented Neural Network
and Natural Language Processing techniques to solve the malware classification problem. The
input for the proposed algorithm is prepared from the sequence of API calls by modeling them
in a different feature space. The performance of the model has been evaluated on two datasets,
including APIMDS, and compared with some traditional models such as Random Forest, SVN,
and LSTM.
Catak et al. [10] use a Long Short-Term Memory (LSTM) architecture-based model to analyze
API call sequences and classify the malware family. The performance obtained using traditional
classification algorithms with single and multi-layer LSTM architectures are compared. A main
contribution of the work is building a new dataset for malware detection on Windows. The
dataset created for this purpose contains 7,107 samples labeled across 8 malware families. Finally,
the authors show that the LSTM architecture allows the creation of a classifier that improves
the performance of traditional algorithms when used to analyze sequences of API calls. An
interesting development in this context is described in [11], in which the main problems affected
by machine learning and deep learning models are analyzed:
    • Failure to recognize malware belonging to categories not addressed in training data;
    • The scarce amount of malware examples for category.
To overcome these issues, the authors present an innovative model of Malware Classification
called SIMPLE. Based on the input API calls sequences, they are pre-treated through a pre-
trained word embedding technique and finally modeled by a Long Short Term Memory network
to preserve the sequence information. Unlike other models of the same type, the output is
represented through multiple prototypes generated by clustering. The presented model achieves
the best accuracy concerning the reference models.
In [12] the authors propose a methodology that uses feature extraction and feature selection
techniques to process sequences for analysis by classification algorithms.
Recurrent Neural Networks and LSTM have been extensively used in Pattern Recognition to
exploit temporal sequence as in [13, 14]. Li et al. [15] propose to use Recurrent Neural Networks
to analyze ordered sequences of API calls and classify malware families. Since each malware
can invoke sequences of API calls of different lengths, the authors compare the use of two
popular types of RNNs, which are widely used to analyze time series. In the above work, Long
Short-Term Memory and Gated Recurrent Unit (GRU) algorithms are used to clarify the best
RNN architecture for malware classification by analyzing long sequences of API calls. In the
described experiments, a reference dataset was used to evaluate and compare the performance
of the classifiers. The results showed that the LSTM model and the GRU model achieve very
similar performance, and both are effective at classifying malware through the API call sequence
analysis.
A classifier called "Random Transformer Forest" is proposed in [16], which uses a Transformer
forest for analyzing API call sequences and categorizing malware categories. Unlike traditional
machine learning and deep learning models, Transformer-based models process the sequences
as a whole and learn the relationships between API calls through Multi-Head Attention mecha-
nisms and positional embeddings. The reported experiments show that the proposed model
outperforms the performance obtained using an LSTM architecture. Moreover, it is shown
that BERT or CANINE, the pre-trained Transformer models, reach better performance when
classifying highly unbalanced malware families.


3. Algorithms used
Consider the goal of implementing a malware detection system. Such a system should also be
able to classify the detected malware. It is desired that the system performs the operations by
evaluating information extracted from the dynamic analysis of files. Therefore, it is necessary
to obtain a workflow that can perform the operations mentioned earlier, which are, in order:
dynamic analysis of the executable, identification of its type (malware/goodware), and, if it is a
malware, classification in its specific family. An isolated and secure environment is used for the
dynamic analysis, i.e., CAPEv2, which examines the file and returns a report containing the
sequence of API calls invoked by the analyzed file. Thus, it is desired to find which Machine
Learning and Deep Learning algorithms achieve the best performance when used to analyze
API call sequences.
The proposed system uses a two-step approach to increase the detection rate of malicious files
and improve the accuracy in classifying malware families. In detail, the system uses in the
first step a Malware Detection algorithm to detect malware, which, regardless of the family
of belonging, must be blocked and analyzed by experts in the field. During the second step, a
Malware Classification algorithm is used to support the analysis of malicious files; it inputs
the sequence of API calls captured for the malware in question and outputs the corresponding
family.
Therefore, according to the Malware Detection model, it is necessary to build a binary classifier,
which allows for detecting as many malware as possible; instead, in the second step, it is neces-
sary to build a multi-class classifier, which accurately assigns to each malware the appropriate
class.
The classification algorithms analyzed in this paper are now reported. All the algorithms exam-
ined can be used both to build binary classification models, i.e., to build the Malware Detection
model; and for multiple classification problems, i.e., to build the Malware Classification model.
From the review of the literature conducted in the previous section, it can be seen that some of
the most interesting approaches are tree-based classifiers and deep learning algorithms generally
used to classify textual sequences; based on this study, a choice of algorithms to be analyzed
was made.
The algorithms examined in the proposed work are:
    • Random Forest [17]: classifier obtained from an aggregation, through bagging technique,
      of decision trees;
    • CatBoost [18]: machine learning method based on Gradient Boosting applied to decision
      trees;
    • XGBoost [19]: algorithm for creating prediction models from an ensemble of smaller
      prediction models, represented by decision trees;
    • ExtraTrees [20]: algorithm that aggregates the results of multiple unrelated decision
      trees into a forest that produces the classification output. The idea behind it is most
      similar to the Random Forest algorithm and differs from it only in the way the decision
      trees are constructed in the forest;
    • TabNet [21]: Neural network for tabular data processing;
    • Bidirectional Long Short-Term Memory (Bi-LSTM): evolution of the model Long
      Short Term Memory [22], belongs to the category of RNN. It is defined bidirectional since
      the flow of information propagates in both directions;
    • Bidirectional Gated Recurrent Unit (Bi-GRU): evolution of the Gated Recurrent
      Unit [23], belongs to the category of RNNs. It uses a bidirectional approach to analyze
      sequences in both directions as well.


4. Experimental setup
This section describes the experiments performed in the proposed work. All experiments were
executed on a machine with 16 cores, 38 GB of RAM, and Ubuntu 20.04.3 operating system.
Section 4.1 describes the datasets used to evaluate the algorithms considered in this work. In
sections 4.2 and 4.3, the results obtained in the experiments performed for the construction
of the Malware Detection and Malware Classification model, respectively, are presented. The
obtained results and the achievement of the proposed goals are discussed in section 4.4.

4.1. Dataset
The malware-analysis-datasets-api-call-sequences and APIMDS datasets were examined to evalu-
ate malware detection algorithms. The malware-analysis-datasets-api-call-sequences [24] dataset
contains 42,797 malware-related API call sequences and 1079 goodware API call sequences.
Each API call sequence is made of the first 100 consecutive non-repeating calls associated with
the parent process, extracted from the "calls" elements of the reports obtained from Cuckoo
Sandbox. For malware-analysis-datasets-api-call-sequences dataset, there are no labels for the
malware categories. The APIMDS [1] dataset was constructed from the random selection of
records belonging to the Malicia-Project and VirusTotal malware datasets. It collects the API
calls sequences of 23080 malware and 300 goodware. The extracted sequences contain 2727
different API calls. The following classes represent the malware types:
    • Backdoor;
    • Worm;
    • Packed;
    • PUP;
    • Trojan;
    • Misc.
Since the APIMDS dataset contains both malware family labels and records labeled as goodware,
it is used in the present work for both the malware detection task and the malware classification
task. For the evaluation of the malware classification algorithms, the Catak and APIMDS datasets
were considered.
To build the Catak dataset [10], the author analyzed within an isolated Cuckoo Sandbox envi-
ronment several malware samples, for each of which the ordered sequence of API calls made to
the Windows operating system was recorded. In each sequence, API calls can be present in a
repeated form, even consecutively. The extracted sequences contain 162 different API calls. The
dataset is labeled with different types of malware and does not contain any information related
to goodware analysis. For this reason, if a supervised approach is used, the examined dataset
can be used to perform Malware Classification but cannot be used for Malware Detection.
The dataset contains 7,107 samples, which are labeled using the following Malware categories:
    • Spyware;
    • Downloader;
    • Worms;
    • Adware;
    • Trojan;
    • Backdoor.
    • Virus;
    • Dropper.

4.2. Malware detection experiments
Starting from the state of the art study on algorithms carried out in [2], in the present section it
is aimed to compare the results obtained in the previous work with Deep Learning algorithms
used to analyze temporal sequences. The previous work is now extended by repeating the
experiments, under the same conditions, with Bi-LSTM and Bi-GRU models. The experiments
were performed by analyzing the malware-analysis-datasets-api-call-sequences and APIMDS
datasets. Tests were not repeated on the Catak dataset because, as expressed earlier, it does not
contain API call sequences related to goodware files.
Tables 1 and 2 shows the results obtained from the Malware Detection experiment on the
malware-analysis-datasets-api-call-sequences (Table 1) and APIMDS (Table 2) datasets. Specif-
ically, the results are computed through stratified cross-validation with 𝑘 equal to 10 and
averaged across them. Accuracy, Recall, and F1-Score metrics are calculated with macro av-
eraging. The tables also contain the training and prediction times measured by the Malware
Detection experiment. The training time represents the average training times measured to
build the Malware Detection model in the 10 k-Fold experiments. In the same way, the prediction
time represents the average of the prediction times measured to identify the malware contained
in the test dataset in the 10 experiments of the k-Fold.

Table 1
Average metrics computed for the 10 folds in the malware detection experiment on the malware-analysis
dataset-api-call-sequences dataset. Training and prediction time (in seconds) are the average of the
respective times measured in the stratified k-Fold
 Algorithm       Precision   Recall   F1 Score   AUC ROC   Accuracy   Training time   Prediction time
 Random Forest   0.965       0.809    0.869      0.809     0.990      15.957          0.088
 CatBoost        0.962       0.820    0.877      0.820     0.990      8.991           0.011
 XGBoost         0.957       0.781    0.847      0.782     0.988      8.740           0.032
 ExtraTrees      0.970       0.741    0.816      0.741     0.986      6.673           0.089
 TabNet          0.898       0.767    0.817      0.767     0.986      198.780         0.188
 Bi-LSTM         0.926       0.882    0.903      0.882     0.991      1319.353        3.600
 Bi-GRU          0.950       0.893    0.918      0.893     0.993      2013.649        3.287


Table 2
Average metrics computed for the 10 folds in the malware detection experiment on the APIMDS dataset.
Training and prediction time (in seconds) are the average of the respective times measured in the
stratified k-Fold
 Algorithm       Precision   Recall   F1 Score   AUC ROC   Accuracy   Training time   Prediction time
 Random Forest   0.991       0.921    0.953      0.921     0.986      6.314           0.034
 CatBoost        0.995       0.940    0.965      0.940     0.990      15.553          0.034
 XGBoost         0.994       0.962    0.978      0.963     0.998      7.158           0.014
 ExtraTrees      0.995       0.913    0.950      0.913     0.984      1.481           0.036
 TabNet          0.984       0.926    0.953      0.926     0.997      2950.518        1.550
 Bi-LSTM         0.990       0.970    0.979      0.970     0.999      1198.291        4.057
 Bi-GRU          0.996       0.979    0.988      0.979     0.999      326.068         2.678


4.3. Malware classification experiments
This section describes the experiments performed to compare classification algorithms for mal-
ware family identification. Specifically, the performance obtained using the Machine Learning
and Deep Learning algorithms described in section 3 were compared. The experiments were
performed using the Catak and APIMDS datasets. Tests were not repeated on the malware-
analysis-datasets-api-call-sequences dataset since it does not contain malware family labels.
To build the Malware Classification models, all sub-sequences with consecutively repeated API
calls were removed, leaving only one call for each sub-sequence. There is a high concentration
of null values since not all records have the same amount of calls. In order to mitigate this
problem, the columns in the dataset in which there is a presence of null values greater than 20%
have been removed.
Tables 3 and 4 presents the calculated results for each algorithm derived from the stratified
cross-validation experiment with 𝑘 equal to 10 and averaged with each other. The Accuracy,
Recall, F1-Score metrics are calculated with macro averaging, instead the AUC ROC value was
calculated according to the "one vs rest" approach. The reference dataset for table 3 is APIMDS,
while table 4 presents the results obtained on the dataset Catak.
The tables also include the training and prediction times measured by the Malware Classification
experiment. As in previous experiments, the training time is the average of the training times
measured to build the Malware Classification model in the 10 k-Fold experiments. Similarly, the
prediction time represents the average of the prediction times measured to classify the malware
family contained in the test dataset for the 10 k-Fold experiments.

Table 3
Average metrics computed for the 10 folds in the malware classification experiment on the APIMDS
dataset. Training and prediction time (in seconds) are the average of the respective times measured in
the stratified k-Fold
 Algorithm        Precision   Recall   F1 Score   AUC ROC    Accuracy   Training time   Prediction time
 Random Forest    0.884       0.694    0.760      0.821      0.904      783.183         1.973
 CatBoost         0.901       0.663    0.737      0.804      0.899      52.084          0.030
 XGBoost          0.911       0.615    0.688      0.776      0.888      45.911          0.057
 ExtraTrees       0.878       0.698    0.762      0.824      0.904      2289.088        7.329
 TabNet           0.778       0.362    0.683      0.786      0.877      4136.281        2.026
 Bi-LSTM          0.770       0.715    0.737      0.833      0.863      1456.415        5.299
 Bi-GRU           0.766       0.704    0.730      0.826      0.865      1827.840        5.317


Table 4
Average metrics computed for the 10 folds in the malware classification experiment on the Catak dataset.
Training and prediction time (in seconds) are the average of the respective times measured in the
stratified k-Fold
 Algorithm        Precision   Recall   F1 Score   AUC ROC    Accuracy   Training time   Prediction time
 Random Forest    0.591       0.568    0.575      0.752      0.555      1822.669        1.939
 CatBoost         0.521       0.509    0.512      0.718      0.497      97.479          0.025
 XGBoost          0.469       0.452    0.451      0.685      0.439      19.450          0.028
 ExtraTrees       0.593       0.571    0.578      0.753      0.557      754.477         2.332
 TabNet           0.334       0.336    0.326      0.619      0.318      11454.760       2.175
 Bi-LSTM          0.521       0.514    0.516      0.721      0.505      947.194         4.047
 Bi-GRU           0.506       0.506    0.504      0.716      0.494      788.007         2.827


4.4. Discussion
The above section results show that the considered algorithms constitute the state of the art for
malware detection and classification systems. In the related experiment for Malware Detection,
it can be observed that Deep Learning algorithms do not always achieve better performance
than Machine Learning algorithms and vice versa. It is notable that, for both datasets analyzed
for Malware Detection, Bi-LSTM and Bi-GRU algorithms (which are generally used to classify
textual sequences) always achieve higher performances than tree-based algorithms. According
to the results reported in Tables 1 and 2, the Bi-LSTM and Bi-GRU algorithms outperform
the CatBoost algorithm, which was identified as the most suitable algorithm for building the
Malware Detection system in previous work [2]. Maximizing the Recall metric is necessary
for the Malware Detection system, as it is an index that expresses how many of the malware
instances are correctly classified as malicious files and how many of the goodware instances are
detected as non-malicious files. In both datasets analyzed for Malware Detection, the Bi-GRU
algorithm performed best in terms of Recall, AUC ROC, F1-Score, and Accuracy.
To identify the algorithm to be used in the second phase, it is necessary to analyze the results
reported in section 4.3. The ExtraTrees algorithm outperformed the performance obtained
for the Catak dataset in the [15, 10] works. Even for experiments related to malware family
classification, the results prove that Machine Learning algorithms do not always outperform
Deep Learning algorithms and vice versa. Unlike the Malware Detection model, the Accuracy
index must also be maximized for the Malware Classification model. Indeed, the label on the
type of malware assigned by the classification model must be correct to make this information
helpful to experts who analyze files that are considered malicious.
For this reason, when choosing the Malware Classification model, it is necessary to consider the
F1-score index, which is a harmonic average of the Accuracy and Recall scores. From the results
shown in tables 3 and 4, it can be observed that the algorithm that overall gets the best results is
ExtraTrees. For the APIMDS dataset, the ExtraTrees algorithm scores the highest values for the
F1-score and Accuracy metrics and one of the highest AUC ROC index scores. For the Catak
dataset, all metrics computed in the experiments using the ExtraTrees algorithm are higher
than the metrics obtained using the other surveyed algorithms.
For completeness, observing the temporal performances, it emerges how the best techniques
allow recognition in relatively short times, even if not suitable for a context with large volumes
of data. The training phase is the heaviest; however, this can be carried out cyclically, such as
during the hours in which there is less traffic to analyze.


5. Conclusions and future work
Aiming at realizing a workflow for malware analysis, experimental analysis was carried out
between the performances of malware detection and classification algorithms based on the
analysis of API call sequences. From the obtained results, it is possible to state that defining the
best algorithm for the addressed tasks is not always possible. In the experiments carried out
for Malware Detection, the Bi-GRU algorithm has obtained the best performances in terms of
Recall, AUCROC, F1-Score, and Accuracy in both analyzed datasets. On the other hand, the
analysis of the experiments performed for Malware Classification shows the best results by the
ExtraTrees decision tree-based algorithm.
This study shows that the algorithms based on the RNN architecture achieve great performance
in malware detection. RNNs can work on sequences of an arbitrary length, overcoming the
limitations imposed by other tree-based ones such as CatBoost. This feature allows Malware
Detection models to generalize over new records and identify sub-sequences of API calls that can
be traced back to malware. The results reported in section 4.2 show that RNN-based algorithms
detect more malicious files than tree-based algorithms, as they achieve a significantly higher
Recall score at the cost of comparable accuracy with the models analyzed in previous work.
Unlike the detection model, higher classification accuracy is needed in malware classification
cases. Since many types of malware invoke similar API sequences, tree-based algorithms can
better discriminate the particular malware type.
Future developments of this work will focus on studying pre-processing techniques that can
remove the noise in the API calls sequences and optimize the discrimination ability of the
classification models. Moreover, it is of great interest to investigate the use of ensemble learning
techniques based on innovative RNN architectures. Future analysis will also evaluate the
applicability of such solutions in contexts with large volumes of data to be analyzed, i.e., in
systems where time is critical.


Acknowledgments
This work was partially supported by the Fondo Europeo di Sviluppo Regionale Puglia Pro-
gramma Operativo Regionale (POR) Puglia 2014–2020 Axis I–Specific Objective 1a–Action 1.1
(Research and Development) – Project Title: CyberSecurity and Security Operation Center
(SOC) Product Suite by BV TECH S.p.A., under Grant CUP/CIG B93G18000040007.


References
 [1] Y. Ki, E. Kim, H. K. Kim, A novel approach to detect malware based on api call sequence
     analysis, International Journal of Distributed Sensor Networks 11 (2015) 659101.
 [2] A. Cannarile, V. Dentamaro, S. Galantucci, A. Iannacone, D. Impedovo, G. Pirlo, Comparing
     deep learning and shallow learning techniques for api calls malware prediction: A study,
     Applied Sciences 12 (2022) 1645.
 [3] V. Dentamaro, D. Impedovo, G. Pirlo, Licic: less important components for imbalanced
     multiclass classification, Information 9 (2018) 317.
 [4] B. Kolosnjaji, A. Zarras, G. Webster, C. Eckert, Deep learning for classification of malware
     system call sequences, in: Australasian joint conference on artificial intelligence, Springer,
     2016, pp. 137–149.
 [5] Q. Li, H. Peng, J. Li, C. Xia, R. Yang, L. Sun, P. Yu, L. He, A text classification survey: from
     shallow to deep learning, arXiv preprint arXiv:2008.00364 (2020).
 [6] S. Minaee, N. Kalchbrenner, E. Cambria, N. Nikzad, M. Chenaghlu, J. Gao, Deep learning–
     based text classification: a comprehensive review, ACM Computing Surveys (CSUR) 54
     (2021) 1–40.
 [7] D. Ucci, L. Aniello, R. Baldoni, Survey of machine learning techniques for malware analysis,
     Computers & Security 81 (2019) 123–147.
 [8] A. Pektaş, T. Acarman, Malware classification based on api calls and behaviour analysis,
     IET Information Security 12 (2018) 107–117.
 [9] K. Tran, H. Sato, M. Kubo, Mannware: A malware classification approach with a few
     samples using a memory augmented neural network, Information 11 (2020) 51.
[10] F. O. Catak, A. F. Yazı, O. Elezaj, J. Ahmed, Deep learning based sequential model for
     malware analysis using windows exe api calls, PeerJ Computer Science 6 (2020) e285.
[11] P. Wang, Z. Tang, J. Wang, A novel few-shot malware classification approach for unknown
     family recognition with multi-prototype modeling, Computers & Security 106 (2021)
     102273.
[12] U. Nandagopal, S. Thirumalaivelu, Classification of malware with mist and n-gram features
     using machine learning, International Journal of Intelligent Engineering & Systems (2021).
[13] V. Dentamaro, P. Giglio, D. Impedovo, L. Moretti, G. Pirlo, Auco resnet: an end-to-end
     network for covid-19 pre-screening from cough and breath, Pattern Recognition 127 (2022)
     108656.
[14] G. Cicirelli, D. Impedovo, V. Dentamaro, R. Marani, G. Pirlo, T. D’Orazio, Human gait
     analysis in neurodegenerative diseases: a review, IEEE Journal of Biomedical and Health
     Informatics (2021).
[15] C. Li, J. Zheng, Api call-based malware classification using recurrent neural networks,
     Journal of Cyber Security and Mobility (2021) 617–640.
[16] F. Demirkıran, A. Çayır, U. Ünal, H. Dağ, An ensemble of pre-trained transformer models
     for imbalanced multiclass malware classification, arXiv preprint arXiv:2112.13236 (2021).
[17] L. Breiman, Random forests, Machine learning 45 (2001) 5–32.
[18] J. T. Hancock, T. M. Khoshgoftaar, Catboost for big data: an interdisciplinary review,
     Journal of big data 7 (2020) 1–45.
[19] C. Bentéjac, A. Csörgő, G. Martínez-Muñoz, A comparative analysis of gradient boosting
     algorithms, Artificial Intelligence Review 54 (2021) 1937–1967.
[20] P. Geurts, D. Ernst, L. Wehenkel, Extremely randomized trees, Machine learning 63 (2006)
     3–42.
[21] S. O. Arık, T. Pfister, Tabnet: Attentive interpretable tabular learning, in: AAAI, volume 35,
     2021, pp. 6679–6687.
[22] M. Schuster, K. K. Paliwal, Bidirectional recurrent neural networks, IEEE transactions on
     Signal Processing 45 (1997) 2673–2681.
[23] K. Cho, B. Van Merriënboer, D. Bahdanau, Y. Bengio, On the properties of neural machine
     translation: Encoder-decoder approaches, arXiv preprint arXiv:1409.1259 (2014).
[24] A. Oliveira, R. Sassi, Behavioral malware detection using deep graph convolutional neural
     networks, TechRxiv (2019).