Malware Classification Using Static Disassembly and Machine Learning

Zhenshuo Chen1[0000−0003−2091−4160], Eoin Brophy1[0000−0002−6486−5746], and Tomas Ward1,2[0000−0002−6173−6607]

1 Dublin City University, Dublin, Ireland
2 Insight Centre for Data Analytics, Dublin, Ireland

Abstract. Network and system security are critically important issues today. Due to the rapid proliferation of malware, traditional analysis methods struggle with the enormous number of samples. In this paper, we propose four small-scale and easy-to-extract features, namely the sizes and permissions of PE sections, content complexity, and import libraries, to classify malware families, and we use automatic machine learning to search for the best model and hyper-parameters for each feature and combination of features. Compared with detailed behavior-related features such as API sequences, the proposed features provide macroscopic information about malware. The analysis is based on static disassembly scripts and hexadecimal machine code. Unlike dynamic behavior analysis, static analysis is resource-efficient and offers complete code coverage, but it is vulnerable to code obfuscation and encryption. The results demonstrate that features which work well in dynamic analysis are not necessarily effective when applied to static analysis. For instance, API 4-grams achieve only 57.96% accuracy and involve a relatively high-dimensional feature set (5000 dimensions). In contrast, the novel proposed features, together with a classical machine learning algorithm (Random Forest), achieve very good accuracy (99.40%) with a feature vector of much smaller dimension (40 dimensions). We demonstrate the effectiveness of this approach through integration in IDA Pro, which also facilitates the collection of new training samples and subsequent model retraining.

Keywords: Malware Classification · Reverse Engineering · Machine Learning · System Security.

1 INTRODUCTION

Network and system security are critically important issues at present. According to [11], 142 million threats were blocked every day in 2019. Furthermore, new types of malware appear all the time and are increasingly aggressive. For instance, the use of malicious PowerShell scripts increased by 1000% in the same year. To make matters worse, the anti-anti-virus techniques used by attackers are also steadily improving. Polymorphic engines allow malware developers to mutate existing code while keeping the original functionality unchanged. This is achieved, for example, through obfuscation and encryption. Code obfuscation uses needlessly roundabout expressions and data to make source or machine code challenging for humans to understand; attackers can also use jump instructions to make the real execution flow differ from the disassembly script. Code encryption packs and encrypts executable files on disk, and the files decrypt themselves during execution. Encrypted samples are therefore nearly impossible to analyze by static disassembly alone; analysis must instead rely on execution and review of system logs. These anti-anti-virus techniques have led to a rapid proliferation of malware with which traditional analysis methods, based on signature matching and heuristic rules, struggle to cope.
They share the same drawback: unseen samples must be manually analyzed before signatures or heuristic rules can be created, and in practice analysts cannot review every unknown file. Machine learning approaches, in contrast, do not rely on understanding code and malicious behaviors. After training with a wide range of known samples, such methods can identify potential malware more easily than human experts. Automatic models have been applied in related fields, such as malware homology analysis via dynamic fingerprints in [12], and gray-scale image representation of malware in [7], which required neither disassembly nor code execution. We adopt a machine learning approach in this work. The primary explorations and experiments of this paper are as follows:

– The API n-gram, an efficient dynamic behavior feature usually generated by system event logging, is applied to static analysis. The result demonstrates that actual API sequences are hard to extract from disassembly scripts, and inaccurate n-grams have a substantial negative impact on classification.
– A simpler variant of the import library feature is proposed, described in Section 3.4. It uses One-Hot Encoding to indicate whether a library is imported by malware. Compared with counting the number of APIs imported from each library, this variant is easier to extract and more reliable, because hiding import libraries is more difficult than hiding API calls. However, it does not provide as much detailed information as API calls.
– Three new small-scale features are proposed: section sizes, section permissions, and content complexity. They are described in Sections 3.5, 3.6, and 3.7 respectively, and provide macroscopic information about malware.
– Automatic machine learning is used to find the best model and hyper-parameters for each feature and combination of features.
– A method of using the classifier in practice is proposed. With the help of IDA Pro3, the most popular reverse-engineering tool, new training data can be generated from the latest known malware. IDA Pro also provides a Python development kit, with which the classifier proposed here is implemented as an IDA Pro plug-in. This allows an analyst using IDA Pro to process a malware sample and perform classification immediately within their workflow.

3 https://hex-rays.com/products/ida/support/idadoc/index.shtml

2 RELATED WORK

In this section, we primarily discuss malware analysis combined with machine learning and hand-designed static features. Since the resulting models can be explained in terms of underlying, interpretable features, analysts can undertake deeper exploration according to the model output. We also examine the small number of deep learning models which have been applied to the problem; these utilize image and byte representations.

In [12], Zheng Rongfeng et al. used strings, registry changes and API sequences to determine whether a new sample was a variant of a known sample. One shortcoming of this method was that accurate registry changes and API sequences had to be recorded during execution, which requires costly virtual machines. Moreover, because the conditions required to trigger every code path may not be met during execution, not all malicious behaviors can be recorded. Another serious issue is that malware can hide such signatures through polymorphic engines and packers, precisely in order to be harder for anti-virus software to detect.
In [6], Maleki Nahid et al. proposed a binary classification system for packed samples. They unpacked files and extracted PE Headers [5], then used the forward-selection method to pick seven features. Unlike the previous approach, their model did not rely entirely on detailed assembly instructions and system behaviors, but introduced macroscopic information such as the number of executable sections and the debug information flag. However, it was unsuitable for files that could not be unpacked.

In the face of these problems, which are difficult to solve by fingerprinting, Nataraj et al. proposed an innovative method in [7]. They converted a malware sample's byte content into a gray-scale image; images belonging to the same family appear similar in layout and texture. They used the K-Nearest Neighbors algorithm to determine whether samples were derived from the same origin. Neither disassembly nor code execution was required, and simple transformations by polymorphic engines usually do not affect the general image layout. One limitation of image representation is that two malware images can be similar even when they belong to different families, because the same visual resources, such as icons and user interface components, are shared among samples. In addition, Gibert et al. [2] noted that image representation might introduce spatial correlations between pixels in different rows that do not exist in the underlying code.

In recent years, deep learning models have also been applied to malware classification. In [3], Kalash et al. combined Convolutional Neural Networks with gray-scale image representation. Their architecture was based on VGG-16 [10] and achieved 99.97% accuracy, which is the best result we have found to date. In [8], Raff et al. built a Convolutional Neural Network that takes all bytes of a sample as raw data. Instead of training the network directly on raw bytes, they inserted an embedding layer mapping each byte to a fixed-length feature vector. This reduces spurious correlations between byte values: without it, certain byte values would appear numerically closer to each other than others, which is meaningless in the context of assembly instructions. Using Convolutional Neural Networks with global max-pooling also increases robustness to minor alterations in bytes, whereas traditional byte n-gram methods depend on exact matches. For deep learning models with byte-based representation, Gibert et al. [2] considered the main advantage to be that such an approach can be applied to samples from different systems and hardware, because it is not affected by file formats. However, byte sequences are very large and the meaning of each byte is context-dependent; byte-based representation does not capture this information. Another challenge is that adjacent bytes are not always correlated, because of jumps and function calls.

Apart from these models, Gibert et al. mentioned several challenges facing malware analysis [2]. One of them is concept drift. In many other machine learning applications, such as digit classification, the mapping learned from historical data remains valid for new data, and the relationship between input and output does not change over time. For malware, however, due to function updates, code obfuscation and bug fixes, the similarity between previous and future versions degrades slowly over time, decaying detection accuracy.
Furthermore, the interpretability of models and features should also be considered. When an incorrect classification happens, analysts need to understand why and know how to fix it. This is challenging without clear interpretability and explainability. Even in the absence of misclassifications, analysts prefer to understand how a classification has been arrived at. This is the main reason why we did not choose a deep learning model.

3 FEATURE EXTRACTION

The dataset used in this paper is from the 2015 Microsoft Malware Classification Challenge [9]. It contains 10868 malware samples from a mix of nine families. Each sample has two files of different forms: machine code and a disassembly script generated by IDA Pro. In practice, malware comes in the form of executable files; however, for safety reasons, Microsoft does not provide raw files, only processed machine code and disassembly scripts. Without executable files, dynamic analysis cannot be conducted, so all features can only be extracted from static text. The features described in Sections 3.4, 3.5, 3.6 and 3.7 are proposed by us. The API n-gram in Section 3.2 is an effective feature in dynamic behavior analysis; we tested whether it is also applicable to static disassembly analysis.

3.1 File Size

The file size is the simplest feature, containing the sizes of the disassembly and machine code files, and their ratio. File sizes vary according to the functional complexity of different malware families, and size ratios may indicate code encryption: if a sample is encrypted, disassembly may fail and the ratio of its machine code size to its disassembly size will differ from the other samples.

3.2 API 4-gram

The API sequence is perhaps the most commonly used feature. It directly uses malicious or suspicious API sequences to classify malware. Each malware family has distinct functions. For instance, Lollipop is adware showing advertisements as users browse websites; Ramnit can disable the system firewall and steal sensitive information. These functions rely on different APIs. API sequences should ideally be extracted by dynamic execution, since this reduces the negative impact of code encryption. However, because of the dataset limitation, we can only use regular expressions to match call and jmp instructions whose target is an imported API in the static disassembly scripts. This has a huge negative impact on accuracy, which we discuss in detail in a later section. In total, 402972 API 4-grams were extracted and only the 5000 most frequent items were retained.

3.3 Opcode 4-gram

The opcode sequence is also commonly used. It focuses on disassembly instructions. Opcodes are defined by CPU architectures, not by operating systems as in the case of APIs, so they are compatible with different systems built on the same architecture. In total, 1408515 opcode 4-grams were extracted and only the 5000 most frequent items were retained.
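To make the extraction in Sections 3.2 and 3.3 concrete, the sketch below shows one way to match call/jmp targets and opcode mnemonics with regular expressions over an IDA Pro listing and fold them into 4-grams. The listing format differs between IDA versions, so both patterns and the extern_apis argument are illustrative assumptions rather than the exact expressions used in our pipeline.

```python
import re
from collections import Counter

# Illustrative patterns: real IDA listings vary between versions
# (segment names, "ds:__imp_Name" forms, inline comments), so these
# regexes are assumptions, not the paper's exact expressions.
CALL_OR_JMP_RE = re.compile(r"\b(?:call|jmp)\s+(?:ds:)?(?:__imp_)?(\w+)")
OPCODE_RE = re.compile(r"^\.text:[0-9A-F]+\s+([a-z]{2,7})\b")

def four_grams(tokens):
    """Slide a length-4 window over a token list."""
    return zip(tokens, tokens[1:], tokens[2:], tokens[3:])

def extract_ngrams(asm_path, extern_apis):
    """Linear scan of one disassembly script: API and opcode 4-gram counts."""
    apis, opcodes = [], []
    with open(asm_path, errors="ignore") as f:
        for line in f:
            m = OPCODE_RE.match(line)
            if m:
                opcodes.append(m.group(1))
            m = CALL_OR_JMP_RE.search(line)
            if m and m.group(1) in extern_apis:   # keep imported APIs only
                apis.append(m.group(1))
    return Counter(four_grams(apis)), Counter(four_grams(opcodes))
```

As Section 4 shows, this linear scan is robust for opcodes but suffers, for APIs, from the lazy loading, name mangling and jump-thunk issues discussed in Section 5.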
3.4 Import Library

As mentioned in Section 3.2, each malware family has distinct functions, and it must import system or third-party libraries to implement them. A typical machine learning feature is therefore the number of APIs that a sample uses from each imported library. But these API counts are inaccurate if malware calls an API dynamically, for example via GetProcAddress. We propose a simpler variant that uses One-Hot Encoding to indicate whether a library is imported at all. It is easier to extract and more reliable because, from a system security perspective, it is not as susceptible to anti-anti-virus techniques as API counts. There are 570 different import libraries in the dataset; the 300 with the highest number of occurrences were retained. Fig. 2 demonstrates how this feature distinguishes Obfuscator.ACY from the other families and provides a rough idea of functionality: Crypt32 is a cryptographic library, and Obfuscator.ACY may rely on it for encryption, whereas most samples from other classes are not encrypted and do not import it. We also extracted the top libraries ranked by Gini impurity, as shown in Fig. 1.

Fig. 1. The most important libraries
Fig. 2. A Decision Tree for Obfuscator.ACY

3.5 PE Section Size

PE files consist of several sections. Each section stores different types of bytes and has attributes. The number of sections, their uses and their attributes are defined by software development tools and programmers based on functionality.

This feature focuses on section sizes. Each section has two sizes: a virtual size and a raw size, stored in the VirtualSize and SizeOfRawData fields of the IMAGE_SECTION_HEADER structure in the PE Headers, respectively. The raw size is the size of a section on disk, and the virtual size is its size once loaded into memory. For instance, a section may store only uninitialized data whose values become available only after startup; there is no need to allocate space for it on disk, so its raw size is zero while its virtual size is not. The ratio of the two sizes is also included in the feature.

The dataset contains 282 sections with different names. Each section contributes three attributes, so the full feature has 846 dimensions. After feature selection using a Random Forest based on Gini impurity, only the 25 most essential dimensions were retained. Most of them are standard sections defined by software development tools, as shown in Fig. 3.

Fig. 3. The most important sections

3.6 PE Section Permission

PE sections have access permissions, which are combinations of readable, writable and executable. We calculated the total size of readable data, writable data and executable code separately for each malware sample. As before, each permission has three attributes: a virtual size, a raw size and the ratio of the two. This feature can be regarded as a summary of PE section sizes and provides a more general view with only nine fixed dimensions. For example, Fig. 4 shows the distribution of writable virtual sizes. Four Backdoor classes (3, 5, 7, and 9) have relatively large writable space, possibly because they need to steal sensitive information or download other files from the Internet to run, which requires sufficient memory space.

Fig. 4. The distribution of writable virtual sizes

Additionally, we believe these two PE section features (sizes and permissions) are compatible with Linux systems. Linux uses the Executable and Linkable Format (ELF) for executable files, which has section structures similar to the PE format.
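Both PE section features come straight from the PE Headers. The dataset provides no raw executables, so these fields are recovered from the static text; with raw PE files available, the same features could be computed with the open-source pefile library. A minimal sketch, assuming the standard IMAGE_SCN_MEM_* characteristic flags from the PE format specification [5]:

```python
import pefile

# Standard IMAGE_SECTION_HEADER characteristic flags (PE format [5]).
MEM_EXECUTE, MEM_READ, MEM_WRITE = 0x20000000, 0x40000000, 0x80000000

def section_features(path):
    """Per-section sizes plus readable/writable/executable size totals."""
    pe = pefile.PE(path)
    per_section, totals = {}, {"r": [0, 0], "w": [0, 0], "x": [0, 0]}
    for s in pe.sections:
        name = s.Name.rstrip(b"\x00").decode(errors="ignore")
        vsize, rsize = s.Misc_VirtualSize, s.SizeOfRawData
        ratio = vsize / rsize if rsize else 0.0   # e.g. uninitialized data: raw size 0
        per_section[name] = (vsize, rsize, ratio)
        for flag, key in ((MEM_READ, "r"), (MEM_WRITE, "w"), (MEM_EXECUTE, "x")):
            if s.Characteristics & flag:
                totals[key][0] += vsize
                totals[key][1] += rsize
    return per_section, totals
```

The per-section triples correspond to the three attributes per section name in Section 3.5; adding a virtual/raw ratio per permission to the totals yields the nine fixed dimensions of Section 3.6.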
3.7 Content Complexity

Content complexity is a new feature type for malware classification. The version we propose has six fixed dimensions: the original sizes, compressed sizes and compression ratios of the disassembly and machine code files. We used Python's zlib library to compress samples and recorded the size changes. This approximates function complexity, code encryption and obfuscation. Fig. 5 shows the sample with the largest disassembly compression ratio, 12.8; it might be obfuscated with repetitive, roundabout instructions. In contrast, Fig. 6 shows the sample with the smallest disassembly compression ratio, 2.3. Its disassembly failed and IDA Pro could only output the original machine code, because the sample is encrypted and packed by UPX, a well-known open-source packer for executable files. In addition, the use of complex, rare instructions can also lead to low compression ratios.

Fig. 5. Snippet with the largest compression ratio
Fig. 6. Snippet with the smallest compression ratio

In theory, this feature can be used directly for malware classification on any other CPU architecture and system. It has better compatibility than the other features because it does not rely on any platform-related characteristics or structures. However, CPUs have different instruction sets; for instance, Intel x86 is based on Complex Instruction Set Computing while ARM is based on Reduced Instruction Set Computing, which may affect classification accuracy.
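The six dimensions can be computed in a few lines. A minimal sketch with Python's standard zlib module, as used above (the compression level is our assumption; any fixed level gives comparable ratios):

```python
import zlib

def content_complexity(asm_path, bytes_path):
    """Original size, compressed size and compression ratio for both files."""
    features = []
    for path in (asm_path, bytes_path):
        with open(path, "rb") as f:
            data = f.read()
        compressed = len(zlib.compress(data, 6))   # level 6 is an assumption
        features += [len(data), compressed, len(data) / max(compressed, 1)]
    return features   # the six fixed dimensions of Section 3.7
```

Highly repetitive, obfuscated disassembly compresses well and yields a large ratio, while encrypted or packed content barely compresses, matching the two extremes in Fig. 5 and Fig. 6.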
4 EXPERIMENTS

For each feature and combination of features, we used the automatic machine learning library auto-sklearn to search for the best parameters; it relies on Bayesian optimization, meta-learning and ensemble construction [1]. 80% of the dataset was used as a training set, on which auto-sklearn evaluated models using 5-fold cross-validation. The candidate models include K-Nearest Neighbors, Support Vector Machine and Random Forest. All experiments were conducted on 64-bit Ubuntu with an Intel(R) Core(TM) i7-6700 CPU (3.40 GHz) and 12 GB RAM. Each model's parameter search lasted up to one hour. After auto-sklearn had determined a model's optimal parameters, we used the remaining 20% as a test set to calculate classification accuracy.
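A minimal sketch of this search setup follows, using auto-sklearn's documented estimator API. The synthetic data stands in for a real feature matrix, and with the "cv" strategy auto-sklearn expects a refit on the whole training set before prediction:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
import autosklearn.classification

# Stand-in for a real feature matrix, e.g. the 40-dimensional combination
# of section sizes, section permissions and content complexity.
X, y = make_classification(n_samples=2000, n_features=40, n_classes=9,
                           n_informative=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)        # 80/20 split as in the paper

automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=3600,                # one-hour budget per feature set
    resampling_strategy="cv",
    resampling_strategy_arguments={"folds": 5},  # 5-fold cross-validation
)
automl.fit(X_train, y_train)
automl.refit(X_train, y_train)                   # refit before predicting with "cv"
print(automl.score(X_test, y_test))              # held-out accuracy
```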
The results are shown in Table 1, sorted in increasing order of accuracy. Random Forest provided the best performance in all experiments.

Table 1. The feature accuracy

Feature(s)                                                             Dimension          Best Accuracy
All Features                                                           1812921 → 10343    0.9948
Section Size, Section Permission, Content Complexity                  861 → 40           0.9940
Section Size, Section Permission, Content Complexity, Import Library  1431 → 340         0.9922
Opcode 4-gram                                                          1408515 → 5000     0.9908
File Size, API 4-gram, Opcode 4-gram                                   1811490 → 10003    0.9899
Content Complexity                                                     6                  0.9811
Section Size                                                           846 → 25           0.9775
Section Permission                                                     9                  0.9701
Import Library                                                         570 → 300          0.9393
File Size                                                              3                  0.9352
API 4-gram                                                             402972 → 5000      0.5796

Among individual features, opcode 4-grams provided the highest accuracy, 99.08%, which means static disassembly has little negative impact on opcode 4-grams: they are effective in both dynamic and static analysis. However, their extraction requires considerable time and computational resources, and their original dimension before feature selection is the largest (1408515). Content complexity, PE section sizes and PE section permissions achieved 98.11%, 97.75% and 97.01% accuracy respectively, which is satisfactory considering that they are low-dimensional representations. Import libraries did not perform very well, but the prediction paths generated by a Decision Tree over import libraries can provide functionality comparisons between malware families, as in Fig. 2; apart from API 4-grams, no other feature can do this. At the beginning, we expected API sequences to be as effective a feature in static disassembly analysis as they are in dynamic behavior analysis. Unexpectedly, the API 4-gram is the worst: its accuracy is only 57.96% despite a 5000-dimensional feature vector. Our results show that API sequences may only be applicable to dynamic behavior analysis; the data errors introduced by static disassembly extraction severely damage the feature's validity.

Among integrated features, the combination of PE section sizes, PE section permissions and content complexity is almost the best, with 99.40% accuracy at 40 dimensions. Using all features improved accuracy by only 0.08%, while the number of dimensions increased dramatically to 10343. Additionally, the highest accuracy we achieved, 99.48%, is still lower than the 99.97% reported in [3]; if interpretability is not a concern, the combination of Convolutional Neural Networks and gray-scale images used there is clearly an excellent model.

5 LIMITATIONS OF STATIC DISASSEMBLY

The dataset contains only static text, which in general negatively affects classification accuracy. We identified three specific problems. They mainly affect the extraction of API sequences and import libraries, and do not seriously impact the three new features we propose.

5.1 Lazy Loading

When extracting import libraries, only the libraries in the Import Table can be extracted; this is the structure in the PE Headers used to import external APIs, and the libraries listed there are loaded automatically when the malware starts. To keep malicious behavior better hidden, developers can use lazy loading to load a library just before it is used. Lazily loaded libraries cannot be extracted from static disassembly scripts. As shown in Fig. 1, the top libraries are ubiquitous and have no special significance for malware classification. A reasonable speculation is that sensitive libraries are lazily loaded and the PE Headers contain only regular libraries.

5.2 Name Mangling

Compared with import libraries, the API sequence is more negatively affected. We found two reasons. The first is name mangling, which allows different programming entities to be named with the same identifier, as in C++ overloading; the compiler selects the appropriate function based on its parameters. This is convenient for programmers, but internally compilers need different identifiers to distinguish the entities. Name mangling adds noise to API n-gram extraction: for the same or similar functions, we may extract more than one name. A theoretical solution is to convert mangled names back to a common original name. In practice, however, it is challenging to develop converters for every possible compiler and language, and some compilers do not disclose the details of their name mangling scheme.

5.3 Jump Thunks

Jump thunks are the second reason for the poor performance of API sequences. Many compilers generate a jump thunk, a small code snippet, for each external API, then convert all calls to the API into calls to its thunk. This mechanism provides an interface proxy, but it makes API sequences inaccurate when linear scanning is used to extract external API calls. Theoretically, we could recognize jump thunks and match them to external APIs, but thunk names are arbitrary and their contents may be more complex than a single jump instruction.
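To illustrate the idea, the following hypothetical sketch resolves only the simplest one-instruction thunks, a label whose first instruction is an unconditional jmp to an import. The naming patterns are assumptions, and real thunks, as noted above, are often more complex:

```python
import re

# Assumed listing shapes; real IDA output varies between versions.
PROC_RE = re.compile(r"^\.text:[0-9A-F]+\s+(\w+)\s+proc near")
JMP_IMP_RE = re.compile(r"\bjmp\s+ds:(?:__imp_)?(\w+)")
CALL_RE = re.compile(r"\bcall\s+(\w+)")

def resolve_thunks(lines):
    """Map one-instruction thunks to their APIs, then rewrite calls."""
    thunks, pending = {}, None
    for line in lines:
        m = PROC_RE.match(line)
        if m:
            pending = m.group(1)          # a possible thunk label
            continue
        if pending:
            m = JMP_IMP_RE.search(line)
            if m:                         # first instruction is a jmp to an import
                thunks[pending] = m.group(1)
            pending = None
    calls = []
    for line in lines:
        m = CALL_RE.search(line)
        if m:
            calls.append(thunks.get(m.group(1), m.group(1)))  # thunk -> real API
    return calls
```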
6 PRACTICAL APPLICATION

As discussed in [2], the similarity between previous and future malware degrades over time due to function updates and polymorphic techniques, which automatically and frequently change identifiable characteristics such as encryption types and code distribution to make malware unrecognizable to anti-virus detection. To address this, we designed an automatic malware classification workflow that applies and strengthens our classifier in practice using IDA Pro's Python development kit, as shown in Fig. 7. The source code is available on GitHub4 and provides the following practical features:

1. Data Generation. In general, analysts can only collect raw executable malware, not disassembly scripts like those provided in the dataset. To generate new training data, we developed an IDA Pro script that can be run from the command line with IDA Pro's parameters -A and -S, which launch IDA Pro in autonomous mode and make it run a script. For each executable file, it produces disassembly instructions and hexadecimal machine code using IDA Pro's disassembler. These two output files are in the same format as the files used for training in the dataset.
2. Automatic Classification. We used another automatic machine learning library, TPOT, to search for the best model for the feature combination of PE section sizes, PE section permissions and content complexity; we believe this combination maintains a good balance between accuracy and dimensionality. TPOT achieved 99.26% accuracy, slightly lower than auto-sklearn (99.40%). Unlike auto-sklearn, TPOT uses Genetic Programming to optimize models [4], and once the search is complete it exports Python code for the best pipeline (see the sketch after this list); auto-sklearn has no similar function. With the fitted model, we developed an IDA Pro classifier plug-in. When an analyst opens a malware sample with IDA Pro, the plug-in produces the required raw input files, then the features are calculated and the classification performed, as in Fig. 8.
3. Manual Classification. Although automatic classification is very useful, the result may be inaccurate or in doubt; the plug-in therefore provides a simple means for analysts to perform in-depth manual analysis to determine a sample's exact family.
4. Model Training. With sufficient output files and labels for the latest samples, the classifier can be retrained and strengthened, either manually or in an automated fashion. Our model was trained on these nine malware families only, so if an input sample does not belong to them, the model will return an incorrect result or classify the sample into the family most similar to its actual type. Theoretically, however, these features are applicable to more families if more datasets become available.
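A minimal sketch of the TPOT search mentioned in item 2, using TPOT's documented estimator API; the generation and population settings are illustrative assumptions, and synthetic data again stands in for the real feature matrix:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from tpot import TPOTClassifier

# Stand-in data; replace with the real 40-dimensional feature matrix.
X, y = make_classification(n_samples=2000, n_features=40, n_classes=9,
                           n_informative=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

tpot = TPOTClassifier(generations=20, population_size=50,  # illustrative budget
                      cv=5, random_state=42, verbosity=2)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))
tpot.export("best_pipeline.py")   # emits Python code for the best pipeline
```

The export call corresponds to the pipeline-export capability noted in item 2, which auto-sklearn lacks.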
4 https://github.com/czs108/Microsoft-Malware-Classification

Fig. 7. The automatic malware classification workflow
Fig. 8. The IDA Pro classifier plug-in

7 CONCLUSION AND FUTURE WORK

This paper demonstrates how novel, highly discriminative features of relatively low dimensionality, combined with automatic machine learning, can provide highly competitive accuracy for malware classification. Compared with traditional manual analysis, machine learning can provide a fast and accurate classifier after training on the latest malware samples, and it does not rely on an understanding of the code. Unlike API and opcode n-grams, which aim to match specific malicious operations, our features focus on macroscopic information about malware. In theory, these features are more compatible across operating systems and less susceptible to code encryption. One shortcoming is that they cannot offer the detailed understanding of malicious behaviors that API sequences provide; analysts must combine multiple features to perform more in-depth analysis. In addition, the negative effects and limitations of static text are more severe than we expected, especially for API n-grams: it is challenging to extract exact API sequences from disassembly scripts with linear scanning.

We conclude with a number of open avenues for research that might reduce the negative effects of static disassembly and improve machine learning models for malware processing:

– Remove regular libraries from the import library feature, forcing machine learning models to classify samples using only sensitive libraries. A potential problem here is that only a tiny number of sensitive libraries may be extracted.
– Although many C/C++ compilers exist, only a few versions are in common use. We could develop name demangling for common compilers and rename APIs using a defined convention.
– The core of a disassembly script is its assembly instructions, so assemblers may help perform code analysis to determine the correspondence between APIs and jump thunks.

Bibliography

[1] Feurer, M., Klein, A., Eggensperger, K., Springenberg, J., Blum, M., Hutter, F.: Efficient and robust automated machine learning. In: Cortes, C., Lawrence, N.D., Lee, D.D., Sugiyama, M., Garnett, R. (eds.) Advances in Neural Information Processing Systems 28, pp. 2962–2970. Curran Associates, Inc. (2015), http://papers.nips.cc/paper/5872-efficient-and-robust-automated-machine-learning.pdf
[2] Gibert, D., Mateu, C., Planes, J.: The rise of machine learning for detection and classification of malware: Research developments, trends and challenges. Journal of Network and Computer Applications 153, 102526 (2020). https://doi.org/10.1016/j.jnca.2019.102526
[3] Kalash, M., Rochan, M., Mohammed, N., Bruce, N.D.B., Wang, Y., Iqbal, F.: Malware classification with deep convolutional neural networks. In: 2018 9th IFIP International Conference on New Technologies, Mobility and Security (NTMS), pp. 1–5 (2018). https://doi.org/10.1109/NTMS.2018.8328749
[4] Le, T.T., Fu, W., Moore, J.H.: Scaling tree-based automated machine learning to biomedical big data with a feature set selector. Bioinformatics 36(1), 250–256 (2020)
[5] Microsoft Corporation: PE format. https://docs.microsoft.com/en-us/windows/win32/debug/pe-format (2021), accessed 2021/05/10
[6] Nahid, M., Mehdi, B., Hamid, R.: An improved method for packed malware detection using PE header and section table information. International Journal of Computer Network and Information Security 11, 9–17 (2019). https://doi.org/10.5815/ijcnis.2019.09.02
[7] Nataraj, L., Karthikeyan, S., Jacob, G., Manjunath, B.S.: Malware images: Visualization and automatic classification. In: Proceedings of the 8th International Symposium on Visualization for Cyber Security, VizSec '11. Association for Computing Machinery, New York, NY, USA (2011). https://doi.org/10.1145/2016904.2016908
[8] Raff, E., Barker, J., Sylvester, J., Brandon, R., Catanzaro, B., Nicholas, C.: Malware detection by eating a whole EXE (2017)
[9] Ronen, R., Radu, M., Feuerstein, C., Yom-Tov, E., Ahmadi, M.: Microsoft malware classification challenge. ArXiv abs/1802.10135 (2018)
[10] Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
[11] Symantec Corporation: Internet security threat report. Tech. rep., Symantec Corporation (2019)
[12] Zheng, R., Fang, Y., Liu, L.: Homology analysis of malicious code based on dynamic-behavior fingerprint (in Chinese). Journal of Sichuan University (Natural Science Edition) 53(4), 793–798 (2016)