=Paper=
{{Paper
|id=Vol-2009/fmt-proceedings-2017-paper15
|storemode=property
|title=A Bigram Supported Generic Knowledge-Assisted Malware Analysis System: BiG2-KAMAS
|pdfUrl=https://ceur-ws.org/Vol-2009/fmt-proceedings-2017-paper15.pdf
|volume=Vol-2009
|authors=Niklas Thür,Markus Wagner,Johannes Schick,Christina Niederer,Jürgen Eckel,Robert Luh,Wolfgang Aigner
|dblpUrl=https://dblp.org/rec/conf/fmt/Thur0SNELA17
}}
==A Bigram Supported Generic Knowledge-Assisted Malware Analysis System: BiG2-KAMAS==
Niklas Thür<sup>1</sup>, Markus Wagner<sup>1</sup>, Johannes Schick<sup>1</sup>, Christina Niederer<sup>1</sup>, Jürgen Eckel<sup>3</sup>, Robert Luh<sup>2</sup>, Wolfgang Aigner<sup>1</sup>

<sup>1</sup> Institute of Creative\Media/Technologies, St. Pölten University of Applied Sciences, Austria

<sup>2</sup> Josef Ressel Center for Unified Threat Intelligence on Targeted Attacks, Austria

<sup>3</sup> IKARUS Security Software GmbH, Austria

Email: <sup>1,2</sup> first.last@fhstp.ac.at, <sup>3</sup> eckel.j@ikarus.at

Abstract: Malicious software, short “malware”, refers to software programs that are designed to cause damage or to perform unwanted actions on the infected computer system. Behavior-based analysis of malware typically utilizes tools that produce lengthy traces of observed events, which have to be analyzed manually or by means of individual scripts. Due to the growing amount of data extracted from malware samples, analysts are in need of an interactive tool that supports them in their exploration efforts. In this respect, the use of visual analytics methods and stored expert knowledge helps the user to speed up the exploration process and, furthermore, to improve the quality of the outcome. In this paper, the previously developed KAMAS prototype is extended with additional features such as the integration of a bi-gram based valuation approach to cover further malware analysts’ needs. The result is a new prototype which was evaluated by two domain experts in a detailed user study.

===I. Introduction===

Malicious software, or short malware, is one of the biggest threats to computer systems these days [1]. ‘Malware’ refers to software programs which are designed to cause damage or perform other unwanted actions on a computer or network. Therefore, malware plays a big part in most computer intrusions and security incidents. Malware includes, inter alia, viruses, trojan horses, worms, rootkits, scareware, and spyware [1]. By now there are millions of malicious programs and the number is increasing every day.

“Malware analysis is the art of dissecting malware to understand how it works, how to identify it, and how to defeat or eliminate it” [1]. In malware analysis, there are two basic approaches to examine a malware program: the static and the dynamic approach. Often the malware analyst only has the potentially malicious executable, which includes the machine code but is not human-readable. Therefore, static malware analysis involves the investigation of the malware executable as well as certain reverse-engineering tasks to recover the sample’s source code. On the other hand, dynamic analysis requires the execution of the malicious software on e.g. a virtualized host machine to detect the malware’s run-time behavior [1]. To cover all of the malware analyst’s needs, Wagner et al. [2] performed a problem characterization and abstraction elaborating the analysts’ needs in relation to behavior-based malware analysis. In the article by Wagner et al. [3], a design study for a behavior-based knowledge-assisted malware analysis system (referred to as KAMAS) is described. The malware analyst’s workflow involves the tasks of examining potentially malicious behavior patterns, selecting them, categorizing them, and storing the found rules in the knowledge database (KDB) [3]. We developed an interactive prototype to extend the KAMAS design study [3] with a new feature of Bi-Gram supported Generic Knowledge-Assisted Malware Analysis System (BiG2-KAMAS) [4]. A focus group meeting with members of an Austrian IT security company, the Information security department of St. Pölten UAS and the developers of the initial KAMAS prototype was conducted to identify the tasks and needs for additional features requested by the IT security company to extend the KAMAS design study [3]. Based on this feature list, the paper at hand contributes the following:

# Integrating a generic data loading process enabling KAMAS to load any kind of data, based on a given structure;
# Storing benign rules and their highlighting when loading new cluster files, thereby supporting the analyst;
# Identifying malicious or benign call sequences by including a bi-gram based valuation;
# Presenting in detail two user studies validating the new features.

This paper is structured as follows: Sect. II provides background knowledge about the work of our collaborators and related work in the field of malware analysis. In Sect. III we describe the prototype’s design, visualization methods and implementation. Furthermore, Sect. IV defines the integration of additional knowledge in the prototype’s knowledge database. Sect. V shows the prototype’s evaluation method, while results are discussed in Sect. VI.

===II. Related Work===

Shiravi et al. [5] published a survey related to network security visualization, comparing the data sources and visualization techniques of thirty-eight different systems. Furthermore, Egele et al. [6] presented a general literature survey of malware analysis techniques and tools. In their work they surveyed different approaches for dynamic automated malware analysis and compared them based on their analysis techniques. Likewise, Bazrafshan et al. [7] surveyed various heuristic malware detection techniques as well as malware obfuscation techniques. Additionally, Wagner et al. [8] published a survey of 25 different visualization systems for malware analysis. The objective of their work was to compare the visualization methods and features of these systems and to categorize them along their novel ‘Malware Visualization Taxonomy’. Furthermore, McNabb and Laramee [9] published a survey of surveys: ''Mapping The Landscape of Survey Papers in Information Visualization''.

In 2017, Wagner et al. [3] published a paper on a Knowledge-Assisted Malware Analysis System, referred to as KAMAS. In their user study, they found out that the experts are not only interested in visualizing patterns. A supportive valuation approach was implemented by Luh et al. [10], [11], calculating the degree of maliciousness based on system and API call bi-grams. Somarriba et al. [12] presented another malware detector system for Android Malware Behavior. Besides, Marschalek et al. [13] published a system for threat detection using a real-time monitoring agent to gather all or only selected system events and visualize these using event propagation trees. Xiaofang et al. [14] published a paper on a malware variant detection approach using Similarity Search “by processing malware as content fingerprint” [14]. Jain et al. [15] presented a visual exploration approach of android binary files. Their approach is based on the visualization of android .dex files to analyze and compare malicious android executables. David et al. [16] presented “a novel deep learning based method for automatic malware signature generation and classification” [16]. Wrench and Irwin [17] published an approach in which they identify and classify Remote Access Trojans (RATs) and other malicious software based on the programming language PHP.

Fig. 1. The BiG2-KAMAS prototype and its three sections: Section 1 shows the knowledge base including the KDB (1a) with its new category for benign activity. Beneath the knowledge base, highlighting filters are displayed (1b). Section 2 shows the rule exploration area including the bigram visualization (2b) and new color highlighting for benign rules (2a). Finally, section 3 shows the call exploration area.

===III. Prototype Concept===

This section describes the new features of the ‘Bi-Gram supported Generic Knowledge-Assisted Malware Analysis System’ (BiG2-KAMAS), conceptually grounded on the KAMAS prototype [3].

====A. Data====

In its current iteration, BiG2-KAMAS bases its visualization on sequential traces of Windows kernel operations amounting to benign and malicious application behavior in the context of OS and user-initiated processes. These events are typically abstractions of raw system and API calls that yield information about the general behavior of an unknown application sample or resident process [8]. Raw calls may include wrapper functions (e.g. <code>CreateFile</code>) that offer a simple interface to the application programmer, or native system calls (e.g. <code>NtCreateFile</code>) that represent the underlying OS or kernel support functions. In the context of BiG2-KAMAS and its data providers, events are collected directly from the Windows kernel. We employ a driver-based monitoring agent [13] designed to collect and forward a number of events to a database server. This gives us unimpeded access to events depicting operations related to process and thread control, image loads, file management, registry modification, network socket interaction, and more. For example, a shell event that creates a new binary file on a system may be simply denoted as a triple <code>explorer.exe,file-create,sample.exe</code>. Additional information captured in the background includes various process and thread ID information required to uniquely identify an event within a system session and to link individual events to a full sequence (trace) needed for further processing stages.

Based on the aforementioned traces, BiG2-KAMAS uses two distinct mechanisms to further process arbitrary kernel event sequences:

''Pattern inference:'' Our introduced framework has been developed in concert with an event extraction system called SEQUIN [11]. SEQUIN uses grammar inference extended with statistical evaluation to automatically identify and crop relevant sequences (rules) from traces of kernel-level behavioral data for further processing and visualization. Generally speaking, grammar inference is the process of computationally assembling a formal ruleset by examining the sentences of an unknown language [18]. In the information security domain, grammar inference is primarily used for pattern recognition, computational biology, natural language processing, language design programming, data mining, and machine learning. Grammar inference has also been proven to be a feasible approach to anomaly detection, since “algorithmic incompressibility is a necessary and sufficient condition for randomness” [19]. We use grammar inference as a key component in the process of ‘compressing’ a sequential trace for extracting relevant behavioral patterns.

To achieve inference by compression in a computationally feasible way, we selected an algorithm that losslessly produces (without changes to order and immutability) a context-free grammar (CFG) in unsupervised operation. As opposed to context-sensitive grammars, languages created by a CFG can be recognized in O(n<sup>3</sup>) time, which is a relevant distinction for all future parsing efforts. The choice ultimately fell on Sequitur [20]. Sequitur is a greedy compression algorithm that creates a hierarchical structure (CFG) from a sequence of discrete symbols by recursively replacing repeated phrases with a grammatical rule. The output is a compressed representation of the original sequence. The algorithm creates this representation through the application of two base properties: rule utility and bi-gram uniqueness. Rule utility checks if a rule occurs at least twice in the grammar, while bi-gram uniqueness observes if two adjacent symbols occur only once. Assuming we have a string <code>abcdbcabcd</code>, where every character represents an event, the first bi-gram of that trace would be <code>ab</code>, followed by a second bi-gram <code>bc</code>, and so forth. See Table I for a complete example of the process.

{| class="wikitable"
|+ Table I. Operation of Sequitur after [20]. Property application is italicized.
! Symbol !! String !! Grammar !! Remarks
|-
| 1 || a || S → a ||
|-
| 2 || ab || S → ab ||
|-
| 3 || abc || S → abc ||
|-
| 4 || abcd || S → abcd ||
|-
| 5 || abcdb || S → abcdb ||
|-
| 6 || abcdbc || S → abcdbc || bc appears 2x
|-
| || || S → aAdA<br/>A → bc || ''bigram uniqueness''
|-
| 7 || abcdbca || S → aAdAa<br/>A → bc ||
|-
| 8 || abcdbcab || S → aAdAab<br/>A → bc ||
|-
| 9 || abcdbcabc || S → aAdAabc<br/>A → bc || bc reappears
|-
| || || S → aAdAaA<br/>A → bc || ''bigram uniqueness''<br/>aA appears 2x
|-
| || || S → BdAB<br/>A → bc<br/>B → aA || ''bigram uniqueness''
|-
| 10 || abcdbcabcd || S → BdABd<br/>A → bc<br/>B → aA || Bd appears 2x
|-
| || || S → CAC<br/>A → bc<br/>B → aA<br/>C → Bd || ''bigram uniqueness''<br/>B used only 1x
|-
| || || S → CAC<br/>A → bc<br/>C → aAd || ''rule utility''
|}

Sequitur is linear in space and time. In terms of data compression, the algorithm can outperform other designs that achieve data reduction by factoring out repetition. It is almost as performant as designs that compress data based on probabilistic predictions [20].

''Bi-gram extraction and scoring:'' In addition to rule inference, BiG2-KAMAS uses precomputed maliciousness scores of event bi-grams separately explored using a sentiment-like extraction system based on the log likelihood ratio (LLR) test [10]. An LLR test is a statistical method used to test model assumptions, namely the quality of fit of a reference (null) and an alternative model. When determining the occurrence of rarely observed events, which are often at the core of all malicious traces, likelihood ratio tests show significantly better results than alternatives such as χ<sup>2</sup> or z-score tests [21].

In preparation for sentiment-assisted visualization, we use the LLR method to learn likely benign and malicious event sequences in big corpora of recorded kernel operations (traces). The resulting sentiment dictionary can be used to accurately and effectively determine if an investigated event bi-gram is contextually suspicious. Specifically, we compute the LLR score for each bi-gram to highlight collocations characteristic to sequences of malicious and benign system events [10].

The resulting occurrence counts (shown in Table II) are the basis for this calculation: Following the approach by [21], we define the number of times both event tokens occur in combination (k<sub>11</sub>), the number of times each token has been observed independently from the other (k<sub>12</sub> and k<sub>21</sub>, depending on the relative position in the bi-gram), and the number of times the token was not present at all (k<sub>22</sub>).

{| class="wikitable"
|+ Table II. Event occurrence matrix [10]
!
! A
! !A
|-
! B
| k<sub>11</sub> = k(AB)
| k<sub>12</sub> = k(!AB)
|-
! !B
| k<sub>21</sub> = k(A!B)
| k<sub>22</sub> = k(!A!B)
|}

The same process is later applied to the pattern’s general occurrence in a labeled benign versus malicious corpus. The final result is a normalized sentiment rating ranging from +1.0 (benign) to −1.0 (malicious). Unknown bi-grams are ultimately scored against the resulting dictionary, the outcome of which is at the core of the bi-gram evaluation feature in the new BiG2-KAMAS prototype.

====B. Visualization Design====

''Structure:'' Wagner et al. [3] describe in their article that since IT-security experts are commonly familiar with programming IDEs, they used the design concept of IDEs like Eclipse or Netbeans for their prototype. The updates to the new prototype also follow this design concept approach. In contrast to the previous prototype, the new one has an additional view. In this initial view the KDB is situated on the left side, which can be compared to the project view in Eclipse. On the right side only the file load buttons are displayed, which can be compared to the initial view of Eclipse, where no project has been opened yet.

''Coloring:'' For the rule highlighting as well as the Bi-Gram visualization we selected a sequential color scheme from red to blue. Red indicates that the rule or bi-gram is malicious and blue stands for a benign rule or bi-gram. To avoid problems with red and green hues for colorblind people [22, p. 124], we used blue instead of green and selected colorblind-safe qualitative colors from Colorbrewer (http://colorbrewer2.org).

''Layout:'' The prototype is structured into three parts: knowledge base, rule exploration area and call exploration area (see Figure 1). On the left side the knowledge base is visualized with its ‘Knowledge Database (KDB)’ (see Figure 1:1a) and the KDB’s color highlighting filters (see Figure 1:1b). The KDB is displayed as a tree, in which each category of the database can have several subcategories. Each category with subcategories is shown with a box icon (see Figure 1:1a) and the ones without subcategories are displayed with folder icons. Each rule which is stored in the database is displayed with a paper icon. Beneath the KDB the ‘Knowledge Base Highlighting’ filters are displayed (see Figure 1:1b). Each filter can be activated or disabled with its checkbox and updates the result of the prototype’s filter pipeline and the visualization of the ‘Rule Overview Table’ (see Figure 1:2a).

After loading and translating the input file, the system updates the ‘Graphical User Interface’ (GUI) and visualizes new elements. In the middle the ‘Rule Exploration’ area (see Figure 1:2) is visualized, while the right side contains the ‘Call Exploration’ area (see Figure 1:3).

In the ‘Call Exploration’ area all the included system or API calls of the loaded input file are represented in the call table (see Figure 1:2b) as described by Wagner et al. [3]. The rules included in the input file are visualized in the rule overview table located in the ‘Rule Exploration’ area (see Figure 1:2a). If the user loads several trace files, each trace file will be displayed as one rule.

The background of the third column of the ‘Rule Overview Table’ indicates whether a rule is fully benign, partially benign, not known, partially malicious or fully malicious. The background of the malicious rules will be painted in red and the background of the benign rules in blue. The fully known rules will be displayed in a dark red/blue while the partially known rules are highlighted in a light red/blue (see Figure 1:1b). The red color highlighting for malicious activity is adopted from the KAMAS prototype [3]. If a rule is fully known and, therefore, highlighted in dark red, the rule is included as-is in the KDB. A partially known rule is only a part of one rule in the KDB. This kind of rule has at least one additional call at the beginning or at the end of a fully known rule [3]. If an input file was loaded, the system automatically calculates the knowledge state of each rule. For this purpose, the system compares each rule of the input file with each rule of the KDB. After the calculation process the system highlights the rules in the corresponding colors in the rule overview table.

''Bi-Gram Visualization:'' The rule detail table is located next to the rule overview table (see Figure 1:2b). The rule detail table automatically updates its content when clicking on a rule in the rule overview table and represents all system and API calls included in the selected rule. From left to right, the table displays the unique id as well as the name of the call. The last column visualizes the new bi-gram based valuation approach for the corresponding calls. As mentioned before, the prototype uses the bi-gram approach of Luh et al. [10]. A bi-gram is an n-gram of length n = 2. An n-gram, in turn, is a coherent sequence of n elements. In this approach the elements are system or API calls. Each bi-gram has a score in the range [−1, 1], which indicates whether this pair of calls is malicious or benign. For bi-gram based valuation, two different visualization approaches were implemented following a semantic zooming approach: First, if the width of the bi-gram column is bigger than 75px, the prototype visualizes the bi-gram values as bar charts (see Figure 2:a), whereby each bar starts in the middle of the bi-gram column. If the bi-gram score is between 0 and −1, the bi-gram is malicious. Therefore, the red color bar chart unfurls from the middle towards the left side. If the bi-gram score is between 0 and 1 the bi-gram is benign and the bar chart is visualized from the middle to the right side in a blue color. The colors correspond to the KDB highlighting. The visualization approach was chosen to give the user a quick but still precise overview of the bi-gram based scores.

If the width of the bi-gram column is smaller than 75px and therefore the bar charts are hardly recognizable, the system switches to the second visualization. Here, the bi-gram values are visualized as a color-filled rectangle (see Figure 2:b). As before, a red colored rectangle indicates that the bi-gram is malicious and a blue one stands for a benign bi-gram. To visualize the value of the malicious or benign bi-gram, the system changes the alpha value of the displayed color. Therefore, the darker the color, the higher the respective value. Since the difference of an alpha value between 255 and 240 is not easy to recognize and every value below 100 is generally difficult to see, we decided to implement only four graduation steps for the alpha value. The visualization with the alpha value is less precise than the visualization with the bar charts but, at the same time, significantly easier to interpret. Table III shows the different graduation steps and their value ranges.

Fig. 2. The two different visualisation methods of the call bi-grams. The first method visualises the bi-grams as bar charts (a), whereas the second visualisation uses the alpha channel to show the severity of the bi-gram (b).

{| class="wikitable"
|+ Table III. Colour graduation steps for the alpha value bi-gram visualisation.
! Colour !! Alpha value !! Value range
|-
| blue (benign) || 200 || score ≥ 0.75
|-
| blue (benign) || 150 || 0.5 ≤ score < 0.75
|-
| blue (benign) || 100 || 0.25 ≤ score < 0.5
|-
| blue (benign) || 50 || 0 ≤ score < 0.25
|-
| red (malicious) || 50 || −0.25 ≤ score < 0
|-
| red (malicious) || 100 || −0.5 ≤ score < −0.25
|-
| red (malicious) || 150 || −0.75 ≤ score < −0.5
|-
| red (malicious) || 200 || score < −0.75
|}

====C. Interaction====

Like the KAMAS prototype of Wagner et al. [3], the functionality of BiG2-KAMAS will be described in accordance with the four steps of the visual information seeking mantra of Shneiderman et al. [23], namely overview, rearrange and filter, details-on-demand, and extract.

''Overview:'' The BiG2-KAMAS prototype has an additional initial view where the user can decide whether to load a Sequitur input file or several raw trace files. When the analyst loads a Sequitur file, the rule and call tables will be filled with the rule and call data included in the input file. Each entry in the rule overview table represents one rule of the loaded cluster. Furthermore, the histograms in the rule exploration area give a quick impression of the distribution in the rule occurrence and length [3]. When the user loads one or more trace files the rule and call tables will also be filled with the data of these files. Contrary to a loaded Sequitur file, each entry of the rule overview table represents an entire trace file. Thus, if the user loads three traces the rule overview table will have only three rows. Furthermore, due to the fact that the user analyses several independent trace files, the histogram for the rule occurrence is insignificant. Therefore, only one histogram for the trace length will be displayed in the rule filter area.

''Rearrange:'' If the rule overview table and the call overview table are loaded with data, the user can rearrange their content by clicking on a table’s column. This will re-sort the included data and update the visualization [3]. The content of the rule detail table cannot be rearranged since the calls are shown in their sequential order and should therefore not be changeable.

''Filter:'' In the next step the user can reduce the number of rules or trace files by using the rule/trace and call filters [3]. No matter which files were loaded, the user always has the opportunity to filter the rules or traces by the included calls (events). The user can rearrange the call filters or select a specific call in the call overview table to reduce the number of shown rules [3]. Furthermore, the analyst can filter the rules or specific traces by using the filters in the rule exploration area. If loading a Sequitur file, the analyst can filter the rules by their occurrence, length, whether they are equally distributed in the input file or if they match, partially match, or don’t match the stored rules in the KDB [3]. When the filter settings are changed, the rules shown in the rule overview table update immediately. If one or more trace files were loaded, the analyst can only filter the shown traces in the rule overview table by their length. In addition, the highlighting and filtering of the KDB is switched off.

''Details-on-Demand:'' If the user wants to analyze a rule or trace, he/she can open the rule/trace in the rule detail table by selecting it in the rule overview table. This will display all the included calls in the rule detail table in their sequential order [3]. The bi-grams provide information on whether a combination of two calls is malicious or benign. This should support the user in finding interesting call sequences more quickly.

''Extract:'' Independent of the loaded files, the analyst can add a new rule to the database in two different ways. One method is to select one rule or trace in the rule overview table and drag and drop it in one leaf category of the KDB. This will add the entire rule or trace file to the database [3]. Alternatively, the analyst can select several calls of interest in the call overview table and add these by dragging and dropping them to the KDB. When adding a new rule to the KDB, a popup window will show up where the analyst can assign the rule a specific name. If the user has loaded a Sequitur file, the system will now update the knowledge state for all rules as well as the highlighting in the rule overview table for further analysis.

====D. Implementation====

Since the BiG2-KAMAS prototype is based on the prototype of Wagner et al. [3], it also uses a data-oriented design concept [24]. To increase the performance of the prototype, the system only works with integer comparisons. Therefore, the input data only includes the call ids. It is only possible to translate a call id to the actual call value with an additional translation file. This translation file is also used for the bi-grams. The original bi-gram file has several columns in which only the string values of the system or API calls are stored. To increase the performance and to reduce memory usage, the BiG2-KAMAS prototype generates its own bi-gram file. When starting the prototype the system checks with md5 hash values to determine whether the translation file or the original bi-gram file has changed. If so, the system converts the original bi-gram file to the translated bi-gram file in which also the integer values of the system calls are stored. Like the prototype of Wagner et al. [3], the new prototype is using the action pipeline for filter options. This enables dynamic query environments and real-time data operations.

To evaluate the robustness and performance of the BiG2-KAMAS prototype three different Sequitur cluster-grammar files containing between 10 and 500 rules were used. The file with 500 different rules contained a total amount of 30,000 system and API calls. To test the bi-gram functionality, a bi-gram file with nearly 117,500 bi-gram entries was loaded. On a machine with a 2.1 GHz dual-core processor and 12 GB of memory it took the system about four minutes to translate the original bi-gram file to the translated bi-gram file. The malware and bi-gram samples were collected by collaborators in the Josef Ressel Center TARGET of St. Pölten UAS.

===IV. Externalized Knowledge Integration===

As Wagner et al. [3] described in their article, we integrated a knowledge database to support the user during their analysis tasks. The KDB is based on the malware behavior schema of Dornhackl et al. [25]. The KDB is located at the left side of the prototype and is implemented in a hierarchical structure (tree structure). In the BiG2-KAMAS prototype the KDB was extended by one additional category to store the benign rule data, namely benign activity. In the current version of the prototype there is only one category to store benign rule data. Each category is displayed with either a box or a folder icon, the category description and the number of included rules in the integrated subfolders. The analyst can add new rules by drag & drop. When adding a new rule, the KDB automatically unfolds closed categories. Additionally, a popup window opens in which the analyst can enter a rule name. To investigate a rule stored in the KDB, the user can open a context menu by right clicking on the chosen rule. The context menu will show two different menu items, namely ‘Information’ and ‘Delete’. The information menu item opens a popup window in which the analyst is presented the following information:

* Assigned Concept: This information tells the analyst in which schema category (concept) the rule is currently categorized. The assigned concept is implemented as a selection list to give the user the opportunity to change the assigned concept. For that purpose, the analyst must select a different concept in the list and press the save button at the bottom of the popup window.
* Rule Name: Here, the actual rule name is displayed. The rule name is implemented as a text field to quickly change it if necessary.
* Included Calls: Finally, the calls included in the stored rule are displayed in a table. Thus, the calls are visualized in their sequential order and each call will be shown with its unique call id, which corresponds to the call id of the translation file, and the actual call value. In the current version of the prototype it is only possible to investigate the included calls in their sequential order, but not to delete specific calls which are listed in the table.

The second menu item is the ‘Delete’ item, which allows the analyst to delete the currently selected rule. Furthermore, when selecting a concept instead of a rule, the BiG2-KAMAS prototype will show a context menu with which the user can disable a category and all its integrated subcategories. Thus, the analyst can disable the entire KDB or only specific categories. If the user disables a category, all the included rules will no longer be considered in the knowledge base highlighting and filtering.

When the user clicks the right mouse button to open the corresponding context menu before selecting a rule or category, the system automatically selects the rule/category at the actual mouse position.

''Searching:'' If the user searches for interesting rules or specific calls or call groups he/she can use the call filter options to reduce the data to be analyzed. In the call exploration area, the user can search for a specific call by entering its name or use regular expressions to find an entire call group. Beneath the search text field the user can enable case sensitive search with the corresponding checkbox ‘Case Sensitive’. Filtering or searching the calls affects the data shown in the call overview and rule overview table. Additionally, to find rules of interest the analyst can use the rule exploration filters or the knowledge base filters.

===V. Prototype Evaluation===

This section describes the procedure of the performed user studies, the specific results, as well as further feature requests. For the prototype validation, a user study with two domain experts was conducted. The domain experts validated the functionality as well as the visual design interface.

''Participants:'' Both participants work at St. Pölten UAS and have more than five years of experience in the field of malware analysis. The first participant is between 30 and 39 years of age, male and holds a master’s degree. The second participant is between 60 and 69 years of age, male, and holds a PhD. Generally, both participants are well experienced in this field and can be categorized as experts.

''Design and Procedure:'' Each participant was interviewed individually and had already tested the previous version of the prototype at least once. First, the participants received a short introduction to the new features of BiG2-KAMAS and also a quick reminder of the basic features and workflow. The participants were asked to mention additional missing functionalities and to criticize all potential usability issues. Both participants took part in the same two scenarios: First, the participants had to load a Sequitur file, investigate the loaded rules and filter specific call sequences. At the end they had to store a rule in the KDB and name it. In the second scenario, the participants had to load three trace files. They were asked if they perceived any differences when loading trace files instead of a Sequitur file. At the end they had to investigate a rule stored in the KDB and move it to a different category.

''Equipment and Materials:'' The latest version of the BiG2-KAMAS prototype was used in the evaluation. For the first user scenario, the participants had to load a Sequitur file with about 500 rules and 30,000 system and API calls. In the second scenario, three trace files with a length between ten and fifteen calls were used. The bi-gram file had a total number of about 117,000 bi-grams. The translated bi-gram file had already been generated so that the participants did not have to wait until the system finished the translation process. As evaluation equipment, two different setups were used. Both participants worked on a 13 inch Macbook Pro with a Retina display (screen resolution of 2560x1600) and a mouse for navigation. Participant #1 worked with an additional 20 inch monitor with a full HD screen resolution and an external keyboard. Each user test was conducted with the same version of the BiG2-KAMAS prototype and was documented on paper.

====A. Results====

The following section discusses the results of both scenarios. Both the results of ‘Scenario 1’ (Sequitur file) and ‘Scenario 2’ (trace files) will be presented. Both participants had no problem loading the different files for the user scenarios.

''Scenario 1: Loading and analyzing a Sequitur file.'' Both participants quickly recognized the additional color scheme for the new benign category. The colors for the knowledge base highlighting were assessed as easily understandable and the additional rule counter next to the knowledge base filters was mentioned as being very useful. Participant 1 mentioned that if a rule in the rule overview table is highlighted, it would be useful to know which rule or rules of the KDB match this rule in the table.

[...] a specific call in a group of similar calls. Additionally, he recommended a search button for the regular expression call filter. This could help some users, since currently it is only possible to search by pressing the enter key. Adding a new rule to the KDB was no challenge for either participant and both valued the ability to give the rule a specific name.

''Scenario 2: Loading and analyzing three trace files.'' Both participants had no difficulties with loading the three trace files. They also recognized quickly that each entry in the rule overview table now represents one trace. Neither of them realized that the knowledge base filters and highlighting were disabled. Participant 1 suggested graying out the knowledge base filters to make it clear that these are disabled. Participant 2 proposed changing the headings for the trace file analysis view in order to avoid confusion. He remarked that it could be misleading if the headings say e.g. ‘Rule Overview Table’ when analyzing a trace file. Furthermore, both participants recommended changing the occurrence column in the rule overview table to the file names of the traces. As the last task, the participants had to change the corresponding category of a random rule. Even though both participants solved this task easily, both remarked that it would be useful if the user could move a rule from one category to another per drag & drop.

====B. Result Analysis====

This section gives an overview of the issues which were mentioned during the expert reviews. Like Wagner et al. [3], each issue was rated based on Nielsen’s [26] severity ratings. Table IV shows the potential new features noted by the test persons and includes three columns: ‘feature requests’ (FR), ‘severities’ (SE) and the effort it would take to implement these changes [3]. The features mentioned in the table include small cosmetic changes as well as real usability improvements. The only feature mentioned by all participants is an additional tooltip which shows the actual bi-gram values.

Table IV. List of remarked feature requests and severities and the effort it would take to implement them in the prototype. (FR: 1 = NICE
Therefore, a tooltip would be TO HAVE , 2 = GOOD FEATURE , 3 = ENHANCES USABILITY; SE: 1 = MINOR , helpful which tells the user the names of the matching rules 2 = BIG , 3 = DISASTER ; E FFORT: 1 = MIN , 2 = AVERAGE , 3 = MAX ) [3]. of the KDB. Furthermore, participant 2 suggested to always Description FR SE Effort show the rule counter of the KDB’s categories. If there are currently no rules in a category, the counter should be zero. KDB: Move a rule to another category by using 2 1 1 When participant 2 first saw the bar chart bi-gram visualiza- drag & drop. KDB: Show the rule counter even if zero rules 1 - 1 tion, he assumed it visualizes the occurrence of the combined are included. call sequence. In contrast, the alpha color visualization was KDB: Gray out the knowledge base filters if they 2 1 1 immediately recognized as an indicator for maliciousness or are disabled. Tables: Highlighted rules in the rule overview ta- 3 2 3 benignity. Participant 1 also mentioned that the alpha color ble should show the KDB’s corresponding rules. visualization is easier and faster to recognize. Furthermore, Tables: Change the occurrence column to the 2 1 2 both participants mentioned that the color visualization is not trace file names. Tables: Show only the begin and the end of the 3 2 2 as precise as the bar chart visualization and therefore would calls in the call overview table. only be useful for initial malware classification. Participant 1 Tables: Implement a search button for the call 1 - 1 suggested an additional tooltip to display the accurate bi-gram regex search. Bigram: Tooltip to show the bi-gram values. 3 - 1 value. Participant 2 remarked that it would be more useful if Headings: Change the headings when loading 2 - 1 the calls in the call overview table only showed the beginning trace files. and the end of the call’s value. This would simplify finding 113 A Bigram Supported Generic Knowledge-Assisted Malware Analysis System: BiG2-KAMAS VI. 
DISCUSSION & REFLECTION

The user studies described in Section V confirmed that the four feature requests defined in Section I are fulfilled by the BiG2-KAMAS prototype:

1) Generic data loading: The BiG2-KAMAS prototype is structured to enable the generic loading of data sequences. To make this possible, the input data as well as the prototype's database are based on unique identifiers (id) instead of the actual values. Thus, all system-internal comparisons are based on integer values instead of string values. Only with the corresponding translation table can the system translate the ids to the actual values. Thus, it is possible to load data sequences independent of their actual values as long as there is a translation table through which the prototype can translate the data. Furthermore, the system was adapted to also offer the opportunity to load raw system or API call based traces. In this state the KDB highlighting and filtering is disabled but the user can explore the loaded trace files and add new rules to the KDB. The prototype can load not only Sequitur call sequences but also independent data sequences, as long as the data sequence has the given structure and a translation file.

2) Extend the KDB with benign rules: To fulfill this requirement the KDB was extended with an additional category for benign activity. In this category, all rules which are identified as benign can be stored. Additionally, the KDB's highlighting and filter pipelines were extended to identify and filter partially and fully benign rules. Rules with a partially or fully benign knowledge state are highlighted in blue in order to avoid the combination of the colors red and green.

3) Implementation of bi-gram based valuation: To support the bi-gram approach of Luh et al. [10], the prototype's rule detail table was adapted. Since many domain experts mentioned [3] that the arc-diagram visualization is not very helpful, it was replaced by the bi-gram visualization. Bi-gram based valuation is implemented with two different approaches. If the width of the bi-gram column is bigger than 75px, the valuation is visualized with bar charts and colored in red (malicious) or blue (benign). If the width is less than 75px, the bi-gram visualization uses the alpha channel to show the severity of the bi-gram (see Table III).

4) User studies to validate the new features: The results of the user studies show further feature requests which could be implemented in a future project. However, both participants mentioned that the bi-gram visualization is very helpful for identifying potentially malicious or benign call sequences and, therefore, helps to decide whether a rule is malicious or not.

Future Work: For the behavior-based malware analysis process, it could be valuable to implement a rule creation process where the analyst can build their own rules based on the known system and API calls [27]. Furthermore, it could be beneficial to edit the stored rules in the KDB or to build new rules based on existing patterns. Further avenues for future work are to include possibilities to hide, shrink, and expand areas to provide the user with more flexibility. Moreover, the occurrence column of the Call Exploration area (see 1:3a) could be updated to show the relation to the total number of occurrences included in the loaded file. Additionally, normalizing the occurrence dataset and visualization to this total could be beneficial.

Categorization of BiG2-KAMAS: Like the KAMAS prototype [3], the BiG2-KAMAS prototype can be categorized as a Malware Forensic as well as a Malware Classification tool in the Malware Visualization Taxonomy of Wagner et al. [8]. However, due to the bi-gram based valuation, the BiG2-KAMAS prototype offers the malware analyst additional assistance for the Individual Malware Analysis.

VII. CONCLUSION

In this work, we presented a design study for a Bi-gram Supported Generic Knowledge-Assisted Malware Analysis System (BiG2-KAMAS). The prototype is based on the KAMAS prototype [3] and extended by additional features such as generic data loading, an extension of the KDB to enable the analysis of benign rules, and the implementation of a bi-gram based valuation approach. The requirements were discussed in a focus group meeting and then implemented as part of a functional prototype. After implementing the new features, two user studies were conducted to evaluate the design and the functionality of the new BiG2-KAMAS prototype.

ACKNOWLEDGMENTS

The financial support by the Austrian Federal Ministry of Science, Research and Economy and the National Foundation for Research, Technology and Development is gratefully acknowledged.

This work was supported by the Austrian Science Fund (FWF) via the “KAVA-Time” project (P25489-N23) and by the Austrian Federal Ministry of Science, Research and Economy under the FFG Innovationscheck (no. 856429). We would also like to thank all focus group members and test participants who have agreed to volunteer in this project.

REFERENCES

[1] M. Sikorski and A. Honig, Practical Malware Analysis: The Hands-On Guide to Dissecting Malicious Software, 1st ed. No Starch Press, 2012.

[2] M. Wagner, W. Aigner, A. Rind, H. Dornhackl, K. Kadletz, R. Luh, and P. Tavolato, “Problem characterization and abstraction for visual analytics in behavior-based malware pattern analysis,” in Proceedings of the Eleventh Workshop on Visualization for Cyber Security, ser. VizSec ’14. ACM, 2014.

[3] M. Wagner, A. Rind, N. Thür, and W. Aigner, “A knowledge-assisted visual malware analysis system: Design, validation, and reflection of KAMAS,” Computers & Security, vol. 67, pp. 1–15, 2017.

[4] N. Thür, M. Wagner, J. Schick, C. Niederer, J. Eckel, R. Luh, and W. Aigner, “BiG2-KAMAS: Supporting knowledge-assisted malware analysis with bi-gram based valuation,” in Poster of the 14th Workshop on Visualization for Cyber Security (VizSec), Phoenix, Arizona, USA, 2017.

[5] H. Shiravi, A. Shiravi, and A. Ghorbani, “A survey of visualization systems for network security,” vol. 18, no. 8, pp. 1313–1329, 2012.

[6] M. Egele, T. Scholte, E. Kirda, and C. Kruegel, “A survey on automated dynamic malware-analysis techniques and tools,” vol. 44, no. 2, pp. 6:1–6:42, 2008.

[7] Z. Bazrafshan, H. Hashemi, S. Fard, and A. Hamzeh, “A survey on heuristic malware detection techniques,” 2013, pp. 113–120.

[8] M. Wagner, F. Fischer, R. Luh, A. Haberson, A. Rind, D. A. Keim, and W. Aigner, “A survey of visualization systems for malware analysis,” in Eurographics Conference on Visualization (EuroVis) - STARs. The Eurographics Association, 2015.

[9] L. McNabb and R. S. Laramee, “Survey of surveys sos - mapping the landscape of survey papers in information visualization,” Comput. Graph. Forum, vol. 36, no. 3, pp. 589–617, Jun. 2017. [Online]. Available: https://doi.org/10.1111/cgf.13212

[10] R. Luh, S. Schrittwieser, and S. Marschalek, “LLR-based Sentiment Analysis for Kernel Event Sequences.” IEEE, 2017.

[11] R. Luh, G. Schramm, M. Wagner, and S. Schrittwieser, “Sequitur-based Inference and Analysis Framework for Malicious System Behavior,” 2017.

[12] O. Somarriba, U. Zurutuza, R. Uribeetxeberria, L. Delosières, and S. Nadjm-Tehrani, “Detection and visualization of android malware behavior,” vol. 2016, p. e8034967, 2016.

[13] S. Marschalek, R. Luh, M. Kaiser, and S. Schrittwieser, “Classifying malicious system behavior using event propagation trees.” ACM Press, 2015, pp. 1–10.

[14] B. Xiaofang, C. Li, H. Weihua, and W. Qu, “Malware variant detection using similarity search over content fingerprint.” IEEE, 2014, pp. 5334–5339.

[15] A. Jain, H. Gonzalez, and N. Stakhanova, “Enriching reverse engineering through visual exploration of android binaries,” in Proceedings of the 5th Program Protection and Reverse Engineering Workshop, ser. PPREW-5. ACM, 2015, pp. 9:1–9:9.

[16] O. E. David and N. S. Netanyahu, “DeepSign: Deep learning for automatic malware signature generation and classification.” IEEE, 2015, pp. 1–8.

[17] P. M. Wrench and B. V. W. Irwin, “Towards a PHP webshell taxonomy using deobfuscation-assisted similarity analysis.” IEEE, 2015, pp. 1–8.

[18] A. Stevenson and J. R. Cordy, “A survey of grammatical inference in software engineering,” Science of Computer Programming, vol. 96, pp. 444–459, 2014.

[19] L. Ming and P. Vitányi, An introduction to Kolmogorov complexity and its applications. Springer Heidelberg, 1997.

[20] C. G. Nevill-Manning and I. H. Witten, “Identifying hierarchical structure in sequences: A linear-time algorithm,” J. Artif. Intell. Res. (JAIR), vol. 7, pp. 67–82, 1997.

[21] T. Dunning, “Accurate methods for the statistics of surprise and coincidence,” Computational linguistics, pp. 61–74, 1993.

[22] C. Ware, Information Visualization: Perception for Design. Elsevier, 2012.

[23] B. Shneiderman, “The eyes have it: a task by data type taxonomy for information visualizations,” in Proc. of VL, 1996, pp. 336–343.

[24] R. Fabian, “Data-Oriented Design,” 2013, accessed on Nov. 11, 2015. [Online]. Available: http://www.dataorienteddesign.com/dodmain/dodmain.html

[25] H. Dornhackl, K. Kadletz, R. Luh, and P. Tavolato, “Malicious behavior patterns,” in SOSE. IEEE, 2014, pp. 384–389.

[26] J. Nielsen, Usability engineering. Boston: Academic Press, 1993.

[27] M. Wagner, A. Rind, G. Rottermanner, C. Niederer, and W. Aigner, “Knowledge-assisted rule building for malware analysis,” in Proceedings of the 10th Forschungsforum der österreichischen Fachhochschulen, FH des BFI Wien. Vienna, Austria: FH des BFI Wien, 2016.
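The id-based loading described under ‘Generic data loading’ in Section VI can be made concrete with a small sketch. It is not the BiG2-KAMAS implementation: the `id;value` line format of the translation table, the function names (`load_translation_table`, `contains_sequence`, `translate`) and the example call names are assumptions chosen purely for illustration. The sketch only shows the idea stated in the text, namely that sequences are stored and compared as integers and that the translation table is needed solely to display the actual values.

```python
from typing import Dict, Iterable, List, Sequence


def load_translation_table(lines: Iterable[str], sep: str = ";") -> Dict[int, str]:
    """Parse 'id<sep>value' lines into an id -> value lookup.

    The line format is an assumption; the paper does not specify it.
    """
    table: Dict[int, str] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        raw_id, value = line.split(sep, 1)
        table[int(raw_id)] = value
    return table


def contains_sequence(rule: Sequence[int], pattern: Sequence[int]) -> bool:
    """Return True if `pattern` occurs contiguously in `rule`.

    Only integers are compared; no string values are involved.
    """
    m = len(pattern)
    if m == 0:
        return True
    wanted = list(pattern)
    return any(list(rule[i:i + m]) == wanted for i in range(len(rule) - m + 1))


def translate(rule: Sequence[int], table: Dict[int, str]) -> List[str]:
    """Map ids to their actual values for display; unknown ids are marked."""
    return [table.get(call_id, f"<unknown:{call_id}>") for call_id in rule]


if __name__ == "__main__":
    # Hypothetical translation table and rule, for illustration only.
    table = load_translation_table(["1;NtOpenFile", "2;NtReadFile", "3;NtClose"])
    rule = [1, 2, 2, 3]
    print(contains_sequence(rule, [2, 3]))
    print(translate(rule, table))
```

Because matching never touches the strings, the same routines would work for any event sequence that ships with a translation table, which is the property the text refers to as generic loading.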