=Paper= {{Paper |id=Vol-2009/fmt-proceedings-2017-paper15 |storemode=property |title=A Bigram Supported Generic Knowledge-Assisted Malware Analysis System: BiG2-KAMAS |pdfUrl=https://ceur-ws.org/Vol-2009/fmt-proceedings-2017-paper15.pdf |volume=Vol-2009 |authors=Niklas Thür,Markus Wagner,Johannes Schick,Christina Niederer,Jürgen Eckel,Robert Luh,Wolfgang Aigner |dblpUrl=https://dblp.org/rec/conf/fmt/Thur0SNELA17 }} ==A Bigram Supported Generic Knowledge-Assisted Malware Analysis System: BiG2-KAMAS== https://ceur-ws.org/Vol-2009/fmt-proceedings-2017-paper15.pdf

A Bigram Supported Generic Knowledge-Assisted
Malware Analysis System: BiG2-KAMAS
Niklas Thür¹, Markus Wagner¹, Johannes Schick¹, Christina Niederer¹, Jürgen Eckel³, Robert Luh², Wolfgang Aigner¹
¹ Institute of Creative\Media/Technologies, St. Pölten University of Applied Sciences, Austria
² Josef Ressel Center for Unified Threat Intelligence on Targeted Attacks, Austria
³ IKARUS Security Software GmbH, Austria
Email: ¹,² first.last@fhstp.ac.at, ³ eckel.j@ikarus.at

Abstract—Malicious software, or “malware” for short, refers to software programs that are designed to cause damage or to perform unwanted actions on the infected computer system. Behavior-based analysis of malware typically utilizes tools that produce lengthy traces of observed events, which have to be analyzed manually or by means of individual scripts. Due to the growing amount of data extracted from malware samples, analysts are in need of an interactive tool that supports them in their exploration efforts. In this respect, the use of visual analytics methods and stored expert knowledge helps the user to speed up the exploration process and, furthermore, to improve the quality of the outcome. In this paper, the previously developed KAMAS prototype is extended with additional features such as the integration of a bi-gram based valuation approach to cover further malware analysts’ needs. The result is a new prototype which was evaluated by two domain experts in a detailed user study.

I. INTRODUCTION

Malicious software, or malware for short, is one of the biggest threats to computer systems these days [1]. ‘Malware’ refers to software programs that are designed to cause damage or perform other unwanted actions on a computer or network. Therefore, malware plays a big part in most computer intrusions and security incidents. Malware includes, inter alia, viruses, trojan horses, worms, rootkits, scareware, and spyware [1]. By now there are millions of malicious programs and the number is increasing every day.

“Malware analysis is the art of dissecting malware to understand how it works, how to identify it, and how to defeat or eliminate it” [1]. In malware analysis, there are two basic approaches to examine a malware program: the static and the dynamic approach. Often the malware analyst only has the potentially malicious executable, which includes the machine code but is not human-readable. Therefore, static malware analysis involves the investigation of the malware executable as well as certain reverse-engineering tasks to recover the sample’s source code. On the other hand, dynamic analysis requires the execution of the malicious software on e.g. a virtualized host machine to detect the malware’s run-time behavior [1]. To cover all of the malware analyst’s needs, Wagner et al. [2] performed a problem characterization and abstraction elaborating the analysts’ needs in relation to behavior-based malware analysis. In the article by Wagner et al. [3], a design study for a behavior-based knowledge-assisted malware analysis system (referred to as KAMAS) is described. The malware analyst’s workflow involves the tasks of examining potentially malicious behavior patterns, selecting them, categorizing them, and storing the found rules in the knowledge database (KDB) [3]. We developed an interactive prototype that extends the KAMAS design study [3] into the Bi-Gram supported Generic Knowledge-Assisted Malware Analysis System (BiG2-KAMAS) [4]. A focus group meeting with members of an Austrian IT security company, the information security department of St. Pölten UAS, and the developers of the initial KAMAS prototype was conducted to identify the tasks and needs for additional features requested by the IT security company to extend the KAMAS design study [3]. Based on this feature list, the paper at hand contributes the following:

1) Integrating a generic data loading process enabling KAMAS to load any kind of data, based on a given structure;
2) Storing benign rules and their highlighting when loading new cluster files, thereby supporting the analyst;
3) Identifying malicious or benign call sequences by including a bi-gram based valuation;
4) Presenting in detail two user studies validating the new features.

This paper is structured as follows: Sect. II provides background knowledge about the work of our collaborators and related work in the field of malware analysis. In Sect. III we describe the prototype’s design, visualization methods and implementation. Furthermore, Sect. IV defines the integration of additional knowledge in the prototype’s knowledge database. Sect. V shows the prototype’s evaluation method, while results are discussed in Sect. VI.

II. RELATED WORK

Shiravi et al. [5] published a survey related to network security visualization, comparing the data sources and visualization techniques of thirty-eight different systems. Furthermore, Egele et al. [6] presented a general literature survey of malware analysis techniques and tools. In their work they surveyed different approaches for dynamic automated malware analysis and compared them based on their analysis techniques.


Fig. 1. The BiG2-KAMAS prototype and its three sections: Section 1 shows the knowledge base including the KDB (1a) with its new category for benign
activity. Beneath the knowledge base, highlighting filters are displayed (1b). Section 2 shows the rule exploration area including the bigram visualization (2b)
and new color highlighting for benign rules (2a). Finally, section 3 shows the call exploration area.

Likewise, Bazrafshan et al. [7] surveyed various heuristic malware detection techniques as well as malware obfuscation techniques. Additionally, Wagner et al. [8] published a survey of 25 different visualization systems for malware analysis. The objective of their work was to compare the visualization methods and features of these malware systems and to categorize them along their novel ‘Malware Visualization Taxonomy’. Furthermore, McNabb and Laramee [9] published a survey of surveys: Mapping The Landscape of Survey Papers in Information Visualization.

In 2017, Wagner et al. [3] published a paper on a Knowledge-Assisted Malware Analysis System, referred to as KAMAS. In their user study, they found out that the experts are not only interested in visualizing patterns. A supportive valuation approach was implemented by Luh et al. [10], [11], calculating the degree of maliciousness based on system and API call bi-grams. Somarriba et al. [12] presented another malware detector system for Android Malware Behavior. Besides, Marschalek et al. [13] published a system for threat detection using a real-time monitoring agent to gather all or only selected system events and visualize these using event propagation trees. Xiaofang et al. [14] published a paper on a malware variant detection approach using Similarity Search “by processing malware as content fingerprint” [14]. Jain et al. [15] presented a visual exploration approach for Android binary files. Their approach is based on the visualization of Android .dex files to analyze and compare malicious Android executables. David et al. [16] presented “a novel deep learning based method for automatic malware signature generation and classification” [16]. Wrench and Irwin [17] published an approach in which they identify and classify Remote Access Trojans (RATs) and other malicious software based on the programming language PHP.

III. PROTOTYPE CONCEPT

This section describes the new features of the ‘Bi-Gram supported Generic Knowledge-Assisted Malware Analysis System’ (BiG2-KAMAS), conceptually grounded on the KAMAS prototype [3].

A. Data

In its current iteration, BiG2-KAMAS bases its visualization on sequential traces of Windows kernel operations amounting to benign and malicious application behavior in the context of OS and user-initiated processes. These events are typically abstractions of raw system and API calls that yield information about the general behavior of an unknown application sample or resident process [8]. Raw calls may include wrapper functions (e.g. CreateFile) that offer a simple interface to the application programmer, or native system calls (e.g. NtCreateFile) that represent the underlying OS or kernel support functions. In the context of BiG2-KAMAS and its data providers, events are collected directly from the Windows kernel. We employ a driver-based monitoring agent [13] designed to collect and forward a number of events to a database server. This gives us unimpeded access to events depicting operations related to process and thread control, image loads, file management, registry modification, network socket interaction, and more. For example, a shell event that


creates a new binary file on a system may be simply denoted as a triple explorer.exe,file-create,sample.exe. Additional information captured in the background includes various process and thread ID information required to uniquely identify an event within a system session and to link individual events to a full sequence (trace) needed for further processing stages. Based on the aforementioned traces, BiG2-KAMAS uses two distinct mechanisms to further process arbitrary kernel event sequences:

Pattern inference: Our framework has been developed in concert with an event extraction system called SEQUIN [11]. SEQUIN uses grammar inference extended with statistical evaluation to automatically identify and crop relevant sequences (rules) from traces of kernel-level behavioral data for further processing and visualization. Generally speaking, grammar inference is the process of computationally assembling a formal ruleset by examining the sentences of an unknown language [18]. In the information security domain, grammar inference is primarily used for pattern recognition, computational biology, natural language processing, language design programming, data mining, and machine learning. Grammar inference has also been proven to be a feasible approach to anomaly detection, since “algorithmic incompressibility is a necessary and sufficient condition for randomness” [19]. We use grammar inference as a key component in the process of ‘compressing’ a sequential trace for extracting relevant behavioral patterns.

TABLE I
OPERATION OF SEQUITUR AFTER [20]. PROPERTY APPLICATION IS ITALICIZED.

Symbol  String      Grammar       Remarks
1       a           S → a
2       ab          S → ab
3       abc         S → abc
4       abcd        S → abcd
5       abcdb       S → abcdb
6       abcdbc      S → abcdbc    bc appears 2x
                    S → aAdA      bigram uniqueness
                    A → bc
7       abcdbca     S → aAdAa
                    A → bc
8       abcdbcab    S → aAdAab
                    A → bc
9       abcdbcabc   S → aAdAabc   bc reappears
                    A → bc
                    S → aAdAaA    bigram uniqueness
                    A → bc        aA appears 2x
                    S → BdAB      bigram uniqueness
                    A → bc
                    B → aA
10      abcdbcabcd  S → BdABd     Bd appears 2x
                    A → bc
                    B → aA
                    S → CAC       bigram uniqueness
                    A → bc        B used only 1x
                    B → aA
                    C → Bd
                    S → CAC       rule utility
                    A → bc
                    C → aAd
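The final grammar in Table I can be checked mechanically. The following Python sketch is not part of SEQUIN or Sequitur itself; it only verifies a finished grammar. It expands the start rule back into the input string (losslessness), counts how often each rule is referenced (rule utility), and looks for repeated pairs of adjacent symbols (bi-gram uniqueness). The function names and the dictionary encoding of the grammar are assumptions made for this illustration.

```python
from collections import Counter

# Final grammar from the last row of Table I (symbol 10).
# Uppercase letters are non-terminals, lowercase letters are events.
GRAMMAR = {
    "S": "CAC",
    "A": "bc",
    "C": "aAd",
}


def expand(symbol, grammar):
    """Recursively replace non-terminals until only terminals remain."""
    if symbol not in grammar:
        return symbol  # terminal symbol
    return "".join(expand(s, grammar) for s in grammar[symbol])


def rule_usage(grammar):
    """Count how often each non-terminal is referenced on a right-hand side."""
    usage = Counter()
    for body in grammar.values():
        for s in body:
            if s in grammar:
                usage[s] += 1
    return usage


def repeated_bigrams(grammar):
    """Return adjacent symbol pairs that occur more than once in the grammar."""
    counts = Counter()
    for body in grammar.values():
        for left, right in zip(body, body[1:]):
            counts[left + right] += 1
    return {bigram for bigram, n in counts.items() if n > 1}


if __name__ == "__main__":
    print(expand("S", GRAMMAR))       # reconstructed trace
    print(dict(rule_usage(GRAMMAR)))  # references per rule
    print(repeated_bigrams(GRAMMAR))  # pairs violating uniqueness
```

Applied to the intermediate grammar S → BdABd, A → bc, B → aA from the same table, `repeated_bigrams` reports the pair Bd, which is the violation that triggers the creation of rule C.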
To achieve inference by compression in a computationally feasible way, we selected an algorithm that losslessly produces (without changes to order and immutability) a context-free grammar (CFG) in unsupervised operation. As opposed to context-sensitive grammars, languages created by a CFG can be recognized in O(n^3) time, which is a relevant distinction for all future parsing efforts. The choice ultimately fell on Sequitur [20]. Sequitur is a greedy compression algorithm that creates a hierarchical structure (CFG) from a sequence of discrete symbols by recursively replacing repeated phrases with a grammatical rule. The output is a compressed representation of the original sequence. The algorithm creates this representation through the application of two base properties: rule utility and bi-gram uniqueness. Rule utility checks if a rule occurs at least twice in the grammar, while bi-gram uniqueness observes if two adjacent symbols occur only once. Assuming we have a string abcdbcabcd, where every character represents an event, the first bi-gram of that trace would be ab, followed by a second bi-gram bc, and so forth. See Table I for a complete example of the process.

Sequitur is linear in space and time. In terms of data compression, the algorithm can outperform other designs that achieve data reduction by factoring out repetition. It is almost as performant as designs that compress data based on probabilistic predictions [20].

Bi-gram extraction and scoring: In addition to rule inference, BiG2-KAMAS uses precomputed maliciousness scores of event bi-grams separately explored using a sentiment-like extraction system based on the log likelihood ratio (LLR) test [10]. An LLR test is a statistical method used to test model assumptions, namely the quality of fit of a reference (null) and an alternative model. When determining the occurrence of rarely observed events – which are often at the core of malicious traces – likelihood ratio tests show significantly better results than alternatives such as χ² or z-score tests [21].

In preparation for sentiment-assisted visualization, we use the LLR method to learn likely benign and malicious event sequences in big corpora of recorded kernel operations (traces). The resulting sentiment dictionary can be used to accurately and effectively determine if an investigated event bi-gram is contextually suspicious. Specifically, we compute the LLR score for each bi-gram to highlight collocations characteristic to sequences of malicious and benign system events [10]. The resulting occurrence counts (shown in Table II) are the basis for this calculation: Following the approach by [21], we define the number of times both event tokens occur in combination (k11), the number of times each token has been observed independently from the other (k12 and k21, depending on the relative position in the bi-gram), and the number of times the token was not present at all (k22).

TABLE II
EVENT OCCURRENCE MATRIX [10]

        A              !A
B       k11 = k(AB)    k12 = k(!AB)
!B      k21 = k(A!B)   k22 = k(!A!B)
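To make the role of the four counts in Table II concrete, the sketch below computes the widely used G² form of the log-likelihood ratio for a 2x2 occurrence matrix. Whether [10] uses exactly this formulation is an assumption, and the subsequent normalization to a sentiment rating is not specified in this section, so it is left out. The function names and the sample counts are hypothetical.

```python
import math


def _xlogx(x):
    """Return x * ln(x), with the usual convention 0 * ln(0) = 0."""
    return x * math.log(x) if x > 0 else 0.0


def _entropy_term(*counts):
    """Unnormalized entropy: N ln N - sum(k ln k) for the given counts."""
    return _xlogx(sum(counts)) - sum(_xlogx(k) for k in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 occurrence matrix laid out as in Table II.

    k11: both tokens occur together, k12 and k21: one token without the
    other, k22: neither token present.
    """
    row = _entropy_term(k11 + k12, k21 + k22)
    col = _entropy_term(k11 + k21, k12 + k22)
    mat = _entropy_term(k11, k12, k21, k22)
    # Guard against tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row + col - mat))


if __name__ == "__main__":
    # Hypothetical counts for one event bi-gram in a trace corpus.
    print(log_likelihood_ratio(110, 2442, 111, 29114))
    # Counts that are exactly independent yield a score of (about) zero.
    print(log_likelihood_ratio(10, 10, 10, 10))
```

A score near zero means the two events co-occur about as often as chance predicts; larger scores indicate a stronger collocation, without saying by itself whether the pair is benign or malicious.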


The same process is later applied to the pattern’s general occurrence in a labeled benign versus malicious corpus. The final result is a normalized sentiment rating ranging from +1.0 (benign) to −1.0 (malicious). Unknown bi-grams are ultimately scored against the resulting dictionary, the outcome of which is at the core of the bi-gram evaluation feature in the new BiG2-KAMAS prototype.

B. Visualization Design

Structure: Wagner et al. [3] describe in their article that, since IT-security experts are commonly familiar with programming IDEs, they used the design concept of IDEs like Eclipse or Netbeans for their prototype. The updates to the new prototype also follow this design concept. In contrast to the previous prototype, the new one has an additional view. In this initial view the KDB is situated on the left side, which can be compared to the project view in Eclipse. On the right side only the file load buttons are displayed, which can be compared to the initial view of Eclipse, where no project has been opened yet.

Coloring: For the rule highlighting as well as the bi-gram visualization we selected a sequential color scheme from red to blue. Red indicates that the rule or bi-gram is malicious and blue stands for a benign rule or bi-gram. To avoid problems with red and green hues for colorblind people [22, p. 124], we used blue instead of green and selected colorblind-safe qualitative colors from Colorbrewer¹.

¹ http://colorbrewer2.org

Layout: The prototype is structured into three parts: knowledge base, rule exploration area and call exploration area (see Figure 1). On the left side the knowledge base is visualized with its ‘Knowledge Database (KDB)’ (see Figure 1:1a) and the KDB’s color highlighting filters (see Figure 1:1b). The KDB is displayed as a tree, in which each category of the database can have several subcategories. Each category with subcategories is shown with a box icon (see Figure 1:1a) and the ones without subcategories are displayed with folder icons. Each rule that is stored in the database is displayed with a paper icon. Beneath the KDB the ‘Knowledge Base Highlighting’ filters are displayed (see Figure 1:1b). Each filter can be activated or disabled with its checkbox and updates the result of the prototype’s filter pipeline and the visualization of the ‘Rule Overview Table’ (see Figure 1:2a).

After loading and translating the input file, the system updates the ‘Graphical User Interface’ (GUI) and visualizes new elements. In the middle the ‘Rule Exploration’ area (see Figure 1:2) is visualized, while the right side contains the ‘Call Exploration’ area (see Figure 1:3).

In the ‘Call Exploration’ area all the included system or API calls of the loaded input file are represented in the call table (see Figure 1:2b) as described by Wagner et al. [3]. The rules included in the input file are visualized in the rule overview table located in the ‘Rule Exploration’ area (see Figure 1:2a). If the user loads several trace files, each trace file will be displayed as one rule.

The background of the third column of the ‘Rule Overview Table’ indicates whether a rule is fully benign, partially benign, not known, partially malicious or fully malicious. The background of the malicious rules will be painted in red and the background of the benign rules in blue. The fully known rules will be displayed in a dark red/blue while the partially known rules are highlighted in a light red/blue (see Figure 1:1b). The red color highlighting for malicious activity is adopted from the KAMAS prototype [3]. If a rule is fully known and, therefore, highlighted in dark red, the rule is included as-is in the KDB. A partially known rule is only a part of one rule in the KDB. This kind of rule has at least one additional call at the beginning or at the end of a fully known rule [3]. If an input file was loaded, the system automatically calculates the knowledge state of each rule. For this purpose, the system compares each rule of the input file with each rule of the KDB. After the calculation process the system highlights the rules in the corresponding colors in the rule overview table.

Bi-Gram Visualization: The rule detail table is located next to the rule overview table (see Figure 1:2b). The rule detail table automatically updates its content when clicking on a rule in the rule overview table and represents all system and API calls included in the selected rule. From left to right, the table displays the unique id as well as the name of the call. The last column visualizes the new bi-gram based valuation approach for the corresponding calls. As mentioned before, the prototype uses the bi-gram approach of Luh et al. [10]. A bi-gram is an n-gram with n = 2. An n-gram, in turn, is a coherent sequence of n elements. In this approach the elements are system or API calls. Each bi-gram has a score in the range [-1, 1], which indicates whether this pair of calls is malicious or benign. For bi-gram based valuation, two different visualization approaches were implemented following a semantic zooming approach: First, if the width of the bi-gram column is bigger than 75px, the prototype visualizes the bi-gram values as bar charts (see Figure 2:a), whereby each bar starts in the middle of the bi-gram column. If the bi-gram score is between 0 and -1, the bi-gram is malicious. Therefore, the red color bar chart unfurls from the middle towards the left side. If the bi-gram score is between 0 and 1, the bi-gram is benign and the bar chart is visualized from the middle to the right side in a blue color. The colors correspond to the KDB highlighting. The visualization approach was chosen to give the user a quick but still precise overview of the bi-gram based scores.

If the width of the bi-gram column is smaller than 75px and therefore the bar charts are hardly recognizable, the system switches to the second visualization. Here, the bi-gram values are visualized as a color-filled rectangle (see Figure 2:b). As before, a red colored rectangle indicates that the bi-gram is malicious and a blue one stands for a benign bi-gram. To visualize the value of the malicious or benign bi-gram, the system changes the alpha value of the displayed color. Therefore, the darker the color, the higher the respective value. Since the difference of an alpha value between 255 and 240 is


not easy to recognize and every value below 100 is generally difficult to see, we decided to implement only four graduation steps for the alpha value. The visualization with the alpha value is less precise than the visualization with the bar charts but, at the same time, significantly easier to interpret. Table III shows the different graduation steps and their value ranges.

Fig. 2. The two different visualisation methods of the call bi-grams. The first method visualises the bi-grams as bar charts (a), whereas the second visualisation uses the alpha channel to show the severity of the bi-gram (b).

TABLE III
COLOUR GRADUATION STEPS FOR THE ALPHA VALUE BI-GRAM VISUALISATION.

Colour   Alpha value   Value ranges
blue     200           >= 0.75
blue     150           >= 0.5 && < 0.75
blue     100           >= 0.25 && < 0.5
blue     50            >= 0 && < 0.25
red      50            < 0 && >= -0.25
red      100           < -0.25 && >= -0.5
red      150           < -0.5 && >= -0.75
red      200           < -0.75

C. Interaction

Like the KAMAS prototype of Wagner et al. [3], the functionality of BiG2-KAMAS will be described in accordance with the four steps of the visual information seeking mantra of Shneiderman et al. [23], namely overview, rearrange and filter, details-on-demand, and extract.

Overview: The BiG2-KAMAS prototype has an additional initial view where the user can decide whether to load a Sequitur input file or several raw trace files. When the analyst loads a Sequitur file, the rule and call tables will be filled with the rule and call data included in the input file. Each entry in the rule overview table represents one rule of the loaded cluster. Furthermore, the histograms in the rule exploration area give a quick impression of the distribution in the rule occurrence and length [3]. When the user loads one or more trace files the rule and call tables will also be filled with the data of these files. Contrary to a loaded Sequitur file, each entry of the rule overview table represents an entire trace file. Thus, if the user loads three traces the rule overview table will have only three rows. Furthermore, due to the fact that the user analyses several independent trace files the histogram for the rule occurrence is insignificant. Therefore, only one histogram for the trace length will be displayed in the rule filter area.

Rearrange: If the rule overview table and the call overview table are loaded with data, the user can rearrange their content by clicking on a table’s column. This will re-sort the included data and update the visualization [3]. The content of the rule detail table cannot be rearranged since the calls are shown in their sequential order and should therefore not be changeable.

Filter: In the next step the user can reduce the number of rules or trace files by using the rule/trace and call filters [3]. No matter which files were loaded, the user always has the opportunity to filter the rules or traces by the included calls (events). The user can rearrange the call filters or select a specific call in the call overview table to reduce the number of shown rules [3]. Furthermore, the analyst can filter the rules or specific traces by using the filters in the rule exploration area. If a Sequitur file is loaded, the analyst can filter the rules by their occurrence, length, whether they are equally distributed in the input file or if they match, partially match, or don’t match the stored rules in the KDB [3]. When the filter settings change, the rules shown in the rule overview table update immediately. If one or more trace files were loaded, the analyst can only filter the shown traces in the rule overview table by their length. In addition, the highlighting and filtering of the KDB is switched off.

Details-on-Demand: If the user wants to analyze a rule or trace, he/she can open the rule/trace in the rule detail table by selecting it in the rule overview table. This will display all the included calls in the rule detail table in their sequential order [3]. The bi-grams provide information whether a combination of two calls is malicious or benign. This should support the user in finding interesting call sequences more quickly.

Extract: Independent of the loaded files the analyst can add a new rule to the database in two different ways. One method is to select one rule or trace in the rule overview table and simply drag and drop it in one leaf category of the KDB. This will add the entire rule or trace file to the database [3]. Alternatively, the analyst can select several calls of interest in the call overview table and add these by dragging and dropping them to the KDB. When adding a new rule to the KDB, a popup window will show up where the analyst can assign the rule a specific name. If the user has loaded a Sequitur file, the system will now update the knowledge state for all rules as well as the highlighting in the rule overview table for further analysis.

D. Implementation

Since the BiG2-KAMAS prototype is based on the prototype of Wagner et al. [3], it also uses a data-oriented design concept [24]. To increase the performance of the prototype, the system only works with integer comparisons. Therefore,


the input data only includes the call ids. It is only possible to translate a call id to the actual call value with an additional translation file. This translation file is also used for the bi-grams. The original bi-gram file has several columns in which only the string values of the system or API calls are stored. To increase the performance and to reduce memory usage, the BiG2-KAMAS prototype generates its own bi-gram file. When starting the prototype, the system uses MD5 hash values to determine whether the translation file or the original bi-gram file has changed. If so, the system converts the original bi-gram file to the translated bi-gram file, in which the integer values of the system calls are also stored. Like the prototype of Wagner et al. [3], the new prototype uses the action pipeline for filter options. This enables dynamic query environments and real-time data operations.

To evaluate the robustness and performance of the BiG2-KAMAS prototype, three different Sequitur cluster-grammar files containing between 10 and 500 rules were used. The file with 500 different rules contained a total of 30,000 system and API calls. To test the bi-gram functionality, a bi-gram file with nearly 117,500 bi-gram entries was loaded. On a machine with a 2.1 GHz dual-core processor and 12 GB of memory it took the system about four minutes to translate the original bi-gram file to the translated bi-gram file. The malware and bi-gram samples were collected by collaborators in the Josef Ressel Center TARGET of St. Pölten UAS.

IV. EXTERNALIZED KNOWLEDGE INTEGRATION

As Wagner et al. [3] described in their article, we integrated a knowledge database to support the user during their analysis tasks. The KDB is based on the malware behavior schema of Dornhackl et al. [25]. The KDB is located at the left side of the prototype and is implemented in a hierarchical structure (tree structure). In the BiG2-KAMAS prototype the KDB was extended by one additional category to store the benign rule data, namely benign activity. In the current version of the prototype there is only one category to store benign rule data. Each category is displayed with either a box or a folder icon, the category description and the number of included rules in the integrated subfolders. The analyst can add new rules by drag & drop. When adding a new rule, the KDB automatically unfolds closed categories. Additionally, a popup window opens in which the analyst can enter a rule name. To investigate a rule stored in the KDB, the user can open a context menu by right clicking on the chosen rule. The context menu will show two different menu items, namely ‘Information’ and ‘Delete’. The information menu item opens a popup window in which the analyst is presented the following information:

• Assigned Concept: This information tells the analyst in which schema category (concept) the rule is currently categorized. The assigned concept is implemented as a selection list to give the user the opportunity to change the assigned concept. For that purpose, the analyst must select a different concept in the list and press the save button at the bottom of the popup window.
• Rule Name: Here, the actual rule name is displayed. The rule name is implemented as a text field to quickly change it if necessary.
• Included Calls: Finally, the calls included in the stored rule are displayed in a table. Thus, the calls are visualized in their sequential order and each call will be shown with its unique call id, which corresponds to the call id of the translation file, and the actual call value. In the current version of the prototype it is only possible to investigate the included calls in their sequential order, but not to delete specific calls which are listed in the table.

The second menu item is the ‘Delete’ item, which allows the analyst to delete the currently selected rule. Furthermore, when selecting a concept instead of a rule, the BiG2-KAMAS prototype will show a context menu with which the user can disable a category and all its integrated subcategories. Thus, the analyst can disable the entire KDB or only specific categories. If the user disables a category, all the included rules will no longer be considered in the knowledge base highlighting and filtering.

When the user clicks the right mouse button to open the corresponding context menu before selecting a rule or category, the system automatically selects the rule/category at the actual mouse position.

Searching: If the user searches for interesting rules or specific calls or call groups, he/she can use the call filter options to reduce the data to be analyzed. In the call exploration area, the user can search for a specific call by entering its name or use regular expressions to find an entire call group. Beneath the search text field the user can enable case sensitive search with the corresponding checkbox ‘Case Sensitive’. Filtering or searching the calls affects the data shown in the call overview and rule overview table. Additionally, to find rules of interest the analyst can use the rule exploration filters or the knowledge base filters.

V. PROTOTYPE EVALUATION

This section describes the procedure of the performed user studies, the specific results, as well as further feature requests. For the prototype validation, a user study with two domain experts was conducted. The domain experts validated the functionality as well as the visual design interface.

Participants: Both participants work at St. Pölten UAS and have more than five years of experience in the field of malware analysis. The first participant is between 30 and 39 years of age, male, and holds a master’s degree. The second participant is between 60 and 69 years of age, male, and holds a PhD. Generally, both participants are well experienced in this field and can be categorized as experts.

Design and Procedure: Each participant was interviewed individually and had already tested the previous version of the prototype at least once. First, the participants received a short introduction to the new features of BiG2-KAMAS and also a quick reminder of the basic features and workflow. The participants were asked to mention additional missing functionalities and to criticize all potential usability issues.


Both participants took part in the same two scenarios: First, the participants had to load a Sequitur file, investigate the loaded rules and filter specific call sequences. At the end they had to store a rule in the KDB and name it. In the second scenario, the participants had to load three trace files. They were asked if they perceived any differences when loading trace files instead of a Sequitur file. At the end they had to investigate a rule stored in the KDB and move it to a different category.

Equipment and Materials: The latest version of the BiG2-KAMAS prototype was used in the evaluation. For the first user scenario, the participants had to load a Sequitur file with about 500 rules and 30,000 system and API calls. In the second scenario, three trace files with a length between ten and fifteen calls were used. The bi-gram file had a total number of about 117,000 bi-grams. The translated bi-gram file had already been generated so that the participants did not have to wait until the system finished the translation process. As evaluation equipment, two different setups were used. Both participants worked on a 13 inch MacBook Pro with a Retina display (screen resolution of 2560x1600) and a mouse for navigation. Participant 1 worked with an additional 20 inch monitor with a full HD screen resolution and an external keyboard. Each user test was conducted with the same version of the BiG2-KAMAS prototype and was documented on paper.

A. Results

This section presents the results of both scenarios, ‘Scenario 1’ (Sequitur file) and ‘Scenario 2’ (trace files). Both participants had no problem loading the different files for the user scenarios.

Scenario 1: Loading and Analyzing a Sequitur file.
Both participants quickly recognized the additional color scheme for the new benign category. The colors for the knowledge base highlighting were assessed as easily understandable and the additional rule counter next to the knowledge base filters was mentioned as being very useful. Participant 1 mentioned that if a rule in the rule overview table is highlighted, it would be useful to know which rule or rules of the KDB match this rule in the table. Therefore, a tooltip would be helpful which tells the user the names of the matching rules of the KDB. Furthermore, participant 2 suggested to always show the rule counter of the KDB’s categories. If there are currently no rules in a category, the counter should be zero. When participant 2 first saw the bar chart bi-gram visualization, he assumed it visualizes the occurrence of the combined call sequence. In contrast, the alpha color visualization was immediately recognized as an indicator for maliciousness or benignity. Participant 1 also mentioned that the alpha color visualization is easier and faster to recognize. Furthermore, both participants mentioned that the color visualization is not as precise as the bar chart visualization and therefore would only be useful for initial malware classification. Participant 1 suggested an additional tooltip to display the accurate bi-gram value. Participant 2 remarked that it would be more useful if the calls in the call overview table only showed the beginning and the end of the call’s value. This would simplify finding a specific call in a group of similar calls. Additionally, he recommended a search button for the regular expression call filter. This could help some users, since currently it is only possible to search by pressing the enter key. Adding a new rule to the KDB was no challenge for either participant and both valued the ability to give the rule a specific name.

Scenario 2: Loading and analyzing three trace files.
Both participants had no difficulties with loading the three trace files. They also recognized quickly that each entry in the rule overview table now represents one trace. Neither of them realized that the knowledge base filters and highlighting were disabled. Participant 1 suggested to gray out the knowledge base filters to make it clear that these are disabled. Participant 2 proposed to change the headings for the trace file analysis view in order to avoid confusion. He remarked that it could be misleading if the headings say e.g. ‘Rule Overview Table’ when analyzing a trace file. Furthermore, both participants recommended to change the occurrence column in the rule overview table to the file names of the traces. As the last task, the participants had to change the corresponding category of a random rule. Although both participants solved this task easily, both remarked that it would be useful if the user could move a rule from one category to another via drag & drop.

B. Result Analysis

This section gives an overview of the issues which were mentioned during the expert reviews. As in Wagner et al. [3], each issue was rated based on Nielsen’s [26] severity ratings. Table IV shows the potential new features noted by the test persons and includes three columns: ‘feature requests’ (FR), ‘severities’ (SE) and the effort it would take to implement these changes [3]. The features mentioned in the table include small cosmetic changes as well as real usability improvements. The only feature mentioned by all participants is an additional tooltip which shows the actual bi-gram values.

TABLE IV
LIST OF REMARKED FEATURE REQUESTS AND SEVERITIES AND THE EFFORT IT WOULD TAKE TO IMPLEMENT THEM IN THE PROTOTYPE. (FR: 1 = NICE TO HAVE, 2 = GOOD FEATURE, 3 = ENHANCES USABILITY; SE: 1 = MINOR, 2 = BIG, 3 = DISASTER; EFFORT: 1 = MIN, 2 = AVERAGE, 3 = MAX) [3].

Description                                                                                      FR  SE  Effort
KDB: Move a rule to another category by using drag & drop.                                       2   1   1
KDB: Show the rule counter even if zero rules are included.                                      1   -   1
KDB: Gray out the knowledge base filters if they are disabled.                                   2   1   1
Tables: Highlighted rules in the rule overview table should show the KDB’s corresponding rules.  3   2   3
Tables: Change the occurrence column to the trace file names.                                    2   1   2
Tables: Show only the begin and the end of the calls in the call overview table.                 3   2   2
Tables: Implement a search button for the call regex search.                                     1   -   1
Bigram: Tooltip to show the bi-gram values.                                                      3   -   1
Headings: Change the headings when loading trace files.                                          2   -   1
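
As a rough illustration of how the ratings in Table IV could be used to order follow-up work, the sketch below encodes the nine table rows and sorts them by feature-request rating (highest first) and then by effort (lowest first). The ordering rule, the `FeatureRequest` type, the `prioritize` function, and the choice to store a missing severity ("-") as `None` are assumptions made for this illustration only; the study itself does not prescribe a prioritization scheme.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FeatureRequest:
    """One row of Table IV (hypothetical container, not part of the prototype)."""
    description: str
    fr: int            # 1 = nice to have, 2 = good feature, 3 = enhances usability
    se: Optional[int]  # 1 = minor, 2 = big, 3 = disaster; None where the table shows "-"
    effort: int        # 1 = min, 2 = average, 3 = max


# Values copied from Table IV, in table order.
TABLE_IV: List[FeatureRequest] = [
    FeatureRequest("KDB: Move a rule to another category by using drag & drop.", 2, 1, 1),
    FeatureRequest("KDB: Show the rule counter even if zero rules are included.", 1, None, 1),
    FeatureRequest("KDB: Gray out the knowledge base filters if they are disabled.", 2, 1, 1),
    FeatureRequest("Tables: Highlighted rules in the rule overview table should show "
                   "the KDB's corresponding rules.", 3, 2, 3),
    FeatureRequest("Tables: Change the occurrence column to the trace file names.", 2, 1, 2),
    FeatureRequest("Tables: Show only the begin and the end of the calls in the call "
                   "overview table.", 3, 2, 2),
    FeatureRequest("Tables: Implement a search button for the call regex search.", 1, None, 1),
    FeatureRequest("Bigram: Tooltip to show the bi-gram values.", 3, None, 1),
    FeatureRequest("Headings: Change the headings when loading trace files.", 2, None, 1),
]


def prioritize(requests: List[FeatureRequest]) -> List[FeatureRequest]:
    """Assumed rule: highest FR first, then lowest effort; ties keep table order."""
    return sorted(requests, key=lambda r: (-r.fr, r.effort))


if __name__ == "__main__":
    for r in prioritize(TABLE_IV):
        se = "-" if r.se is None else str(r.se)
        print(f"FR={r.fr} SE={se} Effort={r.effort}  {r.description}")
```

Under this assumed rule the bi-gram tooltip is listed first, since it is rated 3 for usability and 1 for effort; a different weighting (for example, one that also considers severity) would produce a different order.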


VI. DISCUSSION & REFLECTION

The performed user studies described in Section V confirmed that the four feature requests, which were defined in Section I, are fulfilled by the BiG2-KAMAS prototype:

1) Generic data loading: The BiG2-KAMAS prototype is structured to enable the generic loading of data sequences. To make this possible, the input data as well as the prototype’s database are based on unique identifiers (ids) instead of the actual values. Thus, all system-internal comparisons are based on integer values instead of string values. Only with the corresponding translation table can the system translate the ids to the actual values. It is therefore possible to load data sequences independent of their actual values as long as there is a translation table through which the prototype can translate the data. Furthermore, the system was adapted to also offer the opportunity to load raw system or API call based traces. In this state the KDB highlighting and filtering is disabled, but the user can explore the loaded trace files and add new rules to the KDB. The prototype can load not only Sequitur call sequences but also independent data sequences, as long as the data sequence has the given structure and a translation file.

2) Extend the KDB with benign rules: To fulfill this requirement the KDB was extended with an additional category for benign activity. In this category, all rules which are identified as benign can be stored. Additionally, the KDB’s highlighting and filter pipelines were extended to identify and filter partially and fully benign rules. Rules with a partially or fully benign knowledge state are highlighted in blue in order to avoid the combination of the colors red and green.

3) Implementation of bi-gram based valuation: To support the bi-gram approach of Luh et al. [10] the prototype’s rule detail table was adapted. Since many domain experts mentioned [3] that the arc-diagram visualization is not very helpful, it was replaced by the bi-gram visualization. Bi-gram based valuation is implemented with two different approaches. If the width of the bi-gram column is bigger than 75px, the valuation is visualized with bar charts and colored in red (malicious) or blue (benign). If the width is less than 75px, the bi-gram visualization uses the alpha channel to show the severity of the bi-gram (see Table III).

4) User studies to validate the new features: The results of the user studies show further feature requests which could be implemented in a future project. However, both participants mentioned that the bi-gram visualization is very helpful for identifying potentially malicious or benign call sequences and, therefore, helps to decide whether a rule is malicious or not.

Future Work: For the behavior-based malware analysis process, it could be valuable to implement a rule creation process where the analyst can build their own rules based on the known system and API calls [27]. Furthermore, it could be beneficial to edit the stored rules in the KDB or to build new rules based on existing patterns. Further avenues for future work are to include possibilities to hide, shrink, and expand areas to provide the user with more flexibility. Moreover, the occurrence column of the Call Exploration area (see 1:3a) could be updated to show the relation to the total number of occurrences included in the loaded file. Additionally, normalizing the occurrence dataset and visualization to this total could be beneficial.

Categorization of BiG2-KAMAS: Like the KAMAS prototype [3], the BiG2-KAMAS prototype can be categorized as a Malware Forensic as well as a Malware Classification tool in the Malware Visualization Taxonomy of Wagner et al. [8]. However, due to the bi-gram based valuation, the BiG2-KAMAS prototype offers the malware analyst additional assistance for the Individual Malware Analysis.

VII. CONCLUSION

In this work, we presented a design study for a Bi-gram Supported Generic Knowledge-Assisted Malware Analysis System (BiG2-KAMAS). The prototype is based on the KAMAS prototype [3] and extended by additional features such as generic data loading, an extension of the KDB to enable the analysis of benign rules, and the implementation of a bi-gram based valuation approach. The requirements were discussed in a focus group meeting and then implemented as part of a functional prototype. After implementing the new features, two user studies were conducted to evaluate the design and the functionality of the new BiG2-KAMAS prototype.

ACKNOWLEDGMENTS

The financial support by the Austrian Federal Ministry of Science, Research and Economy and the National Foundation for Research, Technology and Development is gratefully acknowledged.
This work was supported by the Austrian Science Fund (FWF) via the “KAVA-Time” project (P25489-N23) and by the Austrian Federal Ministry of Science, Research and Economy under the FFG Innovationscheck (no. 856429). We would also like to thank all focus group members and test participants who agreed to volunteer in this project.

REFERENCES

[1] M. Sikorski and A. Honig, Practical Malware Analysis: The Hands-On Guide to Dissecting Malicious Software, 1st ed. No Starch Press, 2012.
[2] M. Wagner, W. Aigner, A. Rind, H. Dornhackl, K. Kadletz, R. Luh, and P. Tavolato, “Problem characterization and abstraction for visual analytics in behavior-based malware pattern analysis,” in Proceedings of the Eleventh Workshop on Visualization for Cyber Security, ser. VizSec ’14. ACM, 2014.
[3] M. Wagner, A. Rind, N. Thür, and W. Aigner, “A knowledge-assisted visual malware analysis system: Design, validation, and reflection of KAMAS,” Computers & Security, vol. 67, pp. 1–15, 2017.
[4] N. Thür, M. Wagner, J. Schick, C. Niederer, J. Eckel, R. Luh, and W. Aigner, “BiG2-KAMAS: Supporting knowledge-assisted malware analysis with bi-gram based valuation,” in Poster of the 14th Workshop on Visualization for Cyber Security (VizSec), Phoenix, Arizona, USA, 2017.
[5] H. Shiravi, A. Shiravi, and A. Ghorbani, “A survey of visualization systems for network security,” vol. 18, no. 8, pp. 1313–1329, 2012.
[6] M. Egele, T. Scholte, E. Kirda, and C. Kruegel, “A survey on automated dynamic malware-analysis techniques and tools,” vol. 44, no. 2, pp. 6:1–6:42, 2008.
[7] Z. Bazrafshan, H. Hashemi, S. Fard, and A. Hamzeh, “A survey on heuristic malware detection techniques,” 2013, pp. 113–120.
[8] M. Wagner, F. Fischer, R. Luh, A. Haberson, A. Rind, D. A. Keim, and W. Aigner, “A survey of visualization systems for malware analysis,” in Eurographics Conference on Visualization (EuroVis) - STARs. The Eurographics Association, 2015.


[9] L. McNabb and R. S. Laramee, “Survey of surveys sos - mapping
the landscape of survey papers in information visualization,” Comput.
Graph. Forum, vol. 36, no. 3, pp. 589–617, Jun. 2017. [Online].
Available: https://doi.org/10.1111/cgf.13212
[10] R. Luh, S. Schrittwieser, and S. Marschalek, “LLR-based Sentiment
Analysis for Kernel Event Sequences.” IEEE, 2017.
[11] R. Luh, G. Schramm, M. Wagner, and S. Schrittwieser, “Sequitur-based
Inference and Analysis Framework for Malicious System Behavior,”
2017.
[12] O. Somarriba, U. Zurutuza, R. Uribeetxeberria, L. Delosières, and
S. Nadjm-Tehrani, “Detection and visualization of android malware
behavior,” vol. 2016, p. e8034967, 2016.
[13] S. Marschalek, R. Luh, M. Kaiser, and S. Schrittwieser, “Classifying
malicious system behavior using event propagation trees.” ACM Press,
2015, pp. 1–10.
[14] B. Xiaofang, C. Li, H. Weihua, and W. Qu, “Malware variant detection
using similarity search over content fingerprint.” IEEE, 2014, pp. 5334–
5339.
[15] A. Jain, H. Gonzalez, and N. Stakhanova, “Enriching reverse engineering
through visual exploration of android binaries,” in Proceedings of the 5th
Program Protection and Reverse Engineering Workshop, ser. PPREW-5.
ACM, 2015, pp. 9:1–9:9.
[16] O. E. David and N. S. Netanyahu, “DeepSign: Deep learning for
automatic malware signature generation and classification.” IEEE, 2015,
pp. 1–8.
[17] P. M. Wrench and B. V. W. Irwin, “Towards a PHP webshell taxonomy
using deobfuscation-assisted similarity analysis.” IEEE, 2015, pp. 1–8.
[18] A. Stevenson and J. R. Cordy, “A survey of grammatical inference in
software engineering,” Science of Computer Programming, vol. 96, pp.
444–459, 2014.
[19] L. Ming and P. Vitányi, An introduction to Kolmogorov complexity and
its applications. Springer Heidelberg, 1997.
[20] C. G. Nevill-Manning and I. H. Witten, “Identifying hierarchical struc-
ture in sequences: A linear-time algorithm,” J. Artif. Intell. Res. (JAIR),
vol. 7, pp. 67–82, 1997.
[21] T. Dunning, “Accurate methods for the statistics of surprise and coinci-
dence,” Computational linguistics, pp. 61–74, 1993.
[22] C. Ware, Information Visualization: Perception for Design. Elsevier,
2012.
[23] B. Shneiderman, “The eyes have it: a task by data type taxonomy for
information visualizations,” in Proc. of VL, 1996, pp. 336–343.
[24] R. Fabian, “Data-Oriented Design,” 2013, accessed on Nov. 11, 2015. [Online].
Available: http://www.dataorienteddesign.com/dodmain/dodmain.html
[25] H. Dornhackl, K. Kadletz, R. Luh, and P. Tavolato, “Malicious behavior
patterns,” in SOSE. IEEE, 2014, pp. 384–389.
[26] J. Nielsen, Usability engineering. Boston: Academic Press, 1993.
[27] M. Wagner, A. Rind, G. Rottermanner, C. Niederer, and W. Aigner,
“Knowledge-assisted rule building for malware analysis,” in Proceedings
of the 10th Forschungsforum der österreichischen Fachhochschulen, FH
des BFI Wien. Vienna, Austria: FH des BFI Wien, 2016.
