<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Bigram Supported Generic Knowledge-Assisted Malware Analysis System: BiG2-KAMAS</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Niklas Thür</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Markus Wagner</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Johannes Schick</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christina Niederer</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jürgen Eckel</string-name>
          <email>eckel.j@ikarus.at</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Robert Luh</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Wolfgang Aigner</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Institute of Creative\Media/Technologies</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>St. Pölten University of Applied Sciences</string-name>
          <email>first.last@fhstp.ac.at</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Austria</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>IKARUS Security Software GmbH</institution>
          ,
          <country country="AT">Austria</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Josef Ressel Center for Unified Threat Intelligence on Targeted Attacks</institution>
          ,
          <country country="AT">Austria</country>
        </aff>
      </contrib-group>
      <fpage>107</fpage>
      <lpage>115</lpage>
      <abstract>
        <p>Malicious software, or “malware” for short, refers to software programs that are designed to cause damage or to perform unwanted actions on the infected computer system. Behavior-based analysis of malware typically utilizes tools that produce lengthy traces of observed events, which have to be analyzed manually or by means of individual scripts. Due to the growing amount of data extracted from malware samples, analysts are in need of an interactive tool that supports them in their exploration efforts. In this respect, the use of visual analytics methods and stored expert knowledge helps the user to speed up the exploration process and, furthermore, to improve the quality of the outcome. In this paper, the previously developed KAMAS prototype is extended with additional features such as the integration of a bi-gram based valuation approach to cover further malware analysts' needs. The result is a new prototype which was evaluated by two domain experts in a detailed user study.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>I. INTRODUCTION</title>
    </sec>
    <sec id="sec-2">
      <p>
        Malicious software, or malware for short, is one of the biggest
        threats to computer systems these days [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. “Malware” refers
to software programs that are designed to cause damage or
perform other unwanted actions on a computer or network.
Malware therefore plays a big part in most computer
intrusions and security incidents. Malware includes, among others,
viruses, trojan horses, worms, rootkits, scareware, and
spyware [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. By now there are millions of malicious programs
and the number is increasing every day.
      </p>
      <p>
        “Malware analysis is the art of dissecting malware to
understand how it works, how to identify it, and how to
defeat or eliminate it” [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In malware analysis, there are two
basic approaches to examine a malware program: the static
and the dynamic approach. Often the malware analyst only
has the potentially malicious executable, which includes the
machine code but is not human-readable. Therefore, static
malware analysis involves the investigation of the malware
executable as well as certain reverse-engineering tasks to
recover the sample’s source code. On the other hand, dynamic
analysis requires the execution of the malicious software on
e.g. a virtualized host machine to detect the malware’s
runtime behavior [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. To cover all of the malware analyst’s
needs, Wagner et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] performed a problem characterization
and abstraction elaborating the analysts’ needs in relation to
behavior-based malware analysis. In the article by Wagner
et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] a design study for a behavior-based
knowledge-assisted malware analysis system (referred to as KAMAS)
is described. The malware analyst’s workflow involves the
tasks of examining potentially malicious behavior patterns,
selecting them, categorizing them, and storing the found rules
in the knowledge database (KDB) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. We developed an
interactive prototype to extend the KAMAS design study [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]
with new features, resulting in the Bi-Gram supported Generic
Knowledge-Assisted Malware Analysis System (BiG2-KAMAS) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. A
focus group meeting with members of an Austrian IT security
company, the information security department of St. Pölten
UAS and the developers of the initial KAMAS prototype
was conducted to identify the tasks and needs for additional
features requested by the IT security company to extend the
KAMAS design study [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Based on this feature list, the paper
at hand contributes the following:
1) Integrating a generic data loading process enabling
KAMAS to load any kind of data, based on a given
structure;
2) Storing benign rules and their highlighting when loading
new cluster files, thereby supporting the analyst;
3) Identifying malicious or benign call sequences by
including a bi-gram based valuation;
4) Presenting in detail two user studies validating the new
features.
      </p>
      <p>This paper is structured as follows: Sect. II provides
background knowledge about the work of our collaborators and
related work in the field of malware analysis. In Sect. III we
describe the prototype’s design, visualization methods and
implementation. Furthermore, Sect. IV defines the integration of
additional knowledge in the prototype’s knowledge database.
Sect. V shows the prototype’s evaluation method, while results
are discussed in Sect. VI.</p>
    </sec>
    <sec id="sec-3">
      <title>II. RELATED WORK</title>
    </sec>
    <sec id="sec-4">
      <p>
        Shiravi et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] published a survey related to network security visualization, comparing the data sources and
visualization techniques of thirty-eight different systems. Furthermore,
Egele et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] presented a general literature survey of malware
analysis techniques and tools. In their work they surveyed
different approaches for dynamic automated malware
analysis and compared them based on their analysis techniques.
Likewise, Bazrafshan et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] surveyed various heuristic
malware detection techniques as well as malware obfuscation
techniques. Additionally, Wagner et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] published a survey
of 25 different visualization systems for malware analysis. The
objective of their work was to compare the visualization methods
and features of these systems and to categorize them along their
novel ‘Malware Visualization
Taxonomy’. Furthermore, McNabb and Laramee [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] published
a survey of surveys: Mapping The Landscape of Survey Papers
in Information Visualization.
      </p>
    </sec>
    <sec id="sec-5">
      <p>
        In 2017, Wagner et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] published a paper on a
        Knowledge-Assisted Malware Analysis System, referred to as
KAMAS. In their user study, they found out that the experts
are not only interested in visualizing patterns. A supportive
valuation approach was implemented by Luh et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ],
calculating the degree of maliciousness based on system and
API call bi-grams. Somarriba et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] presented another
malware detector system for Android Malware Behavior.
Besides, Marschalek et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] published a system for threat
detection using a real-time monitoring agent to gather all or
only selected system events and visualize these using event
propagation trees. Xiaofang et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] published a paper on
a malware variant detection approach using Similarity Search
“by processing malware as content fingerprint” [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Jain et
al. [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] presented a visual exploration approach for Android
binary files. Their approach is based on the visualization of
Android .dex files to analyze and compare malicious Android
executables. David et al. [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] presented “a novel deep learning
based method for automatic malware signature generation
and classification” [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. Wrench and Irwin [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] published an
approach in which they identify and classify Remote Access
Trojans (RATs) and other malicious software based on the
programming language PHP.
      </p>
    </sec>
    <sec id="sec-6">
      <title>III. PROTOTYPE CONCEPT</title>
    </sec>
    <sec id="sec-7">
      <p>
        This section describes the new features of the Bi-Gram
        supported Generic Knowledge-Assisted Malware Analysis
System (BiG2-KAMAS), conceptually grounded on the
KAMAS prototype [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <sec id="sec-7-1">
        <title>A. Data</title>
        <p>
          In its current iteration, BiG2-KAMAS bases its visualization
on sequential traces of Windows kernel operations amounting
to benign and malicious application behavior in the context
of OS and user-initiated processes. These events are typically
abstractions of raw system and API calls that yield information
about the general behavior of an unknown application
sample or resident process [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. Raw calls may include wrapper
functions (e.g. CreateFile) that offer a simple interface
to the application programmer, or native system calls (e.g.
NtCreateFile) that represent the underlying OS or kernel
support functions. In the context of BiG2-KAMAS and its data
providers, events are collected directly from the Windows
kernel. We employ a driver-based monitoring agent [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]
designed to collect and forward a number of events to a
database server. This gives us unimpeded access to events
depicting operations related to process and thread control,
image loads, file management, registry modification, network
socket interaction, and more. For example, a shell event that
creates a new binary file on a system may be simply denoted as
a triple explorer.exe,file-create,sample.exe.
        </p>
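        <p>As an illustration only, such an event triple could be held in a small record type. The field names (subject, action, target) and the comma-separated input format are assumptions made for this sketch; they are not taken from the monitoring agent's actual data model.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from typing import NamedTuple


class KernelEvent(NamedTuple):
    """One abstracted kernel event (hypothetical field names)."""
    subject: str  # acting process image, e.g. "explorer.exe"
    action: str   # abstracted operation, e.g. "file-create"
    target: str   # object acted upon, e.g. "sample.exe"


def parse_event(line: str) -> KernelEvent:
    """Parse a 'subject,action,target' triple into a KernelEvent."""
    subject, action, target = (part.strip() for part in line.split(",", 2))
    return KernelEvent(subject, action, target)


event = parse_event("explorer.exe,file-create,sample.exe")
print(event.action)
```
]]></preformat>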
        <p>Additional information captured in the background includes
various process and thread ID information required to uniquely
identify an event within a system session and to link individual
events to a full sequence (trace) needed for further processing
stages. Based on the aforementioned traces, BiG2-KAMAS uses
two distinct mechanisms to further process arbitrary kernel
event sequences:</p>
        <p>
          Pattern inference: Our introduced framework has been
developed in concert with an event extraction system called
SEQUIN [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. SEQUIN uses grammar inference extended
with statistical evaluation to automatically identify and crop
relevant sequences (rules) from traces of kernel-level
behavioral data for further processing and visualization. Generally
speaking, grammar inference is the process of computationally
assembling a formal ruleset by examining the sentences of an
unknown language [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]. In the information security domain,
grammar inference is primarily used for pattern recognition,
computational biology, natural language processing, language
design programming, data mining, and machine learning.
        </p>
        <p>
          Grammar inference has also been proven to be a feasible
approach to anomaly detection, since “algorithmic
incompressibility is a necessary and sufficient condition for randomness”
[
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]. We use grammar inference as a key component in the
process of ‘compressing’ a sequential trace for extracting
relevant behavioral patterns.
        </p>
        <p>
          To achieve inference by compression in a computationally
feasible way, we selected an algorithm that losslessly produces
(without changes to order and immutability) a context-free
grammar (CFG) in unsupervised operation. As opposed to
context-sensitive grammars, languages created by a CFG can
be recognized in O(n<sup>3</sup>) time, which is a relevant distinction for
all future parsing efforts. The choice ultimately fell on Sequitur
[
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]. Sequitur is a greedy compression algorithm that creates
a hierarchical structure (CFG) from a sequence of discrete
symbols by recursively replacing repeated phrases with a
grammatical rule. The output is a compressed representation of
the original sequence. The algorithm creates this representation
through the application of two base properties: rule utility and
bi-gram uniqueness. Rule utility checks if a rule occurs at least
twice in the grammar, while bi-gram uniqueness observes if
two adjacent symbols occur only once. Assuming we have
a string abcdbcabcd, where every character represents an
event, the first bi-gram of that trace would be ab, followed by
a second bi-gram bc, and so forth. See Table I for a complete
example of the process.
        </p>
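        <p>The bi-gram uniqueness idea can be sketched in a few lines of Python. Note that this is not Sequitur itself: it is a simplified, offline pair-replacement loop (closer in spirit to Re-Pair) that enforces only bi-gram uniqueness and ignores rule utility. The rule names R1, R2, ... and all function names are hypothetical, and the resulting grammar may differ from the one shown in Table I.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def bigrams(seq):
    """Return all adjacent symbol pairs of seq, in order of appearance."""
    return [(seq[i], seq[i + 1]) for i in range(len(seq) - 1)]


def count_nonoverlapping(seq, pair):
    """Count occurrences of pair in seq, scanning left to right without overlap."""
    count, i = 0, 0
    while i < len(seq) - 1:
        if (seq[i], seq[i + 1]) == pair:
            count += 1
            i += 2
        else:
            i += 1
    return count


def replace_pair(seq, pair, symbol):
    """Replace every non-overlapping occurrence of pair by symbol."""
    out, i = [], 0
    while i < len(seq):
        if i < len(seq) - 1 and (seq[i], seq[i + 1]) == pair:
            out.append(symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def compress(trace):
    """Replace repeated bi-grams by rules until no bi-gram repeats."""
    seq, rules = list(trace), {}
    while True:
        candidates = list(dict.fromkeys(bigrams(seq)))  # unique, first-seen order
        best = max(candidates, key=lambda p: count_nonoverlapping(seq, p), default=None)
        if best is None or count_nonoverlapping(seq, best) < 2:
            return seq, rules
        name = f"R{len(rules) + 1}"  # hypothetical rule naming scheme
        rules[name] = best
        seq = replace_pair(seq, best, name)


def expand(seq, rules):
    """Recursively expand rule symbols back into the original events."""
    out = []
    for symbol in seq:
        if symbol in rules:
            out.extend(expand(rules[symbol], rules))
        else:
            out.append(symbol)
    return out


start, rules = compress("abcdbcabcd")
print(start, rules)
```
]]></preformat>
        <p>For the string abcdbcabcd, this sketch produces the start sequence R3 R1 R3 with R1 = (b, c), R2 = (a, R1) and R3 = (R2, d). Since R2 is used only once, Sequitur's rule utility property would additionally fold it into R3.</p>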
        <p>
          Sequitur is linear in space and time. In terms of data
compression, the algorithm can outperform other designs
that achieve data reduction by factoring out repetition. It is
almost as performant as designs that compress data based on
probabilistic predictions [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ].
        </p>
        <p>
          Bi-gram extraction and scoring: In addition to rule
inference, BiG2-KAMAS uses precomputed maliciousness scores
of event bi-grams separately explored using a sentiment-like
extraction system based on the log likelihood ratio (LLR) test
[
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. An LLR test is a statistical method used to test model
assumptions, namely the quality of fit of a reference (null)
and an alternative model. When determining the occurrence
of rarely observed events – which are often at the core of
malicious traces – likelihood ratio tests show significantly
better results than alternatives such as χ<sup>2</sup> or z-score tests [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ].
        </p>
        <p>In preparation for sentiment-assisted visualization, we use
the LLR method to learn likely benign and malicious event
sequences in big corpora of recorded kernel operations (traces).</p>
        <p>
          The resulting sentiment dictionary can be used to accurately
and effectively determine if an investigated event bi-gram is
contextually suspicious. Specifically, we compute the LLR
score for each bi-gram to highlight collocations characteristic
of sequences of malicious and benign system events [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
        </p>
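        <p>As a rough sketch of how such a score could be computed, the following Python code derives a 2x2 contingency table for one bi-gram from observed bi-gram counts and evaluates the standard log-likelihood ratio statistic on it. Here k11 counts both tokens occurring together, k12 and k21 count one token occurring without the other, and k22 counts bi-grams containing neither. The function names are hypothetical. The sketch returns only the unsigned LLR value and does not reproduce the benign/malicious comparison or the normalization to a sentiment rating.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math
from collections import Counter


def contingency(bigram_counts, first, second):
    """Derive (k11, k12, k21, k22) for the bi-gram (first, second)."""
    total = sum(bigram_counts.values())
    k11 = bigram_counts.get((first, second), 0)
    k12 = sum(c for (a, b), c in bigram_counts.items() if a == first and b != second)
    k21 = sum(c for (a, b), c in bigram_counts.items() if a != first and b == second)
    k22 = total - k11 - k12 - k21
    return k11, k12, k21, k22


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio statistic of a 2x2 contingency table."""
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    cells = ((k11, 0, 0), (k12, 0, 1), (k21, 1, 0), (k22, 1, 1))
    g = 0.0
    for k, r, c in cells:
        if k > 0:  # a zero cell contributes nothing (0 * log 0 is taken as 0)
            g += k * math.log(k * n / (rows[r] * cols[c]))
    return 2.0 * g


trace = "abcdbcabcd"  # toy trace, one character per event
counts = Counter(zip(trace, trace[1:]))
print(llr(*contingency(counts, "b", "c")))
```
]]></preformat>
        <p>A table in which the two tokens occur independently of each other yields a value of 0, while strongly associated tokens yield large values.</p>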
        <p>
          The resulting occurrence counts (shown in Table II) are
the basis for this calculation: Following the approach by
[
          <xref ref-type="bibr" rid="ref21">21</xref>
          ], we define the number of times both event tokens occur
in combination (k11), the number of times each token has
been observed independently from the other (k12 and k21,
depending on the relative position in the bi-gram), and the
number of times the token was not present at all (k22).
The same process is later applied to the pattern’s general
occurrence in a labeled benign versus malicious corpus. The
final result is a normalized sentiment rating ranging from
+1.0 (benign) to −1.0 (malicious). Unknown bi-grams are
ultimately scored against the resulting dictionary, the outcome
of which is at the core of the bi-gram evaluation feature in the
new BiG2-KAMAS prototype.
        </p>
        <p>B. Visualization Design</p>
        <p>
          Structure: Wagner et al. [<xref ref-type="bibr" rid="ref3">3</xref>] describe in their article that,
since IT-security experts are commonly familiar with programming IDEs,
they used the design concept of IDEs like Eclipse or NetBeans for their
prototype. The updates to the new prototype also follow this design
concept. In contrast to the previous prototype, the new one has an
additional view. In this initial view the KDB is situated on the left
side, which can be compared to the project view in Eclipse. On the right
side only the file load buttons are displayed, which can be compared to
the initial view of Eclipse, where no project has been opened yet.
        </p>
        <p>Coloring: For the rule highlighting as well as the bi-gram
visualization we selected a sequential color scheme from red to blue.
Red indicates that the rule or bi-gram is malicious and blue stands for
a benign rule or bi-gram. To avoid problems with red and green hues for
colorblind people [22, p. 124], we used blue instead of green and
selected colorblind-safe qualitative colors from ColorBrewer
(http://colorbrewer2.org).</p>
        <p>Layout: The prototype is structured into three parts: knowledge
base, rule exploration area and call exploration area (see Figure 1).
On the left side the knowledge base is visualized with its ‘Knowledge
Database (KDB)’ (see Figure 1:1a) and the KDB’s color highlighting
filters (see Figure 1:1b). The KDB is displayed as a tree, in which each
category of the database can have several subcategories. Each category
with subcategories is shown with a box icon (see Figure 1:1a) and the
ones without subcategories are displayed with folder icons. Each rule
stored in the database is displayed with a paper icon. Beneath the KDB
the ‘Knowledge Base Highlighting’ filters are displayed (see Figure
1:1b). Each filter can be activated or disabled with its checkbox and
updates the result of the prototype’s filter pipeline and the
visualization of the ‘Rule Overview Table’ (see Figure 1:2a).</p>
        <p>After loading and translating the input file, the system
updates the ‘Graphical User Interface’ (GUI) and visualizes
new elements. In the middle the ‘Rule Exploration’ area (see
Figure 1:2) is visualized, while the right side contains the ‘Call
Exploration’ area (see Figure 1:3).</p>
        <p>
          In the ‘Call Exploration’ area all the included system or API
calls of the loaded input file are represented in the call table
(see Figure 1:2b) as described by Wagner et al. [<xref ref-type="bibr" rid="ref3">3</xref>]. The rules
included in the input file are visualized in the rule overview
table located in the ‘Rule Exploration’ area (see Figure 1:2a).
If the user loads several trace files, each trace file will be
displayed as one rule.
        </p>
        <p>
          The background of the third column of the ‘Rule Overview
Table’ indicates whether a rule is fully benign, partially
benign, not known, partially malicious or fully malicious.
The background of the malicious rules will be painted in red
and the background of the benign rules in blue. The fully
known rules will be displayed in a dark red/blue while the
partially known rules are highlighted in a light red/blue (see
Figure 1:1b). The red color highlighting for malicious activity
is adopted from the KAMAS prototype [<xref ref-type="bibr" rid="ref3">3</xref>]. If a rule is fully
known and, therefore, highlighted in dark red, the rule is
included as-is in the KDB. A partially known rule is only a
part of one rule in the KDB. This kind of rule has at least one
additional call at the beginning or at the end of a fully known
rule [<xref ref-type="bibr" rid="ref3">3</xref>]. If an input file was loaded, the system automatically
calculates the knowledge state of each rule. For this purpose,
the system compares each rule of the input file with each
rule of the KDB. After the calculation process the system
highlights the rules in the corresponding colors in the rule
overview table.
        </p>
        <p>
          Bi-Gram Visualization: The rule detail table is located
next to the rule overview table (see Figure 1:2b). The rule
detail table automatically updates its content when clicking
on a rule in the rule overview table and represents all system
and API calls included in the selected rule. From left to right,
the table displays the unique id as well as the name of the call.
The last column visualizes the new bi-gram based valuation
approach for the corresponding calls. As mentioned before,
the prototype uses the bi-gram approach of Luh et al. [<xref ref-type="bibr" rid="ref10">10</xref>].
A bi-gram is an n-gram of length n = 2. An
n-gram, in turn, is a coherent sequence of n elements. In
this approach the elements are system or API calls. Each
bi-gram has a score in the range [-1, 1], which indicates
whether this pair of calls is malicious or benign. For bi-gram
based valuation, two different visualization approaches
were implemented following a semantic zooming approach:
First, if the width of the bi-gram column is bigger than 75px,
the prototype visualizes the bi-gram values as bar charts (see
Figure 2:a), whereby each bar starts in the middle of the
bi-gram column. If the bi-gram score is between 0 and -1, the
bi-gram is malicious. Therefore, the red color bar chart unfurls
from the middle towards the left side. If the bi-gram score is
between 0 and 1 the bi-gram is benign and the bar chart is
visualized from the middle to the right side in a blue color. The
colors correspond to the KDB highlighting. The visualization
approach was chosen to give the user a quick but still precise
overview of the bi-gram based scores.
        </p>
        <p>If the width of the bi-gram column is smaller than 75px and
therefore the bar charts are hardly recognizable, the system
switches to the second visualization. Here, the bi-gram values
are visualized as a color-filled rectangle (see Figure 2:b).
As before, a red colored rectangle indicates that the bi-gram
is malicious and a blue one stands for a benign bi-gram.
To visualize the value of the malicious or benign bi-gram,
the system changes the alpha value of the displayed color.
Therefore, the darker the color, the higher the respective value.
Since the difference of an alpha value between 255 and 240 is
not easy to recognize and every value below 100 is generally
difficult to see, we decided to implement only four graduation
steps for the alpha value. The visualization with the alpha
value is less precise than the visualization with the bar charts
but, at the same time, significantly easier to interpret. Table III
shows the different graduation steps and their value ranges.</p>
        <p>Fig. 2. The two different visualisation methods of the call bi-grams. The
first method visualises the bi-grams as bar charts (a), whereas the second
visualisation uses the alpha channel to show the severity of the bi-gram (b).</p>
        <table-wrap>
          <label>TABLE III</label>
          <caption>
            <p>COLOUR GRADUATION STEPS FOR THE ALPHA VALUE BI-GRAM VISUALISATION.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Colour</th><th>Alpha value</th><th>Value range</th></tr>
            </thead>
            <tbody>
              <tr><td>Blue</td><td>200</td><td>&gt;= 0.75</td></tr>
              <tr><td>Blue</td><td>150</td><td>&gt;= 0.5 &amp;&amp; &lt; 0.75</td></tr>
              <tr><td>Blue</td><td>100</td><td>&gt;= 0.25 &amp;&amp; &lt; 0.5</td></tr>
              <tr><td>Blue</td><td>50</td><td>&gt;= 0 &amp;&amp; &lt; 0.25</td></tr>
              <tr><td>Red</td><td>50</td><td>&lt; 0 &amp;&amp; &gt;= -0.25</td></tr>
              <tr><td>Red</td><td>100</td><td>&lt; -0.25 &amp;&amp; &gt;= -0.5</td></tr>
              <tr><td>Red</td><td>150</td><td>&lt; -0.5 &amp;&amp; &gt;= -0.75</td></tr>
              <tr><td>Red</td><td>200</td><td>&lt; -0.75</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>C. Interaction</p>
        <p>
          Like the KAMAS prototype of Wagner et al. [<xref ref-type="bibr" rid="ref3">3</xref>], the
BiG2-KAMAS functionality will be described in accordance with
the four steps of the visual information seeking mantra of
Shneiderman et al. [<xref ref-type="bibr" rid="ref23">23</xref>], namely overview, rearrange and filter,
details-on-demand, and extract.
        </p>
        <p>
          Overview: The BiG2-KAMAS prototype has an additional
initial view where the user can decide whether to load a
Sequitur input file or several raw trace files. When the analyst
loads a Sequitur file, the rule and call tables will be filled with
the rule and call data included in the input file. Each entry
in the rule overview table represents one rule of the loaded
cluster. Furthermore, the histograms in the rule exploration
area give a quick impression of the distribution in the rule
occurrence and length [<xref ref-type="bibr" rid="ref3">3</xref>]. When the user loads one or more
trace files the rule and call tables will also be filled with the
data of these files. Contrary to a loaded Sequitur file, each
entry of the rule overview table represents an entire trace file.
Thus, if the user loads three traces the rule overview table will
have only three rows. Furthermore, due to the fact that the user
analyses several independent trace files the histogram for the
rule occurrence is insignificant. Therefore, only one histogram
for the trace length will be displayed in the rule filter area.
        </p>
        <p>
          Rearrange: If the rule overview table and the call overview
table are loaded with data, the user can rearrange their content
by clicking on a table’s column. This will re-sort the included
data and update the visualization [<xref ref-type="bibr" rid="ref3">3</xref>]. The content of the rule
detail table cannot be rearranged since the calls are shown in
their sequential order and should therefore not be changeable.
        </p>
        <p>
          Filter: In the next step the user can reduce the number of
rules or trace files by using the rule/trace and call filters [<xref ref-type="bibr" rid="ref3">3</xref>].
No matter which files were loaded, the user always has the
opportunity to filter the rules or traces by the included calls
(events). The user can rearrange the call filters or select a
specific call in the call overview table to reduce the number of
shown rules [<xref ref-type="bibr" rid="ref3">3</xref>]. Furthermore, the analyst can filter the rules or
specific traces by using the filters in the rule exploration area.
If loading a Sequitur file, the analyst can filter the rules by their
occurrence, length, whether they are equally distributed in the
input file or if they match, partially match, or don’t match the
stored rules in the KDB [<xref ref-type="bibr" rid="ref3">3</xref>]. When the filter settings are changed, the
rules shown in the rule overview table update immediately.
If one or more trace files were loaded, the analyst
can only filter the shown traces in the rule overview table by
their length. In addition, the highlighting and filtering of the
KDB is switched off.
        </p>
        <p>
          Details-on-Demand: If the user wants to analyze a rule or
trace, he/she can open the rule/trace in the rule detail table
by selecting it in the rule overview table. This will display
all the included calls in the rule detail table in their sequential
order [<xref ref-type="bibr" rid="ref3">3</xref>]. The bi-grams provide information on whether a
combination of two calls is malicious or benign. This should support
the user in finding interesting call sequences more quickly.
        </p>
        <p>
          Extract: Independent of the loaded files the analyst can
add a new rule to the database in two different ways.
One method is to select one rule or trace in the rule
overview table and drag and drop it into one leaf category
of the KDB. This will add the entire rule or trace file to the
database [<xref ref-type="bibr" rid="ref3">3</xref>]. Alternatively, the analyst can select several calls
of interest in the call overview table and add these by dragging
and dropping them to the KDB. When adding a new rule to
the KDB, a popup window will show up where the analyst
can assign the rule a specific name. If the user has loaded a
Sequitur file, the system will now update the knowledge state
for all rules as well as the highlighting in the rule overview
table for further analysis.
        </p>
        <p>D. Implementation</p>
        <p>
          Since the BiG2-KAMAS prototype is based on the
prototype of Wagner et al. [<xref ref-type="bibr" rid="ref3">3</xref>], it also uses a data-oriented design
concept [<xref ref-type="bibr" rid="ref24">24</xref>]. To increase the performance of the prototype,
the system only works with integer comparisons. Therefore,
the input data only includes the call ids. It is only possible to
translate a call id to the actual call value with an additional
translation file. This translation file is also used for the
bi-grams. The original bi-gram file has several columns in which
only the string values of the system or API calls are stored.
To increase the performance and to reduce memory usage, the
BiG2-KAMAS prototype generates its own bi-gram file. When
starting the prototype the system checks with MD5 hash values
to determine whether the translation file or the original bi-gram
file has changed. If so, the system converts the original bi-gram
file to the translated bi-gram file in which also the integer
values of the system calls are stored. Like the prototype of
Wagner et al. [<xref ref-type="bibr" rid="ref3">3</xref>] the new prototype is using the action pipeline
for filter options. This enables dynamic query environments
and real-time data operations.
        </p>
        <p>To evaluate the robustness and performance of the
BiG2-KAMAS prototype three different Sequitur cluster-grammar
files containing between 10 and 500 rules were used. The file
with 500 different rules contained a total amount of 30,000
system and API calls. To test the bi-gram functionality, a
bi-gram file with nearly 117,500 bi-gram entries was loaded. On
a machine with a 2.1 GHz dual-core processor and 12 GB
of memory it took the system about four minutes to translate
the original bi-gram file to the translated bi-gram file. The
malware and bi-gram samples were collected by collaborators
in the Josef Ressel Center TARGET of St. Pölten UAS.</p>
        <p>IV. EXTERNALIZED KNOWLEDGE INTEGRATION</p>
        <p>• Rule Name: Here, the actual rule name is displayed. The
rule name is implemented as a text field to quickly change
it if necessary.</p>
        <p>• Included Calls: Finally, the calls included in the stored
rule are displayed in a table. Thus, the calls are visualized
in their sequential order and each call will be shown with
its unique call id which corresponds to the call id of the
translation file and the actual call value. In the current
version of the prototype it is only possible to investigate
the included calls in their sequential order, but not to
delete specific calls which are listed in the table.</p>
        <p>The second menu item is the ‘delete’ item, which allows
the analyst to delete the currently selected rule. Furthermore,
when selecting a concept instead of a rule, the BiG2-KAMAS
prototype will show a context menu with which the user
can disable a category and all its integrated subcategories.
Thus, the analyst can disable the entire KDB or only specific
categories. If the user disables a category all the included
rules will no longer be considered in the knowledge base
highlighting and filtering.</p>
        <p>When the user clicks the right mouse button to open
the corresponding context menu before selecting a rule or
category, the system automatically selects the rule/category at
the actual mouse position.</p>
        <p>Searching: If the user searches for interesting rules or
specific calls or call groups he/she can use the call filter options
to reduce the data to be analyzed. In the call exploration area,
the user can search for a specific call by entering its name or
use regular expressions to find an entire call group. Beneath
the search text field the user can enable case sensitive search
with the corresponding checkbox ’Case Sensitive’. Filtering or
searching the calls affects the data shown in the call overview
and rule overview table. Additionally, to find rules of interest
the analyst can use the rule exploration filters or the knowledge
base filters.</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>As Wagner et al., [3] described in their article, we integrated</title>
      <p>
        a knowledge database to support the user during their analysis
tasks. The KDB is based on the malware behavior schema of
Dornhackl et al., [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]. The KDB is located at the left side of
the prototype and is implemented in a hierarchical structure
(tree structure). In the BiG2-KAMAS prototype the KDB was
extended by one additional category to store the benign rule
data, namely benign activity. In the current version of the
prototype there is only one category to store benign rule data.
Each category is displayed with either a box or a folder icon,
the category description and the number of included rules in
the integrated subfolders. The analyst can add new rules by
drag &amp; drop. When adding a new rule, the KDB automatically
unfolds closed categories. Additionally, a popup window opens
in which the analyst can enter a rule name. To investigate a
rule stored in the KDB, the user can open a context menu by
right clicking on the chosen rule. The context menu will show
two different menu items, namely ‘Information’ and ‘Delete’.
The information menu item opens a popup window in which
the analyst is presented the following information:
• Assigned Concept: This information tells the analyst in
which schema category (concept) the rule is currently
categorized. The assigned concept is implemented as a
selection list to give the user the opportunity to change
the assigned concept. For that purpose, the analyst must
select a different concept in the list and press the save
button at the bottom of the pop up window.
      </p>
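      <p>The call search described above can be illustrated with a minimal Python sketch. It assumes that calls are available as plain strings and that the search is case-insensitive unless the ‘Case Sensitive’ option is set. The function name filter_calls and the sample call names are hypothetical and not taken from the prototype.</p>
      <preformat><![CDATA[
```python
import re


def filter_calls(calls, pattern, case_sensitive=False):
    """Return the calls whose name matches the regular expression."""
    flags = 0 if case_sensitive else re.IGNORECASE
    regex = re.compile(pattern, flags)
    return [call for call in calls if regex.search(call)]


# Hypothetical call names, used for illustration only.
calls = ["RegOpenKeyW", "RegSetValueW", "CreateFileW", "regclosekey"]

# Find a whole call group with one expression.
registry_calls = filter_calls(calls, r"^Reg")

# The same expression with case-sensitive matching enabled.
exact_case = filter_calls(calls, r"^Reg", case_sensitive=True)
```
]]></preformat>
      <p>In this sketch the filter returns a new list, so a call overview and a rule overview could both be refreshed from the same filtered result.</p>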
    </sec>
    <sec id="sec-9">
      <title>V. PROTOTYPE EVALUATION</title>
    </sec>
    <sec id="sec-10">
      <p>This section describes the procedure of the performed user
studies, the specific results, and further feature requests.
For the prototype validation, a user study with two domain
experts was conducted. The domain experts validated the
functionality as well as the visual interface design.</p>
      <p>Participants: Both participants work at St. Pölten UAS and
have more than five years of experience in the field of malware
analysis. The first participant is between 30 and 39 years of
age, male, and holds a master's degree. The second participant
is between 60 and 69 years of age, male, and holds a PhD.
Both participants are well experienced in this field
and can be categorized as experts.</p>
      <p>Design and Procedure: Each participant was interviewed
individually and had already tested the previous version of
the prototype at least once. First, the participants received a
short introduction to the new features of BiG2-KAMAS and
a quick reminder of the basic features and workflow.
The participants were asked to mention missing
functionality and to point out all potential usability issues.
Both participants took part in the same two scenarios. In the first scenario, the
participants had to load a Sequitur file, investigate the loaded
rules and filter specific call sequences. At the end they had to
store a rule in the KDB and name it. In the second scenario,
the participants had to load three trace files. They were asked whether
they perceived any differences when loading trace files instead
of a Sequitur file. At the end they had to investigate a rule
stored in the KDB and move it to a different category.</p>
      <p>Equipment and Materials: The latest version of the
BiG2-KAMAS prototype was used in the evaluation. For the first
user scenario, the participants had to load a Sequitur file with
about 500 rules and 30,000 system and API calls. In the second
scenario, three trace files with a length between ten and fifteen
calls were used. The bi-gram file contained about
117,000 bi-grams. The translated bi-gram file had already
been generated so that the participants did not have to wait
until the system finished the translation process. As evaluation
equipment, two different setups were used. Both participants
worked on a 13 inch MacBook Pro with a Retina display
(screen resolution of 2560x1600) and a mouse for navigation.
Participant 1 worked with an additional 20 inch monitor with
full HD screen resolution and an external keyboard. Each
user test was conducted with the same version of the
BiG2-KAMAS prototype and was documented on paper.</p>
    </sec>
    <sec id="sec-11-1">
      <title>A. Results</title>
      <p>The following section discusses the results of both scenarios, ‘Scenario 1’ (Sequitur file) and ‘Scenario 2’ (trace files). Neither participant had problems loading the different files for the user scenarios.</p>
      <sec id="sec-12-1">
        <title>Scenario 1: Loading and analyzing a Sequitur file.</title>
        <p>Both participants quickly recognized the additional color
scheme for the new benign category. The colors for the
knowledge base highlighting were assessed as easily understandable
and the additional rule counter next to the knowledge base
filters was mentioned as being very useful. Participant 1
mentioned that if a rule in the rule overview table is highlighted,
it would be useful to know which rule or rules of the KDB
match this rule in the table. Therefore, a tooltip that tells
the user the names of the matching KDB rules would be
helpful. Furthermore, participant 2 suggested always
showing the rule counter of the KDB’s categories. If there are
currently no rules in a category, the counter should be zero.</p>
        <p>When participant 2 first saw the bar chart bi-gram
visualization, he assumed it visualizes the occurrence of the combined
call sequence. In contrast, the alpha color visualization was
immediately recognized as an indicator for maliciousness or
benignity. Participant 1 also mentioned that the alpha color
visualization is easier and faster to recognize. Furthermore,
both participants mentioned that the color visualization is not
as precise as the bar chart visualization and therefore would
only be useful for initial malware classification. Participant 1
suggested an additional tooltip to display the exact bi-gram
value. Participant 2 remarked that it would be more useful if
the calls in the call overview table only showed the beginning
and the end of the call’s value. This would simplify finding
a specific call in a group of similar calls. Additionally, he
recommended a search button for the regular expression call
filter. This could help some users, since currently it is only
possible to search by pressing the enter key. Adding a new
rule to the KDB was no challenge for either participant and
both valued the ability to give the rule a specific name.</p>
      </sec>
      <sec id="sec-10-1">
        <title>Scenario 2: Loading and analyzing three trace files.</title>
        <p>Neither participant had difficulties loading the three
trace files. They also recognized quickly that each entry in the
rule overview table now represents one trace. Neither of them
realized that the knowledge base filters and highlighting were
disabled. Participant 1 suggested graying out the knowledge
base filters to make it clear that these are disabled. Participant
2 proposed changing the headings for the trace file analysis
view in order to avoid confusion. He remarked that it could
be misleading if the headings say, e.g., ‘Rule Overview Table’
when analyzing a trace file. Furthermore, both participants
recommended changing the occurrence column in the rule
overview table to the file names of the traces. As the last task,
the participants had to change the category of a
random rule. Although both participants solved this task easily,
both remarked that it would be useful if the user could move
a rule from one category to another via drag &amp; drop.</p>
      </sec>
    </sec>
    <sec id="sec-10-2">
      <title>B. Result Analysis</title>
      <p>
        This section gives an overview of the issues which were
        mentioned during the expert reviews. Like Wagner et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ],
each issue was rated based on Nielsen’s [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ] severity ratings.
Table IV shows the potential new features noted by the test
persons and includes three columns: ‘feature requests’ (FR),
‘severities’ (SE) and the effort it would take to implement
these changes [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The features mentioned in the table include
small cosmetic changes as well as real usability improvements.
The only feature mentioned by all participants is an additional
tooltip which shows the actual bi-gram values.
      </p>
      <table-wrap>
        <label>Table IV</label>
        <caption>
          <p>Feature requests noted by the participants.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Description</th></tr>
          </thead>
          <tbody>
            <tr><td>KDB: Move a rule to another category by using drag &amp; drop.</td></tr>
            <tr><td>KDB: Show the rule counter even if zero rules are included.</td></tr>
            <tr><td>KDB: Gray out the knowledge base filters if they are disabled.</td></tr>
            <tr><td>Tables: Highlighted rules in the rule overview table should show the KDB’s corresponding rules.</td></tr>
            <tr><td>Tables: Change the occurrence column to the trace file names.</td></tr>
            <tr><td>Tables: Show only the beginning and the end of the calls in the call overview table.</td></tr>
            <tr><td>Tables: Implement a search button for the call regex search.</td></tr>
            <tr><td>Bigram: Tooltip to show the bi-gram values.</td></tr>
            <tr><td>Headings: Change the headings when loading trace files.</td></tr>
          </tbody>
        </table>
        <table-wrap-foot>
          <p>Rating values as extracted from the source; their assignment to rows and columns is not recoverable: 2, 1, 2, 3, 2, 3, 1.</p>
        </table-wrap-foot>
      </table-wrap>
    </sec>
    <sec id="sec-13">
      <p>The performed user studies described in Section V confirmed
that the BiG2-KAMAS prototype fulfills the four feature requests
defined in Section I:
1) Generic data loading: The BiG2-KAMAS prototype is
structured to enable the generic loading of data sequences. To
make this possible the input data as well as the prototype’s
database are based on unique identifiers (id) instead of the
actual values. Thus, all system-internal comparisons are based
on integer values instead of string values. Only with the
corresponding translation table, the system can translate the
ids to the actual values. Thus, it is possible to load data
sequences independent of their actual values as long as there
is a translation table through which the prototype can translate
the data. Furthermore, the system was adapted to also
load raw system or API call based traces.
In this state the KDB highlighting and filtering are disabled,
but the user can explore the loaded trace files and add new
rules to the KDB. The prototype can therefore load not only Sequitur call
sequences but also independent data sequences, as long as
the data sequence has the given structure and a translation file.</p>
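      <p>The id-based design and the hash-based regeneration of the translated bi-gram file can be sketched in Python as follows. The sketch rests on several assumptions that the text does not specify: a tab-separated translation file with one id and call name per line, a three-column bi-gram file (first call, second call, score), and a small stamp file that stores the MD5 digests of both inputs. All function and file names are hypothetical, and for brevity the translated file stores only the integer ids.</p>
      <preformat><![CDATA[
```python
import hashlib
import os
import tempfile


def md5_of(path):
    """Return the MD5 hex digest of a file's contents."""
    with open(path, "rb") as handle:
        return hashlib.md5(handle.read()).hexdigest()


def load_translation(path):
    """Read 'id<TAB>call' lines into a call-name to integer-id mapping."""
    mapping = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            call_id, name = line.rstrip("\n").split("\t")
            mapping[name] = int(call_id)
    return mapping


def translate_bigrams(translation_path, bigram_path, out_path, stamp_path):
    """Rebuild the integer bi-gram file only if one of the inputs changed.

    Returns True when the translated file was regenerated.
    """
    stamp = md5_of(translation_path) + md5_of(bigram_path)
    if os.path.exists(out_path) and os.path.exists(stamp_path):
        with open(stamp_path, encoding="utf-8") as handle:
            if handle.read() == stamp:
                return False  # inputs unchanged, keep the cached file
    ids = load_translation(translation_path)
    with open(bigram_path, encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            first, second, score = line.rstrip("\n").split("\t")
            dst.write(f"{ids[first]}\t{ids[second]}\t{score}\n")
    with open(stamp_path, "w", encoding="utf-8") as handle:
        handle.write(stamp)
    return True


# Demonstration with made-up call names and a made-up score.
workdir = tempfile.mkdtemp()
translation = os.path.join(workdir, "translation.tsv")
bigrams = os.path.join(workdir, "bigrams.tsv")
translated_file = os.path.join(workdir, "bigrams_int.tsv")
stamp_file = os.path.join(workdir, "inputs.md5")

with open(translation, "w", encoding="utf-8") as handle:
    handle.write("1\tCreateFileW\n2\tWriteFile\n")
with open(bigrams, "w", encoding="utf-8") as handle:
    handle.write("CreateFileW\tWriteFile\t-0.42\n")

first_run = translate_bigrams(translation, bigrams, translated_file, stamp_file)
second_run = translate_bigrams(translation, bigrams, translated_file, stamp_file)
with open(translated_file, encoding="utf-8") as handle:
    translated = handle.read()
```
]]></preformat>
      <p>In this sketch the first call rebuilds the translated file and the second call skips the work because neither input has changed, which mirrors the startup check described in the implementation section.</p>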
      <p>2) Extend the KDB with benign rules: To fulfill this
requirement the KDB was extended with an additional category for
benign activity. In this category, all rules which are identified
as benign can be stored. Additionally, the KDB’s highlighting
and filter pipelines were extended to identify and filter partially
and fully benign rules. Rules with a partially or fully benign
knowledge state are highlighted in blue in order to avoid the
combination of the colors red and green.</p>
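      <p>A hierarchical KDB with a benign category, category-wise disabling, and a knowledge state per rule could be sketched in Python as shown below. The class and function names are hypothetical. The sketch also assumes that a loaded rule is ‘fully’ known when it equals a stored rule and ‘partially’ known when a stored rule occurs as a contiguous run inside it; the text does not define these states precisely.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass, field


@dataclass
class Category:
    name: str
    rules: dict = field(default_factory=dict)     # rule name -> tuple of call ids
    children: list = field(default_factory=list)  # subcategories
    enabled: bool = True

    def rule_count(self):
        """Number of rules in this category and all of its subcategories."""
        return len(self.rules) + sum(c.rule_count() for c in self.children)

    def set_enabled(self, enabled):
        """Enable or disable this category together with its subcategories."""
        self.enabled = enabled
        for child in self.children:
            child.set_enabled(enabled)

    def active_rules(self):
        """Yield (category name, rule name, calls) for enabled categories."""
        if self.enabled:
            for rule_name, calls in self.rules.items():
                yield self.name, rule_name, calls
        for child in self.children:
            yield from child.active_rules()


def contains(sequence, pattern):
    """True if pattern occurs as a contiguous run inside sequence."""
    n = len(pattern)
    return any(tuple(sequence[i:i + n]) == tuple(pattern)
               for i in range(len(sequence) - n + 1))


def knowledge_state(kdb, rule):
    """Map each matching category to 'fully' or 'partially'."""
    states = {}
    for category, _, calls in kdb.active_rules():
        if tuple(rule) == tuple(calls):
            states[category] = "fully"
        elif contains(rule, calls):
            states.setdefault(category, "partially")
    return states


# Made-up categories, rule names and call ids.
benign = Category("benign activity", rules={"open-read": (1, 2)})
persistence = Category("persistence", rules={"autostart": (7, 8, 9)})
kdb = Category("KDB", children=[Category("malicious", children=[persistence]), benign])

total = kdb.rule_count()
fully_benign = knowledge_state(kdb, (1, 2))
partially_known = knowledge_state(kdb, (5, 7, 8, 9))

benign.set_enabled(False)
after_disable = knowledge_state(kdb, (1, 2))
```
]]></preformat>
      <p>Disabling a category removes its rules from the matching step, so a rule that was previously highlighted as benign is no longer reported once the benign category is switched off.</p>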
      <sec id="sec-13-1">
        <title>3) Implementation of bi-gram based valuation</title>
        <p>
          To support the bi-gram approach of Luh et al. [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], the prototype’s
rule detail table was adapted. Since many domain experts
mentioned [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] that the arc-diagram visualization is not very
helpful, it was replaced by the bi-gram visualization. Bi-gram
based valuation is implemented with two different approaches.
If the width of the bi-gram column is greater than 75 px, the
valuation is visualized with bar charts colored in red
(malicious) or blue (benign). If the width is less than 75 px,
the bi-gram visualization uses the alpha channel to show the
severity of the bi-gram (see Table III).
        </p>
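        <p>The width-dependent choice between the two visualizations can be sketched in Python as follows. Only the 75 px threshold and the colors red and blue come from the text. The normalization of the score to the range from -1 to 1, the sign convention (positive for malicious, negative for benign), and the function name bigram_cell are assumptions made for this sketch. The text does not say what happens at exactly 75 px, so the sketch falls back to the alpha variant in that case.</p>
        <preformat><![CDATA[
```python
BAR_MIN_WIDTH_PX = 75  # threshold stated in the text


def bigram_cell(score, column_width_px):
    """Describe how one bi-gram score would be drawn.

    Assumes the score is normalized to [-1, 1]; positive means malicious,
    negative means benign (a convention chosen for this sketch).
    """
    magnitude = min(abs(score), 1.0)
    color = "red" if score > 0 else "blue"
    if column_width_px > BAR_MIN_WIDTH_PX:
        # Wide column: a bar whose length encodes the magnitude.
        return {"mode": "bar", "color": color,
                "bar_px": round(magnitude * column_width_px)}
    # Narrow column: a filled cell whose opacity encodes the magnitude.
    return {"mode": "alpha", "color": color, "alpha": round(magnitude, 2)}


wide = bigram_cell(0.5, 120)
narrow = bigram_cell(-0.25, 40)
```
]]></preformat>
        <p>Because both variants are derived from the same magnitude, resizing the column only changes the encoding and not the underlying value, which matches the participants' observation that the bar chart is the more precise of the two.</p>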
      </sec>
      <sec id="sec-13-2">
        <title>4) User studies to validate the new features</title>
        <p>The results of the user studies show further feature requests which could be
implemented in a future project. However, both participants
mentioned that the bi-gram visualization is very helpful for
identifying potentially malicious or benign call sequences and,
therefore, helps to decide whether a rule is malicious or not.</p>
        <p>
          Future Work: For the behavior-based malware analysis
process, it could be valuable to implement a rule creation
process where the analyst can build their own rules based on
the known system and API calls [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ]. Furthermore, it could be
beneficial to allow editing of the rules stored in the KDB and building new
rules based on existing patterns. Further avenues for future
work include options to hide, shrink, and expand
areas to give the user more flexibility. Moreover,
the occurrence column of the Call Exploration area
(see 1:3a) could be updated to show each value in relation to the total number of
occurrences in the loaded file. Additionally, normalizing
the occurrence data and its visualization to this total could be
beneficial.
        </p>
        <p>
          Categorization of BiG2-KAMAS: Like the KAMAS
prototype [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] the BiG2-KAMAS prototype can be categorized
as a Malware Forensic as well as a Malware Classification
tool in the Malware Visualization Taxonomy of Wagner et
al. [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. However, due to the bi-gram based valuation, the
BiG2-KAMAS prototype offers the malware analyst additional
assistance for Individual Malware Analysis.
        </p>
      </sec>
    </sec>
    <sec id="sec-14">
      <title>VII. CONCLUSION</title>
      <p>
        In this work, we presented a design study for a Bi-gram
Supported Generic Knowledge-Assisted Malware Analysis System
(BiG2-KAMAS). The prototype is based on the KAMAS
prototype [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and extended by additional features such as
generic data loading, an extension of the KDB to enable the
analysis of benign rules, and the implementation of a bi-gram
based valuation approach. The requirements were discussed
in a focus group meeting and then implemented as part of
a functional prototype. After implementing the new features,
two user studies were conducted to evaluate the design and
the functionality of the new BiG2-KAMAS prototype.
      </p>
    </sec>
    <sec id="sec-15">
      <title>ACKNOWLEDGMENTS</title>
    </sec>
    <sec id="sec-16">
      <p>The financial support by the Austrian Federal Ministry of Science, Research and Economy and the National Foundation for Research, Technology and Development is gratefully acknowledged.</p>
      <p>This work was supported by the Austrian Science Fund
(FWF) via the “KAVA-Time” project (P25489-N23) and by the
Austrian Federal Ministry of Science, Research and Economy
under the FFG Innovationscheck (no. 856429). We would also
like to thank all focus group members and test participants who
have agreed to volunteer in this project.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Sikorski</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Honig</surname>
          </string-name>
          ,
          <article-title>Practical Malware Analysis: The Hands-On Guide to Dissecting Malicious Software</article-title>
          , 1st ed. No Starch Press,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Wagner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Aigner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rind</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Dornhackl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kadletz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Luh</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Tavolato</surname>
          </string-name>
          , “
          <article-title>Problem characterization and abstraction for visual analytics in behavior-based malware pattern analysis</article-title>
          ,
          <source>” in Proceedings of the Eleventh Workshop on Visualization for Cyber Security</source>
          , ser.
          <source>VizSec '14. ACM</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Wagner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rind</surname>
          </string-name>
          , N. Thür, and W. Aigner, “
          <article-title>A knowledge-assisted visual malware analysis system: Design, validation, and reflection of KAMAS,”</article-title>
          <source>Computers &amp; Security</source>
          , vol.
          <volume>67</volume>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>15</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>N.</given-names>
            <surname>Thür</surname>
          </string-name>
          , M. Wagner,
          <string-name>
            <given-names>J.</given-names>
            <surname>Schick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Niederer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Eckel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Luh</surname>
          </string-name>
          , and W. Aigner, “
          <article-title>BiG2-KAMAS: Supporting knowledge-assisted malware analysis with bi-gram based valuation</article-title>
          ,
          <source>” in Poster of the 14th Workshop on Visualization for Cyber Security (VizSec)</source>
          , Phoenix, Arizona, USA,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>H.</given-names>
            <surname>Shiravi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shiravi</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Ghorbani</surname>
          </string-name>
          , “
          <article-title>A survey of visualization systems for network security</article-title>
          ,” vol.
          <volume>18</volume>
          , no.
          <issue>8</issue>
          , pp.
          <fpage>1313</fpage>
          -
          <lpage>1329</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Egele</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Scholte</surname>
          </string-name>
          , E. Kirda, and
          <string-name>
            <given-names>C.</given-names>
            <surname>Kruegel</surname>
          </string-name>
          , “
          <article-title>A survey on automated dynamic malware-analysis techniques and tools</article-title>
          ,” vol.
          <volume>44</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>6:1</fpage>
          -
          <lpage>6:42</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Bazrafshan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Hashemi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Fard</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Hamzeh</surname>
          </string-name>
          , “
          <article-title>A survey on heuristic malware detection techniques</article-title>
          ,”
          <year>2013</year>
          , pp.
          <fpage>113</fpage>
          -
          <lpage>120</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Wagner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Fischer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Luh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Haberson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rind</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Keim</surname>
          </string-name>
          , and W. Aigner, “
          <article-title>A survey of visualization systems for malware analysis</article-title>
          ,” in
          <source>Eurographics Conference on Visualization (EuroVis) - STARs</source>
          . The Eurographics Association
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>L.</given-names>
            <surname>McNabb</surname>
          </string-name>
          and
          <string-name>
            <given-names>R. S.</given-names>
            <surname>Laramee</surname>
          </string-name>
          , “
          <article-title>Survey of surveys sos - mapping the landscape of survey papers in information visualization</article-title>
          ,
          <source>” Comput. Graph. Forum</source>
          , vol.
          <volume>36</volume>
          , no.
          <issue>3</issue>
          , pp.
          <fpage>589</fpage>
          -
          <lpage>617</lpage>
          , Jun.
          <year>2017</year>
          . [Online]. Available: https://doi.org/10.1111/cgf.13212
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>R.</given-names>
            <surname>Luh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schrittwieser</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Marschalek</surname>
          </string-name>
          , “
          <article-title>LLR-based Sentiment Analysis for Kernel Event Sequences</article-title>
          .” IEEE,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>R.</given-names>
            <surname>Luh</surname>
          </string-name>
          , G. Schramm,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wagner</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Schrittwieser</surname>
          </string-name>
          , “
          <article-title>Sequitur-based Inference and Analysis Framework for Malicious System Behavior</article-title>
          ,”
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>O.</given-names>
            <surname>Somarriba</surname>
          </string-name>
          , U. Zurutuza,
          <string-name>
            <given-names>R.</given-names>
            <surname>Uribeetxeberria</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Delosières</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Nadjm-Tehrani</surname>
          </string-name>
          , “
          <article-title>Detection and visualization of android malware behavior</article-title>
          ,” vol.
          <year>2016</year>
          , p.
          <fpage>e8034967</fpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>S.</given-names>
            <surname>Marschalek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Luh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kaiser</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Schrittwieser</surname>
          </string-name>
          , “
          <article-title>Classifying malicious system behavior using event propagation trees</article-title>
          .
          <source>” ACM Press</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>B.</given-names>
            <surname>Xiaofang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Weihua</surname>
          </string-name>
          , and W. Qu, “
          <article-title>Malware variant detection using similarity search over content fingerprint</article-title>
          .
          <source>” IEEE</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>5334</fpage>
          -
          <lpage>5339</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>A.</given-names>
            <surname>Jain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Gonzalez</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N.</given-names>
            <surname>Stakhanova</surname>
          </string-name>
          , “
          <article-title>Enriching reverse engineering through visual exploration of android binaries</article-title>
          ,” in
          <source>Proceedings of the 5th Program Protection and Reverse Engineering Workshop</source>
          , ser.
          <source>PPREW-5. ACM</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>9:1</fpage>
          -
          <lpage>9:9</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>O. E.</given-names>
            <surname>David</surname>
          </string-name>
          and
          <string-name>
            <given-names>N. S.</given-names>
            <surname>Netanyahu</surname>
          </string-name>
          , “
          <article-title>DeepSign: Deep learning for automatic malware signature generation and classification</article-title>
          .”
          <source>IEEE</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>P. M.</given-names>
            <surname>Wrench</surname>
          </string-name>
          and
          <string-name>
            <given-names>B. V. W.</given-names>
            <surname>Irwin</surname>
          </string-name>
          , “
          <article-title>Towards a PHP webshell taxonomy using deobfuscation-assisted similarity analysis</article-title>
          .”
          <source>IEEE</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>A.</given-names>
            <surname>Stevenson</surname>
          </string-name>
          and
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Cordy</surname>
          </string-name>
          , “
          <article-title>A survey of grammatical inference in software engineering</article-title>
          ,”
          <source>Science of Computer Programming</source>
          , vol.
          <volume>96</volume>
          , pp.
          <fpage>444</fpage>
          -
          <lpage>459</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>M.</given-names>
            <surname>Li</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Vitányi</surname>
          </string-name>
          ,
          <source>An introduction to Kolmogorov complexity and its applications</source>
          . Springer Heidelberg,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>C. G.</given-names>
            <surname>Nevill-Manning</surname>
          </string-name>
          and
          <string-name>
            <given-names>I. H.</given-names>
            <surname>Witten</surname>
          </string-name>
          , “
          <article-title>Identifying hierarchical structure in sequences: A linear-time algorithm</article-title>
          ,”
          <source>J. Artif. Intell. Res. (JAIR)</source>
          , vol.
          <volume>7</volume>
          , pp.
          <fpage>67</fpage>
          -
          <lpage>82</lpage>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>T.</given-names>
            <surname>Dunning</surname>
          </string-name>
          , “
          <article-title>Accurate methods for the statistics of surprise and coincidence</article-title>
          ,”
          <source>Computational Linguistics</source>
          , pp.
          <fpage>61</fpage>
          -
          <lpage>74</lpage>
          ,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>C.</given-names>
            <surname>Ware</surname>
          </string-name>
          ,
          <source>Information Visualization: Perception for Design</source>
          .
          <publisher-name>Elsevier</publisher-name>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>B.</given-names>
            <surname>Shneiderman</surname>
          </string-name>
          , “
          <article-title>The eyes have it: a task by data type taxonomy for information visualizations</article-title>
          ,”
          in
          <source>Proc. of VL</source>
          ,
          <year>1996</year>
          , pp.
          <fpage>336</fpage>
          -
          <lpage>343</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>R.</given-names>
            <surname>Fabian</surname>
          </string-name>
          , “
          <article-title>Data-Oriented Design</article-title>
          ,”
          <year>2013</year>
          , accessed on
          <date-in-citation content-type="access-date">Nov. 11, 2015</date-in-citation>
          . [Online]. Available: http://www.dataorienteddesign.com/dodmain/dodmain.html
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>H.</given-names>
            <surname>Dornhackl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kadletz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Luh</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Tavolato</surname>
          </string-name>
          , “
          <article-title>Malicious behavior patterns</article-title>
          ,” in
          <source>SOSE</source>
          . IEEE,
          <year>2014</year>
          , pp.
          <fpage>384</fpage>
          -
          <lpage>389</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>J.</given-names>
            <surname>Nielsen</surname>
          </string-name>
          ,
          <source>Usability engineering</source>
          . Boston: Academic Press,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>M.</given-names>
            <surname>Wagner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rind</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Rottermanner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Niederer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.</given-names>
            <surname>Aigner</surname>
          </string-name>
          , “
          <article-title>Knowledge-assisted rule building for malware analysis</article-title>
          ,” in
          <source>Proceedings of the 10th Forschungsforum der österreichischen Fachhochschulen</source>
          . Vienna, Austria:
          <publisher-name>FH des BFI Wien</publisher-name>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>