<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Preventing malicious attacks by diversifying Linux shell commands</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Joni Uitto</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sampsa Rauti</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jari-Matti Mäkelä</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ville Leppänen</string-name>
          <email>ville.leppanen@utu.fi</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Turku</institution>
          ,
          <addr-line>20014 Turku</addr-line>
          ,
          <country country="FI">Finland</country>
        </aff>
      </contrib-group>
      <fpage>206</fpage>
      <lpage>220</lpage>
      <abstract>
        <p>In instruction set diversification, a "language" used in a system is uniquely diversified in order to protect software against malicious attacks. In this paper, we apply diversification to Linux shell commands in order to prevent malware from taking advantage of the functionality they provide. When the Linux shell commands are diversified, malware no longer knows the correct commands and cannot use the shell to achieve its goals. We demonstrate this using Shellshock as an example. This paper presents a scheme that diversifies the commands of Bash, the most widely used Linux shell, and all the scripts in the system. The feasibility of our scheme is tested with a proof-of-concept implementation. We also present a study on the extent of changes required to make all the trusted scripts and applications in the system use the new diversified shell commands.</p>
      </abstract>
      <kwd-group>
        <kwd>software security</kwd>
        <kwd>instruction set diversification</kwd>
        <kwd>Linux command shell</kwd>
        <kwd>Bash</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>In this paper, we present a diversification scheme which prevents the execution of
undiversified command shell scripts in order to protect the system from malware.
While the focus of our discussion is on injection attacks such as Shellshock, our
scheme also generally prevents attacks in many situations where the attacker
tries to execute a malicious script in the target system.</p>
      <p>
        Among security bugs, vulnerabilities that allow code injection attacks are
probably the ones most commonly exploited by malware [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. In these attacks, code is
inserted into a vulnerable program, enabling the attacker to use the program's
privileges to launch an attack. Code injection attacks are known to be popular
against binaries compiled from weakly typed languages like C, but they are also often
used to execute arbitrary code in other environments such as SQL [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] or the Unix
command shell [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. In this paper, we concentrate on preventing attacks in the command
shell environment.
      </p>
      <p>Malware and attackers make use of the fact that the set of commands
interpreted by the command shell is identical on each computer. Because of this
software monoculture, an adversary can design a single program that is able to
successfully attack millions of vulnerable computers and devices. To defeat these
kinds of attacks, we employ a method based on instruction set diversification.</p>
      <p>The authors gratefully acknowledge Tekes (the Finnish Funding Agency for
Innovation), DIGILE Oy and the Cyber Trust research program for their support.</p>
      <p>The command sets of command shells on different computers, servers and
devices can be uniquely diversified so that a piece of malware no longer knows
the correct shell commands to perform a specific operation in order to access
resources on a computer. As the malware is unfamiliar with the language used
by the command shell, attempts to attack are rendered useless. Even if a piece of
malware were to find out the secret diversified commands for one shell script, the
same secret commands do not work for other scripts or systems. This
diversification scheme can also be seen as a proactive countermeasure against code injection
attacks: the exact type of injection does not have to be known beforehand in
order to thwart it.</p>
      <p>It is also worth noting that diversification does not affect the software
development process. The general idea in diversification (be it targeted at a command
language or an API) is that software development is done against the
ordinary reference language or API, and software artefacts are
diversified separately for each machine after the development phase.</p>
      <p>
        The contributions of this paper are as follows. We propose a scheme for
diversifying Unix shell commands. Portokalidis et al. have briefly mentioned this
idea in [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] among other possible applications of instruction set diversification.
However, they do not go into much detail or provide any implementation for a
diversified command shell. Our work can be seen as a continuation of this work,
taking a more detailed and concrete approach to this issue. Our approach also
significantly improves the security of their previous idea.
      </p>
      <p>We present a proof-of-concept implementation of a diversified command shell,
Bash, in order to demonstrate the feasibility of our approach in practice. We also
show how our solution prevents code injection attacks, using different popular
cases of the Shellshock attack as examples. Additionally, we provide a brief study on
the extent of changes required to make all the script files in two real-life Linux
distributions use the new diversified shell commands.</p>
      <p>The rest of the paper is structured as follows. Section 2 describes the attack
scenario. As an example, we describe Shellshock, an attack exploiting
vulnerabilities in the Bash command shell, and explain how our approach prevents this
threat. In Section 3, we present our solution first as a general conceptual model
and then as a practical implementation. Section 4 discusses the feasibility of
shell diversification and presents some results on the number of script files in
two popular Linux distributions. The limitations of shell diversification are also
covered. Section 5 contains the related work and Section 6 concludes the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>Attack scenario</title>
      <p>Our solution aims to prevent attacks in which the attackers succeed in running
malicious shell scripts or shell code fragments in the system. Code injection
attacks are one typical way for adversaries to achieve this. Code injection usually
happens against interfaces where the target system requests data from the user.
If the system does not handle this data properly, it may become susceptible to a
code injection attack: a malicious user has an opportunity to offer the system data
containing code instructions that could get executed.</p>
      <p>For example, in the C programming language, the function int system(const
char *command) from stdlib.h runs the given command string as a shell
command (strncat and strlen come from string.h):
char command[100] = "ls -l ";
char user_input[64] = "";
/* read a file name from the user into user_input here */
strncat(command, user_input, sizeof(command) - strlen(command) - 1); /* append the file name to the command */
system(command); /* execute the result as a shell command */</p>
      <p>Now, if the user gives the string "; cat /etc/passwd" as input, the
contents of the password file are printed.</p>
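      <p>The effect of the concatenation can be illustrated with a small Python sketch. It is not part of the scheme itself: it only uses the standard shlex module to approximate how a POSIX shell would tokenize the resulting string, and the helper name shell_tokens is our own.</p>

```python
import shlex

def shell_tokens(command: str) -> list:
    """Split a command line roughly the way a POSIX shell lexer would."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    return list(lexer)

prefix = "ls -l "                   # fixed part of the command, as in the C example
user_input = "; cat /etc/passwd"    # attacker-controlled "file name"
command = prefix + user_input       # same effect as the concatenation in C

tokens = shell_tokens(command)
print(tokens)
# Each ';' token terminates a command, so the shell sees more than one command.
print("number of commands:", tokens.count(";") + 1)
```

      <p>The separate ";" token is what turns the supposed file name into a second command.</p>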
      <p>
        As another example of a possible attack scenario, we discuss Shellshock [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], a
family of security bugs found in the widely used Unix Bash shell, first disclosed publicly
on 24 September 2014. While the vulnerabilities making this attack possible have
been patched, similar attacks are possible in the future. Shellshock would have easily
been defeated by our approach. Bugs like Shellshock are very critical, because
many services on the Internet, such as several web servers, use Bash to process certain
requests.
      </p>
      <p>The Shellshock attacks made use of vulnerabilities in Bash, a program that
several Unix-based operating systems utilize to run command scripts. Bash is
often installed as the operating system's standard command-line interface.</p>
      <p>In Unix-based systems, every running program possesses a list of environment
variables, which are basically name-value pairs. When a running program invokes
another program, it gives an initial environment variable list to this new process.
In addition, Bash also internally stores a list of functions that can be run from
within the program. When Bash invokes itself as a child process, the original
instance can pass the environment variables and function definitions on to the
new subshell. More specifically, the function definitions reside in the environment
variable list as encoded variables, the values of which start with parentheses
followed by a function definition. When the subshell starts, it changes these
values back into internal functions. The piece of code in the value is executed
and a function is created dynamically on the fly.</p>
      <p>The problem is that the Bash version vulnerable to the attack does not
perform any check to make sure that the code fragment is a valid function
definition. An attacker therefore has a chance to run Bash with a freely chosen
value in its environment variable list. This means the adversary can execute any
commands of his or her choice. Naturally, this arbitrary code execution would not
be possible in a situation where the interpreter only accepts scripts conforming
to the diversified script language.</p>
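      <p>The following Python sketch is a deliberately simplified model of the problem, not Bash's actual import code. It assumes that an exported function value starts with "() {" and counts braces naively, without any quoting rules, to separate the function definition from whatever follows it. The function name split_function_value is hypothetical.</p>

```python
def split_function_value(value: str):
    """Split an exported-function value into (definition, trailing text).

    Simplified model: braces are counted naively, with no quoting rules.
    """
    if not value.startswith("() {"):
        return None, value
    depth = 0
    for index, char in enumerate(value):
        if char == "{":
            depth += 1
        elif char == "}":
            depth -= 1
            if depth == 0:
                return value[: index + 1], value[index + 1 :].strip(" ;")
    return None, value  # unbalanced braces: not a complete definition

definition, trailing = split_function_value("() { :;}; echo vulnerable")
print("definition:", definition)
print("trailing commands:", trailing)
```

      <p>In this model, the trailing text is the part a vulnerable importer would run as ordinary commands. Under the diversification scheme, such injected text would carry no valid tags and would be rejected by the interpreter.</p>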
      <p>
        As an example, Shellshock can be used to take control of a server. The
following remote control attack attempts to use two programs (wget and curl)
to connect to the attacker's server and download a program that the attacker
can then use to control the targeted server [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]:
() { :;}; /bin/bash -c \"cd /tmp;wget http://213.x.x.x/ji;curl -O
/tmp/ji http://213.x.x.x/ji ; perl /tmp/ji;rm -rf /tmp/ji\"
      </p>
      <p>The downloaded Perl program is run immediately and remote access for the
attacker is established.</p>
      <p>Attacks like Shellshock can potentially compromise millions of servers and
other systems. However, if our implementation is in place, a successful attack
requires knowing the diversified shell commands, that is, the secret used to diversify
the original commands. Without this knowledge, many security vulnerabilities
become useless. It follows that our solution is also proactive in the sense that
it does not depend on the exact attack vector as long as the adversary tries to
use a shell language to perform the attack. In what follows, we will provide a
detailed description of our scheme.</p>
    </sec>
    <sec id="sec-3">
      <title>Our solution</title>
      <sec id="sec-3-1">
        <title>The conceptual diversification scheme</title>
        <p>In our conceptual diversification scheme, a diversifier tool is used to produce
uniquely diversified script files. These scripts can then be run only by an
interpreter that supports diversified scripts. The interpreter executes the script
by making use of the secret that has been used to diversify the script file. As
the malicious adversaries do not possess the diversification secret, they cannot
diversify their malicious code fragments correctly and their attacks are thwarted.</p>
        <p>Our diversification scheme is shown in Figure 1. Each diversified script has
its own secret that is used to generate the diversified script file with a diversifier
tool and execute it with a diversified interpreter. The tokens in the scripts are
diversified by combining the semantic value (that is, the string representation) of
the original token and a unique tag. In this context, "token" means a collection
of characters that is assigned a token identifier by the interpreter's lexer. The
tag is calculated using the secret and the semantic value of the token under
diversification (see the circles in Figure 1). Simple concatenation can be used,
but it is also possible to use some cryptographic function to combine these parts.
For example, our implementation appends a hash value to the original token.</p>
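        <p>A minimal Python sketch of such a tag calculation is given below. It assumes that the tag is the SHA-1 digest of the token's semantic value concatenated with the secret, written in upper-case hexadecimal. The concatenation order, the "^^^" separator, the function names and the secret value are illustrative assumptions rather than details of the actual implementation.</p>

```python
import hashlib

def tag_for(token: str, secret: str) -> str:
    """Upper-case hex SHA-1 over token + secret (assumed concatenation order)."""
    return hashlib.sha1((token + secret).encode("utf-8")).hexdigest().upper()

def diversify_token(token: str, secret: str, separator: str = "^^^") -> str:
    """Append the tag to the token's semantic value."""
    return token + separator + tag_for(token, secret)

secret = "example-secret"  # hypothetical secret, for illustration only
print(diversify_token("k=1", secret))
print(diversify_token("k=2", secret))
```

        <p>Because the semantic value is part of the hash input, two tokens of the same type but with different text receive unrelated tags.</p>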
        <p>
          Our method only diversifies the tokens that occur in the script, so the
adversary has no way of knowing the diversified forms of other tokens even if he
or she somehow gets access to the script's source code. It is worth noting that
each token receives its own unique diversification. In this sense, we improve the
solution suggested by Portokalidis et al. [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], in which the same secret tag is
appended to every token in a script file. Our
scheme makes the diversification more secure by making the diversification of
different tokens independent of each other. Taking this approach a step further,
we can also vary the diversification of a token depending on the context it
appears in. For example, the diversified form of a token can depend on preceding
tokens or the location of the token in the script file. This makes it even harder
for an attacker to guess the diversified forms of the tokens and inject anything
into the script.
        </p>
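        <p>Context-dependent diversification could be sketched as follows. This is an illustration under assumptions, not the implemented scheme: the tag here is a SHA-1 digest over the secret, the preceding token, the token's index in the script and the token itself, and all names are hypothetical.</p>

```python
import hashlib

def context_tag(token: str, previous: str, position: int, secret: str) -> str:
    """Tag that depends on the token, its predecessor and its index in the script."""
    material = "\x00".join([secret, previous, str(position), token])
    return hashlib.sha1(material.encode("utf-8")).hexdigest().upper()

def diversify_sequence(tokens, secret):
    """Diversify a token sequence so that equal tokens get position-specific tags."""
    diversified = []
    previous = ""  # the first token has no predecessor
    for position, token in enumerate(tokens):
        diversified.append(token + "^^^" + context_tag(token, previous, position, secret))
        previous = token
    return diversified

for line in diversify_sequence(["do", "echo", "done", "do"], "example-secret"):
    print(line)
```

        <p>With this variant, the two occurrences of "do" receive different tags, so a tag observed at one place in a script cannot simply be reused elsewhere.</p>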
      </sec>
      <sec id="sec-3-2">
        <title>The practical implementation</title>
        <p>Our proof-of-concept implementation of the diversified Bash shell was
built by extending and modifying GNU Bash version 4.3.39. The
implementation is written in C. Implementation and testing were performed using
Ubuntu GNU/Linux 3.16.0-45-generic on a 32-bit architecture. Our
implementation itself is provided as an additional tool library, keeping direct modifications
to the actual Bash interpreter minimal.</p>
        <p>
          In [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], Portokalidis et al. implemented a proof-of-concept version of a Perl
interpreter that executed Perl scripts with randomized instruction sets. The
interpreter's lexical analyzer was modified to append a 9-digit tag to each
token it recognized. Our approach for diversifying Bash
follows a similar design: we append a diversifying tag after each recognizable
token's semantic value.
        </p>
        <p>In Bash, these tokens can be keywords like while, for and if, or more complex
constructs such as the assignment k=1. As mentioned previously, the
diversifying tag depends on the semantic value of these tokens. In Bash, the semantic
value of the token if is "if", whereas k=1 is understood as a token of the
type ASSIGNMENT_WORD, k=1 being its semantic value. Hence, the string k=1
receives a different diversifying tag from k=2 despite both being of the same token
type.</p>
        <p>The diversification process is shown in Figure 2. The diversification library
provides an interface that the Bash interpreter uses during the tokenizing and
execution phases. Each hash value is separated from a token with a distinct
string of characters. This separator string is used to strip hashes from the input
stream before it is passed to the lexical analyzer. For example, with the separator
and a hash value, the echo command could become
echo~~~B2D21E771D9F86865C5EFF193663574DD1796C8F</p>
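        <p>The stripping step can be sketched in Python with a regular expression. The sketch assumes that a tag always consists of the separator "~~~" followed by exactly 40 upper-case hexadecimal digits, as in the echo example; the actual library is written in C and may recognize tags differently.</p>

```python
import re

# Assumption: a tag is the separator "~~~" followed by 40 upper-case hex digits.
TAG_PATTERN = re.compile(r"~~~([0-9A-F]{40})")

def strip_tags(line: str):
    """Return the line without tags, plus the collected tags in order of appearance."""
    collected = TAG_PATTERN.findall(line)
    plain = TAG_PATTERN.sub("", line)
    return plain, collected

plain, collected = strip_tags('echo~~~B2D21E771D9F86865C5EFF193663574DD1796C8F "hello"')
print(plain)
print(collected)
```

        <p>The plain text is what the unmodified lexical analyzer would receive, while the collected hashes are kept for the later comparison.</p>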
        <p>After the lexical analyzer has determined which token it is currently handling,
a hash is calculated for that token and compared with the collected hash. If the
hashes match, execution is allowed. Otherwise, the diversified token is considered
erroneous and execution of the script is halted.</p>
        <p>The current implementation uses two different separators for the hashes. The
first separator is meant for language-specific reserved words and other tokens.
The second separator informs the diversification library that the word before
the separator should be a command word, that is, a built-in utility, a
function call, or a program or script found via the PATH variable. Before the command
is executed, the hash attached to it is extracted and compared with a hash calculated
from the command word. As with other tokens, if the comparison is successful,
execution is allowed; otherwise the script execution is halted.</p>
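        <p>The role of the two separators can be sketched as follows. The separator strings "^^^" and "~~~" are the ones that appear in the example script of this section; the tag calculation (SHA-1 over token plus secret), the function names and the secret are assumptions made for the sake of a runnable illustration.</p>

```python
import hashlib

TOKEN_SEPARATOR = "^^^"    # reserved words and other tokens
COMMAND_SEPARATOR = "~~~"  # command words: built-ins, functions, programs

def sha1_tag(token, secret):
    return hashlib.sha1((token + secret).encode("utf-8")).hexdigest().upper()

def classify(word, secret):
    """Return (kind, token) for a correctly diversified word, else raise ValueError."""
    for kind, separator in (("command", COMMAND_SEPARATOR), ("token", TOKEN_SEPARATOR)):
        token, found, collected = word.partition(separator)
        if found:
            if collected != sha1_tag(token, secret):
                raise ValueError("tag mismatch for " + kind + ": " + token)
            return kind, token
    raise ValueError("undiversified word: " + word)

secret = "example-secret"  # hypothetical secret
print(classify("for" + TOKEN_SEPARATOR + sha1_tag("for", secret), secret))
print(classify("/bin/echo" + COMMAND_SEPARATOR + sha1_tag("/bin/echo", secret), secret))
try:
    classify("wget", secret)  # an injected command without a tag
except ValueError as error:
    print("halted:", error)
```

        <p>An injected command such as wget carries no tag at all, so in this model it is rejected before anything is executed.</p>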
        <p>
          In the proof-of-concept implementation of our diversified Bash interpreter,
the hashes are generated using SHA-1. The hash is calculated over the
concatenation of the original token and a token-specific secret. In [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] Portokalidis et al.
included the secret in the beginning of the diversified script file or provided it as
a command line argument to the interpreter. The secret was omitted from the
executable script before parsing. Our solution currently uses the same approach,
but different methods of storing and handling the secret are quite easy to add.
        </p>
        <p>As an example of script diversification, consider the following script that
calculates the first few numbers of the Fibonacci sequence:
Num=5
f1=1
f2=1
echo "The Fibonacci sequence for the number $Num is : "
for (( i=0;i&lt;=Num;i++ ))
do
/bin/echo -n "$f1 "
fn=$((f1+f2))
f1=$f2
f2=$fn
done</p>
        <sec id="sec-3-2-1">
          <title>The diversified version of the script would look like the following:</title>
          <p>Num=5^^^9D4D7FB947AFB1BA187FAEFB20533E918EE04212
f1=1^^^D0EE7568D8FE56441EA4BA60CEB119526C12CA06
f2=1^^^BB8630463671DBC49124A08566D6211B5BB90A6B
echo~~~B2D21E771D9F86865C5EFF193663574DD1796C8F
"The Fibonacci sequence for the number $Num is : "
for^^^D9000A6E1DBA2A95B2DDB13E74B220354B5B63AC
(( i=0;i&lt;=Num;i++ ))^^^A04BDD7E8B4AB852FDC07FAF54E0107B12913976
do^^^23CF80A1D6201DAEA7112F6EA161DBA32A055BD2
/bin/echo~~~BCD981E6B112655886C12639214C366EF6961F03 -n "$f1 "
fn=$((f1+f2))^^^A52A61459E705054790329809CA21970B2999E77
f1=$f2^^^451DBA3B0289063BCA2F6B7319D9F37F944C1BA6
f2=$fn^^^7ECA3DF4236A6E384DE9ABABD46C4D53BEA2528A
done^^^14D13C75E6A9348DDD5561AD7F1155609175F38A</p>
          <p>The hashes in this example, generated using the SHA-1 function, are rather long
and result in a considerable increase in source script file sizes. However, the
module responsible for generating and validating the hashes can be easily
extended to facilitate alternative methods of hash generation. For the sake of
clarity, the hashes are encoded in hexadecimal in the previous example script. The
test run performed on this diversified script and other similar examples executed
without errors.</p>
          <p>The purpose of our diversification library is to provide integrable
diversification functionality with minimal changes required to the original interpreter. This
would enable a multitude of Bash-like and other interpreted languages to be
diversified relatively easily and lessen the burden of maintaining vastly different
versions of diversifying script interpreters.</p>
          <p>Integrating the diversification library into the existing Bash interpreter
required fairly minimal changes to the Bash source code. Most changes were
required in parse.y, the input file for the Bison parser generator. As mentioned
before, the diversification module operates between Bash's I/O handlers and
lexical analyzer. As Bash analyses the source code, the diversification module collects
recognizable hashes for future comparison. When Bash's lexical analyzer
identifies a token, the diversification module catches the token, calculates a hash
for it and compares it with the previously collected hash. To make sure
that code does not get executed before the tokens have been verified, the file
execute_cmd.c was modified to ask permission from the diversification module
to execute the parsed code.</p>
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>Further notes on our approach</title>
        <p>A big benefit of our approach is that it does not change the software development
process. The programmer can write scripts as usual and the diversification of the
script is performed by an automatic tool after the code has been written. The
user experience is also not affected because the semantics of the scripts remain
the same.</p>
        <p>Diversification resembles encryption, and one might wonder why we do not
encrypt the script files wholly in our scheme. There is a clear benefit in
diversifying script languages instead of simply encrypting those script files. When
executing an encrypted script, the file first needs to be decrypted. Once this step
has been completed, the file is fed to the interpreter. Were an attacker to utilize
an attack vector that would bypass the decryption phase entirely, such as a code
injection attack, the system would remain vulnerable. In a code injection attack,
the malicious code is placed inside the running program or script. This would
circumvent the encryption-decryption process. Diversification prevents this
scenario by renaming the language interface. Even if malicious code is injected into
the running software, it will no longer match the language of the interpreter.</p>
        <p>Moreover, unlike with completely encrypted code, with diversified code it
is possible to use a renaming scheme in which the original command names are
part of the diversified names. This way the code remains easily readable and also
maintainable to some extent. The script could also be only partially diversified
so that some parts of the code remain open to manual or automatic changes. In
any case, it is worth noting that diversification and encryption can be used as
separate layers of protection.</p>
        <p>The security of our approach could also be further improved with several methods.
For example, the original language interface of the Bash command shell could be
left in the system as a honeypot that catches malicious programs trying to use
it. This is possible because no trusted program should use this original interface
anymore. Another way to increase the resilience of our scheme is to make the
diversification change dynamically over time. This way, the adversary will have
much less time to figure out a diversification that keeps varying.</p>
        <p>We also performed preliminary performance tests on our diversified
interpreter. The test file consists of 5000 lines of randomized assignment operations.
This file was then diversified in order to run experiments with our
implementation. While the code example itself is naive, it requires the diversified interpreter
to undiversify each command. This represents the worst-case performance
scenario for our implementation. Many more complex command structures, such as
loops, could be undiversified just once, even though their code is executed several
times. Both files were executed 100 times, one with the original and one with the diversified
Bash interpreter. The times were measured using Bash's built-in time command.
The standard Bash interpreter performed each execution in an average of 0.0164
seconds, while the diversified Bash took 0.0443 seconds. Hence, our
diversified interpreter takes around 2.7 times longer to execute. Because we have
not yet fully optimized our diversified interpreter and because the experiment
was run using the worst-case scenario, we do not consider this a large
performance penalty.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Feasibility of shell diversification</title>
      <p>In this section, we present a study of the presence of script files in two Linux
distributions and discuss some limitations of our diversification scheme.</p>
      <sec id="sec-4-1">
        <title>A study of the presence of script files in two Linux distributions</title>
        <p>Our data was collected on a Fedora 22 Server distribution and an older,
minimal Gentoo distribution. More specifically, on Fedora, the command uname -a
yields Linux 4.0.4-301.fc22.x86_64 #1 SMP Thu May 13:10:33 UTC 2015 x86_64
GNU/Linux. Respectively, Gentoo's uname -a is Linux Gentoo 3.14.4 #1 Tue
May 20 11:04:51 EEST 2014 x86_64 GNU/Linux. During Fedora's installation
process a few extra selections were made. The installation type Web Server was
chosen and the add-ons Tomcat, PHP and MariaDB were added to the installation
to provide a touch of a real-life server environment.</p>
        <p>The process of cataloguing script files was performed using simple tools
provided with the installation. First, qualifying files were aggregated using the find
command and then filtered using a simple grep command. All commands were
run with root privileges and only script files with execute permissions were
searched for. A guard file was created in order to avoid files that are being
actively updated. Finally, the sed command was used to remove a few pure
binary files from the results:
# touch guard
# find / -type f -perm /a+x ! -newer guard &gt; files
# xargs grep '^#![/a-z]*/bin/[a-z0-9]*' &lt; files &gt; grepmatches 2&gt; /dev/null
# sed -i '/Binary/d' grepmatches</p>
        <p>The results of this process were processed with a simple Python script.
Interpreter paths of the form #!/usr/bin/env X were shortened to either #!/usr/bin/X
or #!/bin/X where appropriate. The first column of Tables 1 and 2 shows the
interpreter referenced by the script on the shebang (#!) line and the second
column has the number of such references. Due to the grepping procedure, some
files were listed twice. Those files were eliminated in post-processing.</p>
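        <p>The counting step could look roughly like the following Python sketch. It is not the script used in the study: it inspects only the first line of each file, uses os.access instead of the permission test of find, and reports only the base name of the interpreter. The function names are hypothetical.</p>

```python
import os
import re
from collections import Counter

# Same pattern as the grep command: a shebang naming something under .../bin/.
SHEBANG = re.compile(rb"^#![/a-z]*/bin/[a-z0-9]*")

def interpreter_of(path):
    """Return the shebang interpreter of an executable file, or None."""
    try:
        if not os.access(path, os.X_OK):
            return None
        with open(path, "rb") as handle:
            first_line = handle.readline(256)
    except OSError:
        return None
    if SHEBANG.match(first_line) is None:
        return None
    words = first_line[2:].split()
    # Treat "#!/usr/bin/env X" as a reference to interpreter X.
    if os.path.basename(words[0]) == b"env" and len(words) > 1:
        return words[1].decode("ascii", "replace")
    return os.path.basename(words[0]).decode("ascii", "replace")

def count_interpreters(root):
    """Walk a directory tree and count executable scripts per interpreter."""
    counts = Counter()
    for directory, _subdirs, files in os.walk(root):
        for name in files:
            interpreter = interpreter_of(os.path.join(directory, name))
            if interpreter is not None:
                counts[interpreter] += 1
    return counts
```

        <p>Calling count_interpreters on a directory such as /usr/bin returns a Counter whose most_common() method gives a table of interpreter names and reference counts, similar in shape to Tables 1 and 2.</p>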
        <p>In addition to executable scripts, we also aggregated non-executable library
scripts by first listing all files in the system using
# find / -type f</p>
        <sec id="sec-4-1-1">
          <title>These files were then filtered using the command</title>
          <p># grep '\.py[co]*$\|\.sh$\|\.p[lm]$' files &gt; libraryfiles</p>
          <p>The filtering process relies upon file extensions used by library developers. In
Unix-based systems there is no guarantee that file names contain a file extension.
However, most well-maintained libraries adhere to the convention of using file
extensions and thus the numbers give an accurate enough estimate of the
quantity of scripts in a fresh system installation.</p>
          <p>The data we collected on executable and non-executable scripts were
combined using a simple Python script. This script ensured that every file would
be counted only once (having a file extension and a shebang would qualify
the file for both the executable and non-executable categories). Some libraries have
multiple versions of the same file; for example, a Python library might include
script_file.py, script_file.pyo and script_file.pyc, where the first file is
the source file and the two latter files are byte-code files. In this case script_file
would only be added to the sum once.</p>
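          <p>The merging rule can be sketched in a few lines of Python. This is an assumed reconstruction of the counting logic described above, not the original script, and the function names are hypothetical.</p>

```python
import os

def canonical_name(path):
    """Map byte-code files (.pyc, .pyo) to the name of their source file."""
    stem, extension = os.path.splitext(path)
    if extension in (".pyc", ".pyo"):
        return stem + ".py"
    return path

def count_unique_scripts(executable_paths, library_paths):
    """Count every script once, even if it appears in both lists or as byte-code."""
    seen = set()
    for path in list(executable_paths) + list(library_paths):
        seen.add(canonical_name(path))
    return len(seen)

executables = ["/opt/tool/run.sh"]
libraries = ["/opt/tool/run.sh", "/opt/lib/script_file.py",
             "/opt/lib/script_file.pyo", "/opt/lib/script_file.pyc"]
print(count_unique_scripts(executables, libraries))
```

          <p>In this example, run.sh appears in both lists and script_file exists in three variants, yet each contributes only once to the total.</p>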
          <p>The script files (both executable and non-executable) found in Fedora and
Gentoo are shown in Table 1 and Table 2, respectively. Comparing these two
tables, we see that Fedora has 2319 more script files, but it is also a somewhat more
service-oriented distribution. The biggest difference seems to be in the number of library
scripts: Gentoo has more Perl, Python and shell libraries than Fedora. Other
than that, the number of scripts is quite similar. The installations themselves
are also fairly similar in size (about 80 MiB).</p>
          <p>What can be deduced from this data, then? There are quite a few script
files in both distributions we studied. Still, diversifying them all would not be a
huge task for an automated diversifier. It is also worth noting that most of the
scripts are Bash or sh scripts that can be handled (sh is largely a subset of Bash and
diversified sh scripts can therefore be run using our implementation). Perl and
Python scripts also seem to make up a significant proportion of all the scripts
in the system, so covering the interpreters of these script languages would be
important for a comprehensive script diversification system. Also, some of the
scripts can be rewritten to use a different interpreter to achieve a completely
diversified solution.</p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>Limitations of shell diversification</title>
        <p>Although our diversification scheme provides many advantages from the security
point of view, it also has some limitations and drawbacks. Obviously, diversifying
all scripts in the system introduces a problem for users who want to use the
command shell manually. After all, it would be too laborious for the users to
write diversified keywords and scripts. We could provide users with a separate
terminal for inputting shell commands, but this solution can be a security risk
as malware may find a way to use this interface as well. On the other hand,
many normal users are not able or do not need to use the command shell. In
some remote systems, the need for an interactive local shell could be replaced
by remote administration tools.</p>
        <p>Another challenge is the problem of diversifying all the scripts and programs
that may dynamically create new scripts at runtime. Still, an automatic
diversifier program that programmers can use when adding scripts to their programs
can be created for this purpose. This way, the programmer does not have to
manually diversify any scripts that might be included in his or her program code.
Also, minimal systems (for instance, the operating systems of IoT devices)
contain much smaller numbers of script files and scripts included in program
code and are thus easier to handle with regard to our approach.</p>
        <p>Installing new programs and scripts into the system can also introduce some
problems. In order to work correctly, new programs and scripts also need to
be diversified using a diversifier tool. In some systems, such as small-scale
systems on IoT devices, the issue could be mitigated by preferring image-based
full-system updates over in-place updates of system files run by scripts.
Therefore, at least for minimal and restricted IoT environments, our solution can be
expected to work well. In the IoT context, the size of the full system may only
be a few megabytes, which makes it quite easy to apply the diversification and
fix possible issues.</p>
        <p>In instruction set diversification schemes in general, storing the secret
diversification key or keys securely is also an issue. In our scheme, however, we assume
that the attacker does not have access to the file system of the computer he or
she is targeting; this is the case in Shellshock and similar attacks in which the
adversary is trying to gain access to the system. Therefore, for the purposes
of this attack scenario the file system can be seen as a safe place to store the
secret key. Of course, some stronger cryptographic storage schemes could also
be considered.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Related work</title>
      <p>
        Instruction set randomization has been applied to several different software
layers and many different application areas. Portokalidis et al. present the model
closest to our work in [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. The authors apply instruction set randomization to
a Perl interpreter, enabling execution of diversified scripts. Perl code injected
by an adversary will fail to run because it is not correctly diversified and is not
recognized by the system. They also briefly mention the idea of diversifying shell
scripts but do not provide any details or implementation. In this sense, our work
can be seen as a continuation of their paper.
      </p>
      <p>
        Boyd and Keromytis have applied instruction set randomization to the SQL language [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Their intermediate
proxy, SQLrand, translates the diversified queries into the original SQL language
and passes them on to the database. We present an improved scheme and
implementation for SQL randomization in [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. Many papers [
        <xref ref-type="bibr" rid="ref15 ref3">3, 15</xref>
        ] and books [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ]
also study the idea of system-wide, global instruction set diversification. Building
diverse operating systems and software systems in general was suggested by
Cohen as early as the 1990s [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Forrest et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] also discuss diversified software
systems as a security measure.
      </p>
      <p>
        Barrantes et al. have studied instruction set randomization at the binary level
to defend against code injection attacks [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. There has also been some interest
in randomizing the system call numbers to render malicious code useless: Jiang
et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] and Liang et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] have studied this issue. We have also presented
a tool for system call randomization [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. Using similar ideas, randomization of
memory addresses has been used to prevent memory exploits [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. What all of
these approaches have in common is that they change the language
of the system in order to prevent malicious programs from using an
interface that provides access to a resource.
      </p>
      <p>
        Identifier obfuscation scrambles identifier names at the source code level. This
is conceptually somewhat similar to our diversification scheme and has been
discussed in many papers. For example, there are diversifying tools for Java [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]
and JavaScript [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. We have studied this topic and built a tool that scrambles
identifiers and function signatures in web applications written in JavaScript and
HTML [
        <xref ref-type="bibr" rid="ref17 ref18">17, 18</xref>
        ].
      </p>
    </sec>
    <sec id="sec-6">
      <title>Conclusions</title>
      <p>We presented a scheme based on instruction set randomization for preventing code
injection attacks in Linux shells. By uniquely diversifying the tokens of Bash scripts,
we prevent the attacker from knowing the correct script language
in advance. We also discussed the practical
implementation of our scheme and explained its effectiveness against
Linux shell code injection attacks such as Shellshock, and we discussed how our
solution improves security over a previous diversification approach.</p>
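      <p>The effect on an injected payload can be illustrated with a short Python sketch of the lookup on the interpreter side. The keyed-tag naming, the command set and the function names are assumptions made for this example and do not describe the actual Bash modification. The point is only that a command name written without knowledge of the key is not recognized.</p>
      <preformat>
```python
import hashlib
import hmac

COMMANDS = {"echo", "rm", "cat"}  # hypothetical command set


def diversified(command: str, key: bytes) -> str:
    """Name under which a command is known in the diversified system (assumed)."""
    tag = hmac.new(key, command.encode(), hashlib.sha256).hexdigest()[:8]
    return command + "_" + tag


def resolve(word: str, key: bytes) -> str:
    """Map a diversified command word back to the original, or reject it."""
    base, sep, _tag = word.rpartition("_")
    expected = diversified(base, key).encode()
    if sep and base in COMMANDS and hmac.compare_digest(word.encode(), expected):
        return base
    raise ValueError("unknown command: " + word)
```
      </preformat>
      <p>In this sketch, a trusted script that was diversified with the system key resolves to the original commands, whereas a plain name such as the one an injected Shellshock payload would use raises an error.</p>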
      <p>A study of the presence of script files in two popular Linux distributions was
also presented. Based on this study, it seems that the Perl and Python interpreters should
also be covered by a practical and comprehensive script diversification scheme.
Therefore, possible future work includes developing our diversificator library in
a more general direction in order to handle other scripting languages such as Perl and
Python.</p>
      <p>One limitation of our approach is that all scripts in a system need to be
diversified. However, this is feasible at least in many restricted server
environments and small IoT environments with a limited number of scripts and
infrequent updates. Also, the diversification could be performed automatically
for the most part.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>E.G.</given-names>
            <surname>Barrantes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.H.</given-names>
            <surname>Ackley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Forrest</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Stefanovic</surname>
          </string-name>
          .
          <article-title>Randomized Instruction Set Emulation</article-title>
          .
          <source>ACM Trans. Inf. Syst. Secur.</source>
          ,
          <volume>8</volume>
          (
          <issue>1</issue>
          ):3–
          <fpage>40</fpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>E.G.</given-names>
            <surname>Barrantes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.H.</given-names>
            <surname>Ackley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.S.</given-names>
            <surname>Palmer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Stefanovic</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.D.</given-names>
            <surname>Zovi</surname>
          </string-name>
          .
          <article-title>Randomized Instruction Set Emulation to Disrupt Binary Code Injection Attacks</article-title>
          .
          <source>In Proceedings of the 10th ACM Conference on Computer and Communications Security, CCS '03</source>
          , pages
          <fpage>281</fpage>
          –
          <fpage>289</fpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>S.W.</given-names>
            <surname>Boyd</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.S.</given-names>
            <surname>Kc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.E.</given-names>
            <surname>Locasto</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.D.</given-names>
            <surname>Keromytis</surname>
          </string-name>
          .
          <article-title>On the General Applicability of Instruction-Set Randomization</article-title>
          .
          <source>IEEE Transactions on Dependable and Secure Computing</source>
          ,
          <volume>7</volume>
          (
          <issue>3</issue>
          ),
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>S.W.</given-names>
            <surname>Boyd</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.D.</given-names>
            <surname>Keromytis</surname>
          </string-name>
          .
          <article-title>SQLrand: Preventing SQL Injection Attacks</article-title>
          .
          <source>In Applied Cryptography and Network Security, Lecture Notes in Computer Science</source>
          Volume
          <volume>3089</volume>
          , pages
          <fpage>292</fpage>
          –
          <fpage>302</fpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5. CloudFlare.
          <article-title>Inside Shellshock: How hackers are using it to exploit systems</article-title>
          . Available at: https://blog.cloudflare.com/inside-shellshock/,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>F.B.</given-names>
            <surname>Cohen</surname>
          </string-name>
          .
          <article-title>Operating System Protection through Program Evolution</article-title>
          .
          <source>Comput. Secur.</source>
          ,
          <volume>12</volume>
          (
          <issue>6</issue>
          ):
          <fpage>565</fpage>
          –
          <fpage>584</fpage>
          ,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>D.C.</given-names>
            <surname>DuVarney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.N.</given-names>
            <surname>Venkatakrishnan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Bhatkar</surname>
          </string-name>
          .
          <article-title>SELF: A Transparent Security Extension for ELF Binaries</article-title>
          .
          <source>In Proceedings of New Security Paradigms Workshop</source>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>S.</given-names>
            <surname>Forrest</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Somayaji</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Ackley</surname>
          </string-name>
          .
          <article-title>Building Diverse Computer Systems</article-title>
          .
          <source>In Proceedings of the 6th Workshop on Hot Topics in Operating Systems (HotOS-VI)</source>
          ,
          <source>HOTOS '97</source>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>S.</given-names>
            <surname>Jajodia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.K.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.S.</given-names>
            <surname>Subrahmanian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Swarup</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>X.S.</given-names>
            <surname>Wang</surname>
          </string-name>
          .
          <article-title>Moving Target Defense II</article-title>
          ,
          <source>Advances in Information Security 100</source>
          . Springer,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>S.</given-names>
            <surname>Jajodia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.K.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Swarup</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>X.S.</given-names>
            <surname>Wang</surname>
          </string-name>
          .
          <article-title>Moving Target Defense: Creating Asymmetric Uncertainty for Cyber Threats</article-title>
          ,
          <source>Advances in Information Security 54</source>
          . Springer,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>Q.</given-names>
            <surname>Jiancheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhongying</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Yuan</surname>
          </string-name>
          .
          <article-title>Polymorphic Algorithm of JavaScript Code Protection</article-title>
          .
          <source>In Proceedings of International Symposium on Computer Science and Computational Technology, ISCSCT '08</source>
          , pages
          <fpage>451</fpage>
          –
          <fpage>454</fpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>X.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Xu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y-M.</given-names>
            <surname>Wang</surname>
          </string-name>
          .
          <article-title>RandSys: Thwarting Code Injection Attacks with System Service Interface Randomization</article-title>
          .
          <source>In IEEE International Symposium on Reliable Distributed Systems, SRDS 2007</source>
          , pages
          <fpage>209</fpage>
          –
          <fpage>218</fpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>G.S.</given-names>
            <surname>Kc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.D.</given-names>
            <surname>Keromytis</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Prevelakis</surname>
          </string-name>
          .
          <article-title>Countering Code-injection Attacks with Instruction-set Randomization</article-title>
          .
          <source>In Proceedings of the 10th ACM Conference on Computer and Communications Security, CCS '03</source>
          , pages
          <fpage>272</fpage>
          –
          <fpage>280</fpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Liang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Li</surname>
          </string-name>
          .
          <article-title>A System Call Randomization Based Method for Countering Code Injection Attacks</article-title>
          .
          <source>In International Conference on Networks Security, Wireless Communications and Trusted Computing, NSWCTC 2009</source>
          , pages
          <fpage>584</fpage>
          –
          <fpage>587</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>G.</given-names>
            <surname>Portokalidis</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.D.</given-names>
            <surname>Keromytis</surname>
          </string-name>
          .
          <article-title>Global ISR: Toward a Comprehensive Defense Against Unauthorized Code Execution</article-title>
          .
          <source>In Moving Target Defense: Creating Asymmetric Uncertainty for Cyber Threats, Advances in Information Security 54</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>S.</given-names>
            <surname>Rauti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lauren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hosseinzadeh</surname>
          </string-name>
          , J. Makela,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hyrynsalmi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Leppa</surname>
          </string-name>
          <article-title>nen. Diversi cation of System Calls in Linux Binaries</article-title>
          . In To be published in
          <source>proceedings of the 6th International Conference on Trustworthy Systems (InTrust</source>
          <year>2014</year>
          ),
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>S.</given-names>
            <surname>Rauti</surname>
          </string-name>
          and
          <string-name>
            <given-names>V.</given-names>
            <surname>Leppänen</surname>
          </string-name>
          .
          <article-title>A Proxy-Like Obfuscator for Web Application Protection</article-title>
          .
          <source>International Journal on Information Technologies &amp; Security</source>
          ,
          <volume>5</volume>
          (
          <issue>1</issue>
          ),
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>S.</given-names>
            <surname>Rauti</surname>
          </string-name>
          and
          <string-name>
            <given-names>V.</given-names>
            <surname>Leppänen</surname>
          </string-name>
          .
          <article-title>Man-in-the-Browser Attacks in Modern Web Browsers</article-title>
          .
          <source>In Emerging Trends in ICT Security</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <given-names>S.</given-names>
            <surname>Rauti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Teuhola</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Leppänen</surname>
          </string-name>
          .
          <article-title>Diversifying SQL to Prevent Injection Attacks</article-title>
          . To be published in
          <source>proceedings of International Conference on Trust, Security and Privacy in Computing and Communications</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhangyong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xiaojiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Dingyi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Feng</surname>
          </string-name>
          .
          <article-title>Research on Java Software Protection with the Obfuscation in Identifier Renaming</article-title>
          .
          <source>In Fourth International Conference on Innovative Computing, Information and Control (ICICIC)</source>
          , pages
          <fpage>1067</fpage>
          –
          <fpage>1071</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>