Polymorphing Software by Randomizing Data Structure Layout, Hacking and IT E-Book Dump Release

[ Pobierz całość w formacie PDF ]
PolymorphingSoftwarebyRandomizingData
StructureLayout
Zhiqiang Lin, Ryan D. Riley, and Dongyan Xu
CERIAS and Department of Computer Science, Purdue University, USA
fzlin,rileyrd,dxug@cs.purdue.edu
Abstract. This paper introduces a new software polymorphism tech-
nique that randomizes program data structure layout. This technique
will generate dierent data structure layouts for a program and thus di-
versify the binary code compiled from the same program source code.
This technique can mitigate attacks (e.g., kernel rootkit attacks) that
require knowledge about data structure denitions. It is also able to dis-
rupt the generation of data structure-based program signatures. We have
implemented our data structure layout randomization technique in the
open source compiler collection
gcc-4.2.4
and applied it to a number
of programs. Our evaluation results show that our technique is able to
achieve software binary diversity. We also apply the technique to one op-
erating system data structure in order to foil a number of kernel rootkit
attacks. Meanwhile, programs produced by the technique were analyzed
by a state-of-the-art data structure inference system and it was demon-
strated that reliance on data structure signatures alone may lead to false
negatives in malware detection.
1Introduction
A widely adopted methodology for implementing software is data abstraction,
which involves the abstraction of data structures and enables programmers to
isolate a data denition from its representation and operations. Software is im-
plemented to access and process data structures. Software implementation, if
not obfuscated, will expose certain data structure denitions as well as their lay-
outs. This observation has been exploited recently in network protocol reverse
engineering [11, 16, 20, 29, 30, 40].
Knowledge about data structure layout is often used by attackers. For ex-
ample, a buer overow attack relies on the attacker knowing that the program
buer is adjacent to a function pointer or return address [22]. Kernel rootkits,
especially those that manipulate kernel objects directly, require that the attacker
know the layout of specic kernel objects in order to manipulate them. In net-
work application penetration testing, if the attacker knows the structure of the
protocol message, he can reduce the fuzz space and speed up the test [21, 39].
These attacks can be foiled if we can prevent attackers from obtaining an accu-
rate data structure layout of the victim program.
Data struct layouts are also used as attack signatures in some defense tech-
niques. For example, in protocol analysis, the data structure associated with
2
a protocol payload can be used to construct the exploit signature for runtime
network intrusion detection. In malware analysis, it has been reported recently
that data structure layout can be used to generate malware signatures [19].
Forrest et al. [23] has suggested that monoculture is one of the main reasons
why computers are vulnerable to large-scale, reproductive attacks. As such, ran-
domization can be introduced to increase the diversity of software. This strategy
has been widely instantiated in existing work such as address space randomiza-
tion (ASR) [8, 10, 38, 41], instruction set randomization (ISR) [6, 28], data ran-
domization [12,17], and operating system interfaces randomization [13,27]. Given
the success of existing randomization strategies, we propose another instantia-
tion of software randomization: Data structure layout randomization (DSLR).
In this paper, we demonstrate that software can be diversied by DSLR. We
propose an approach to instrument a compiler (as the compiler knows about a
program's semantics) so that it will generate a dierent data structure layout
each time the same source program is compiled. We instrument the compiler
to scan the data structure denitions (e.g.,
struct
and
class
) marked by the
programmer as randomizable and then reorder their member elds and insert
garbage elds. We note that DSLR is dierent from the software obfuscation
techniques [15]. Those techniques are used in software protection and aim at
making it harder to reverse engineer the data structure denitions in asingle
binary. On the other hand, DSLR makes it dicult to derive data structure
signatures frommultiplecopies of the same software.
The benet of DSLR to malware defense is two-fold: First, DSLR can miti-
gate attacks that rely on knowing the data structure layout of victim programs.
Second, the feasibility (and simplicity) of DSLR suggests that malware signa-
tures based on data structure layout may not always be eective when used
alone for malware detection.
We have implemented our DSLR technique in an open source compiler col-
lection,
gcc-4.2.4
, and applied it to a number of programs. The detailed design
and implementation are presented in Section 3 and Section 4, respectively. Our
evaluation results in Section 5 show that DSLR can achieve software binary di-
versity. DSLR can be used generate diverse kernel data structure denitions to
mitigate a number of kernel rootkit attacks. Meanwhile, we demonstrate that
DSLR introduces noise to a state-of-the-art data structure inference system when
generating a program's data structure signature. Finally, DSLR imposes very low
performance overhead on
gcc
and on the original, un-randomized program.
2TechnicalChallenges
In this section, we examine two technical challenges in realizing DSLR: Which
data structures to randomize and how to randomize them.
2.1 Randomizability of Data Structures
Data structure layout, at the binary level, is reected by the osets of the encap-
sulated object elds. The encapsulated objects include
struct
,
class
, and stack
3
variables declared in functions (as they are related to a particular stack frame
and addressed by
EBP
). The rst two types have been exploited to derive mal-
ware signatures [19]. We believe that a function's local variable layout can also
be leveraged to compose signatures and thus we will also discuss randomizing
them.
However, randomizing just any data structure will not work in general as
manifested in the following examples: (1) If a data structure is used in network
communication, the communicating parties may not understand each other if
the data structure is randomized. (2) If a data structure denition is public
(e.g., dened in shared library
stdio.h
), it cannot be randomized. (3) There is
a special case in GNU C that allows zero-length arrays to be the last element
of a structure (a zero-length array is actually the header of a variable-length
object). If a zero-length array is declared as the last element in a
struct
, that
element cannot be randomized, otherwise it cannot pass
gcc
syntax checking.
(4) A programmer may directly use the data oset to access some elds. (This is
particularly true in programs which mix assembly and C code.) (5) To initialize
the value of a structure, the programmer uses the order declared to initialize
the structure. These elds cannot be randomized, as the program may crash. In
light of these cases, we declare a data structure as randomizable if and only if it
is not exposed to any other external programs and does not violate the original
gcc
syntax and programmer intention.
Data structure randomizability is closely related to program semantics. It
would be ideal if the compiler could automatically spot all the randomizable data
structures. In practice, however, only the programmer can designate randomiz-
able data structures with condence. Even if we could dene some heuristics to
automatically spot those randomizable data structures, we could not claim both
completeness and safety. In this paper, we simply require that programmers use
new keywords to specify randomizable data structures.
2.2
Data Structure Randomization Methods
The second challenge is how to randomize a data structure. The simplest ran-
domization method would be to reorder its layout. Our primary goal is to create
binary diversity for the same software { the more variation, the better. There-
fore, we will design a randomization method which reorders the member elds
of each data structure to be randomized. Suppose a program has n such data
structures and each has m elds, then the number of possible combinations after
randomization would be (m!)
n
.
However, eld reordering alone is still not sucient. For example, suppose
a data structure has only two members which are both of
int
type. No matter
how we reorder these two elds, the layout of this data structure is still \
int
and
int
". As a result, to randomize a data structure containing multiple members
of the same type, we have to use a dierent randomization method. To this end,
we insert garbage elds into these data structures.
4
3DSLRDesigninGCC
In this section we present the detailed design of DSLR in a specic compiler
system. As C/C++ is commonly used in system and user level programming,
we have implemented our DSLR technique in the popular, open-source compiler
gcc
[1].
By instrumenting
gcc
to reorganize the elds in encapsulated data structures,
DSLR will ll the memory image with a random layout each time the program
source is compiled. Hence, we need to decide where to instrument
gcc
.
For a program source,
gcc
rst builds an initial Abstract Syntax Tree (AST).
It then converts the language-specic AST into a uniform, generic AST. The
generic AST will be transformed into another representation called GIMPLE
(a representation form which has at most three operands). After GIMPLE, the
source code is converted into the static single assignment (SSA) representation
[5] to facilitate more than 20 dierent optimizations on SSA trees. After the
SSA optimization pass, the tree is converted back to GIMPLE which is then
used to generate a register-transfer language (RTL) tree. RTL is a hardware-
based representation that corresponds to an abstract target architecture with
an innite number of registers. There are also a number of optimization passes
such as register allocation, code scheduling, and peepholes performed at the RTL
level.
Given these internal steps in
gcc
, the possible instrumentation points for
DSLR are AST, GIMPLE, SSA, and RTL. We instrumented at the AST level
for the following reasons: (1) the AST retains a lot of original information from
the program source code, such as the type and scope information for data struc-
tures and functions; (2) The AST representation is easier to understand and
the structure of the tree is concise and relatively convenient for us to modify;
(3) When generating the AST,
gcc
has not yet determined the layout of the
data structures, and as such we can reorder the data structure members and
reconstruct the AST without needing to compute specic memory addresses.
The data structures to be randomized can be divided into three categories:
struct
,
class
and the function stack variables. We reorder the inner AST rep-
resentations of these data structures, which will eventually lead to the reorga-
nization of the memory layout. Note that these data structures have their own
scopes. When the AST for these data structures is generated, all the member
variables in each data structure are chained together and represented by a link
list. To perform randomization, we can just capture the head node of the list,
reorder the nodes of the list based on a random seed, and insert some \garbage"
nodes into the list if necessary.
Figure 1 shows a simple example. A data structure
test
has three elds:
int
a
,
charb
, and
int*c
. When compiled with the original
gcc
, the order of the
elds is in the originally declared order (Figure 1(b).) When compiled with our
DSLR-enabled
gcc
, the order of the elds is randomized. We also add 2 garbage
elds. Figure 1(c) shows the randomized AST representation of
structtest
.
As discussed in Section 2, to enhance data structure layout diversity we adopt
the following strategy: (1) dierent data structures at the same project build-
5
Original
Randomized
TEST
TEST
STRUCT TEST
{
INT A;
CHAR B;
INT *C;
};
B
C
A
C
G
G
B
A
(A)
(B)
(C)
Fig. 1: Example of data structure randomization: (a) the original denition, (b) the original
AST, and (c) the randomized AST. The \G" nodes represent the garbage elds added to
the data structure. The dotted arrows represent the order of the elds.
ing time will be reordered dierently (with dierent randomization seeds); and
(2) the same data structure at dierent project building times will be reordered
dierently. We use project building time instead of compile time because when
building a project,
gcc
usually compiles each le individually (as specied in the
Makele), and we need to ensure that the same data structure has a uniform
layout across one entire build. Suppose a program has two data structures,
S1
and
S2
, which have 4 and 5 elds respectively. When we build the program us-
ing our modied
gcc
,
S1
and
S2
will be randomized dierently. In addition, the
same data structure (e.g.,
S1
) will have dierent layouts in memory at dierent
project building times. Hence, the number of possible layouts for this program
would be 4! 5!. We believe such a strategy will greatly improve the binary di-
versity of the program, as the chances of generating identical instances would be
1=(
Q
i=1
jS
i
j!), where j is the total number of data structures to be randomized
and jS
i
j represents the total number of elds (members) in data structure S
i
.
4DSLRImplementationinGCC
Our DSLR prototype is implemented in
gcc-4.2.4
with over one thousand lines
of C code. We modied
gcc
's AST representation to perform the randomization.
Our prototype consists of four key components: (1) keyword recognizer, which
recognizes the new keywords we introduce to specify data structure randomiz-
ability and garbage padding; (2) re-orderer, which reorders the eld variables
in a data structure denition according to a random seed; (3) padder, which
inserts the garbage elds into a data structure; and (4) randomization driver,
which controls the randomization process. In the remainder of this section, we
present the details of these components.
4.1
Keyword Recognizer
We introduce several new keywords to instruct
gcc
regarding which data struc-
tures to randomize and how.
[ Pobierz całość w formacie PDF ]

zanotowane.pl

doc.pisz.pl

pdf.pisz.pl

jaczytam.htw.pl

Polymorphing Software by ...

Polymorphing Software by Randomizing Data Structure Layout, Hacking and IT E-Book Dump Release

Archiwum