Polymorphic Worm Detection Using Structural Information of Executables, Hacking and IT E-Book Dump Release

[ Pobierz całość w formacie PDF ]
Polymorphic Worm Detection
Using Structural Information of Executables
Christopher Kruegel
1
,EnginKirda
1
,
Darren Mutz
2
, William Robertson
2
, and Giovanni Vigna
2
1
Technical University of Vienna
chris@auto.tuwien.ac.at
engin@infosys.tuwien.ac.at
2
Reliable Software Group
University of California, Santa Barbara
{
dhm,wkr,vigna
}
@cs.ucsb.edu
Abstract.
Network worms are malicious programs that spread auto-
matically across networks by exploiting vulnerabilities that affect a large
number of hosts. Because of the speed at which worms spread to large
computer populations, countermeasures based on human reaction time
are not feasible. Therefore, recent research has focused on devising new
techniques to detect and contain network worms without the need of
human supervision. In particular, a number of approaches have been
proposed to automatically derive signatures to detect network worms
by analyzing a number of worm-related network streams. Most of these
techniques, however, assume that the worm code does not change during
the infection process. Unfortunately, worms can be
polymorphic
.That
is, they can mutate as they spread across the network. To detect these
types of worms, it is necessary to devise new techniques that are able to
identify similarities between different mutations of a worm.
This paper presents a novel technique based on the structural analy-
sis of binary code that allows one to identify structural similarities be-
tween different worm mutations. The approach is based on the analysis
of a worm’s control flow graph and introduces an original graph coloring
technique that supports a more precise characterization of the worm’s
structure. The technique has been used as a basis to implement a worm
detection system that is resilient to many of the mechanisms used to
evade approaches based on instruction sequences only.
Keywords
: Network worms, Polymorphic code, Structural analysis, In-
trusion detection.
1
Introduction
In recent years, Internet worms have proliferated because of hardware and soft-
ware mono-cultures, which make it possible to exploit a single vulnerability to
compromise a large number of hosts [25].
Most Internet worms follow a scan/compromise/replicate pattern of behavior,
where a worm instance first identifies possible victims, then exploits one or more
vulnerabilities to compromise a host, and finally replicates there. These actions
are performed through network connections and, therefore, network intrusion
detection systems (NIDSs) have been proposed by the security community as
mechanisms for detecting and responding to worm activity [16, 18].
However, as worms became more sophisticated and ecient in spreading
across networks, it became clear that countermeasures based on human reac-
tion time were not feasible [23]. In response, the research community focused
on devising a number of techniques to automatically detect and contain worm
outbreaks.
In particular, the need for the timely generation of worm detection signa-
tures motivated the development of systems that analyze the contents of net-
work streams to automatically derive worm signatures. These systems, such as
Earlybird [19] and Autograph [6], implement a
content sifting
approach, which
is based on two observations. The first observation is that some portion of the
binary representation of a worm is invariant; the second one is that the spreading
dynamics of a worm is different from the behavior of a benign Internet appli-
cation. That is, these worm detection systems rely on the fact that it is rare
to observe the same byte string recurring within network streams exchanged
between many sources and many destinations. The experimental evaluation of
these systems showed that these assumptions hold for existing Internet worms.
A limitation of the systems based on content sifting is the fact that strings
of a significant length that belong to different network streams are required
to match (for example, byte strings with a length of 40 bytes are used in [19]).
Unfortunately, the next generation of Internet worms is likely to be
polymorphic
.
Polymorphic worms are able to change their binary representation as part of the
spreading process. This can be achieved by using self-encryption mechanisms
or semantics-preserving code manipulation techniques. As a consequence, copies
of a polymorphic worm might no longer share a common invariant substring of
sucient length and the existing systems will not recognize the network streams
containing the worm copies as the manifestation of a worm outbreak.
Although polymorphic worms have not yet appeared in the wild, toolkits to
support code polymorphism are readily available [5, 11] and polymorphic worms
have been developed for research purposes [7]. Hence, the technological barriers
to developing these types of Internet worms are extremely low and it is only a
matter of time before polymorphic worms appear in the wild.
To detect this threat, novel techniques are needed that are able to identify
different variations of the same polymorphic worm [15]. This paper presents a
technique that uses the structural properties of a worm’s executable to iden-
tify different mutations of the same worm. The technique is resilient to code
modifications that make existing approaches based on content sifting ineffective.
The contributions of this paper are as follows:

We describe a novel fingerprinting technique based on control flow informa-
tion that allows us to detect structural similarities between variations of a
polymorphic worm.

We introduce an improvement of the fingerprinting technique that is based
on a novel coloring scheme of the control flow graph.

We present an evaluation of a prototype system to detect polymorphic worms
that implements our novel fingerprinting techniques.
This paper is structured as follows. Section 2 discusses related work. Section 3
presents the design goals and assumptions of our fingerprinting technique and
provides a high-level overview of the approach. In Section 4, we describe how
the structure of executables is extracted and represented as control flow graphs.
In Section 5, we discuss how fingerprints are generated from control flow graphs,
and we present an improvement of our scheme that is based on graph coloring. In
Section 6, a summary of the actual worm detection approach is given. Section 7
evaluates our techniques, and in Section 8, we point out limitations of the current
prototype. Finally, Section 9 briefly concludes.
2 Related Work
Worms are a common phenomenon in today’s Internet, and despite significant
research effort over the last years, no general and effective countermeasures have
been devised so far. One reason is the tremendous spreading speed of worms,
which leaves a very short reaction time to the defender [22, 23]. Another reason is
the distributed nature of the problem, which mandates that defense mechanisms
are deployed almost without gap on an Internet-wide scale [14].
Research on countermeasures against worms has focused on both the detec-
tion and the containment of worms. A number of approaches have been proposed
that aim to detect worms based on network tra
c anomalies. One key obser-
vation was that scanning worms, which attempt to locate potential victims by
sending probing packets to random targets, exhibit a behavior that is quite differ-
ent from most legitimate applications. Most prominently, this behavior manifests
itself as a large number of (often failed) connection attempts [24, 26].
Other detection techniques based on trac anomalies check for a large num-
ber of connections without previous DNS requests [27] or a large number of
received “ICMP unreachable messages” [3]. In addition, there are techniques to
identify worms by monitoring trac sent to dark spaces, which are unused IP
address ranges [2], or honeypots [4].
Once malicious trac flows are identified, a worm has to be contained to
prevent further spreading [14]. One technique is based on rate limits for outgo-
ing connections [28]. The idea is that the spread of a worm can be stalled when
each host is only allowed to connect to a few new destinations each minute. An-
other approach is the use of signature-based network intrusion detection systems
(such as Snort [18]) that block trac that contains known worm signatures. Un-
fortunately, the spreading speed of worms makes it very challenging to load the
appropriate signature in a timely manner. To address this problem, techniques
have been proposed to automatically extract signatures from network trac.
The first system to automatically extract signatures from network trac was
Honeycomb [8], which looks for common substrings in tra
c sent to a honeypot.
Earlybird [19] and Autograph [6] extend Honeycomb and remove the assumption
that all analyzed tra
c is malicious. Instead, these systems can identify
recurring
byte strings
in general network flows. Our work on polymorphic worm detection
is based on these systems. To address the problem of polymorphic worms, which
encode themselves differently each time a copy is sent over the network, we
propose a novel fingerprinting technique that replaces the string matching with
a technique that compares the structural aspects of binary code. This makes the
fingerprinting more robust to modifications introduced by polymorphic code and
allows us to identify similarities in network flows.
Newsome et al. [15] were the first to point out the problem of string fin-
gerprints in the case of polymorphic worms. Their solution, called Polygraph,
proposes capturing multiple invariant byte strings common to all observations
of a simulated polymorphic worm. The authors show that certain contiguous
byte strings, such as protocol framing strings and the high order bytes of buffer
overflow return addresses, usually remain constant across all instances of a poly-
morphic worm and can therefore be used to generate a worm signature. Our
system shares a common goal with Polygraph in that both approaches identify
polymorphic worms in network flows. However, we use a different and com-
plementary approach to reach this goal. While Polygraph focuses on multiple
invariant byte strings required for a successful exploit, we analyze structural
similarities between polymorphic variations of malicious code. This allows our
system to detect polymorphic worms that do not contain invariant strings at all.
Of course, it is also possible that Polygraph detects worms that our approach
misses.
3 Fingerprinting Worms
In this paper, our premise is that at least some parts of a worm contain exe-
cutable machine code. While it is possible that certain regions of the code are
encrypted, others have to be directly executable by the processor of the victim
machine (e.g., there will be a decryption routine to decrypt the rest of the worm).
Our assumption is justified by the fact that most contemporary worms contain
executable regions. For example, in the 2004 “Top 10” list of worms published by
anti-virus vendors [21], all entries contain executable code. Note, however, that
worms that do not use executable code (e.g., worms written in non-compiled
scripting languages) will not be detected by our system.
Based on our assumption, we analyze network flows for the presence of ex-
ecutable code. If a network flow contains no executable code, we discard it im-
mediately. Otherwise, we derive a set of fingerprints for the executable regions.
Section 4 provides details on how we identify executable regions and describes
the mechanisms we employ to distinguish between likely code and sequences of
random data.
When an interesting region with executable code is identified inside a network
flow, we generate fingerprints for this region. Our fingerprints are related to the
byte strings that are extracted from a network stream by the content sifting
approach. To detect polymorphic code, however, we generate fingerprints at
a higher level of abstraction that cannot be evaded by simple modifications
to the malicious code. In particular, we desire the following properties for our
fingerprinting technique:
– Uniqueness.
Different executable regions should map to different finger-
prints. If
identical
fingerprints are derived for
unrelated
executables, the sys-
tem cannot distinguish between flows that should be correlated (e.g., because
they contain variations of the same worm) and those that should not. If the
uniqueness property is not fulfilled, the system is prone to producing false
positives.
– Robustness to insertion and deletion.
When code is added to an exe-
cutable region, either by prepending it, appending it, or interleaving it with
the original executable (i.e.,
insertion
), the fingerprints for the original exe-
cutable region should not change. Furthermore, when parts of a region are
removed (i.e.,
deletion
), the remaining fragment should still be identified as
part of the original executable. Robustness against insertion and deletion is
necessary to counter straightforward evasion attempts in which an attacker
inserts code before or after the actual malicious code fragment.
– Robustness to modification.
The fingerprinting mechanism has to be ro-
bust against certain code modifications. That is, even when a code sequence
is modified by operations such as junk insertion, register renaming, code
transposition, or instruction substitution, the resulting fingerprint should
remain the same. This property is necessary to identify different variations
of a single polymorphic worm.
The byte strings generated by the content sifting approach fulfill the unique-
ness property, are robust to appending and prepending of padding, and are
robust to removal, provided that the result of the deletion operation is at least
as long as the analyzed strings. The approach, however, is very sensitive to mod-
ifications of the code; even minimal changes can break the byte strings and allow
the attacker to evade detection.
Our key observation is that the internal structure of an executable is more
characteristic than its representation as a stream of bytes. That is, a represen-
tation that takes into account control flow decision points and the sequence in
which particular parts of the code are invoked can better capture the nature of
an executable and its functionality. Thus, it is more dicult for an attacker to
automatically generate variations of an executable that differ in their structure
than variations that map to different sequences of bytes.
For our purpose, the structure of an executable is described by its control
flow graph (CFG). The nodes of the control flow graph are basic blocks. An edge
[ Pobierz całość w formacie PDF ]

  • zanotowane.pl
  • doc.pisz.pl
  • pdf.pisz.pl
  • jaczytam.htw.pl