September, 2003

Simulating Network Worms

Bruce Ediger

Introduction

This document describes the design and implementation of a network worm simulation system, named "NWS". It also describes using the NWS system to model aspects of real world worms, and the results of those simulations.

Definition of "network worm"

Currently, it seems unfashionable to strictly define "worm" or "virus" or any other perfectly good term. I think that commercial interests drive this - it's simpler and more profitable to threaten non-technical people with a "worm" than it is to address the real problems of a software and hardware monoculture. For purposes of defining the scope of NWS, I wanted to define "network worm".

For the purposes of this document a "worm" consists of a process or set of processes that replicates without human intervention by creating an executing copy of itself on another computer via some form of network communication.

Other similar definitions exist.

The Jargon File entry for worm says[1]: A program that propagates itself over a network, reproducing itself as it goes.

Alexander Bartolich[2] of the ELF virus writing HOW-TO has this definition: A worm is a program that penetrates other running programs. Penetration means to copy the executable code of the worm into the active process image of the host.

The definition seems to describe other forms of "mobile code" like Java applets. One has to realize that a worm spreads itself - a user has to ask to download a Java applet, so applets don't count as worms. Email viruses like Klez or SirCam also require human intervention. Before the virus activates, the user has to at least open the email carrying the virus, and possibly even open the attached document that contains virus code. The typical mass-mailer virus doesn't qualify as a worm by the definition above, despite the "worm" names given to mass-mailing viruses by anti-virus companies and mass media. Cliff Changchun Zou's work on email virus modeling[3] shows the differences between mass-mailer viruses and network worms.

The clause that says "creating an executing copy of itself ... via some form of network communication" allows us to distinguish between a fork bomb[4] and a worm. A fork bomb consists of many replicating processes on a single machine, without recourse to network communication.

Motivation for simulation

Various reasons exist for simulating the spread of network worms:

My own personal motivation consists of puzzling over the "installed base" theory. I usually encounter this theory in discussion forums like Usenet or Slashdot[8]. The theory runs like this:

Microsoft Product X has more worms than open source Product Y because Product X runs on more machines than Product Y.

I originally wrote the NWS simulator to try to test the "installed base" theory. In the course of writing NWS, I've decided that the "installed base" theory doesn't have any meaning, for at least two reasons.

  1. For some products (IIS and Apache), the theory is false - Apache runs on about twice[9] the number of web servers that IIS runs on.
  2. For other products (Outlook/Exchange), one can't falsify the theory. Unless some miracle caused reversed installed bases, no way exists to test the theory.

The installed base theory amounts to special pleading for the use of Microsoft products.

Other Simulators

Staniford, Paxson and Weaver[10], give sparse details on simulating a "Warhol worm".

Moore, Shannon, Voelker and Savage[11] seem to have modeled the Code Red epidemic with some effects of containment measures. They do not explain their simulator at all.

Zou, Gong and Towsley [12] modeled Code Red propagation, in an attempt to better match observed behavior. Apparently, they did their modeling using numerical solutions to differential equations.

Michael Liljenstam has written SSF.App.Worm[13], a module for the Scalable Simulation Framework[14].

Kephart and White [15] wrote a simulation of email virus spread. The simulator models networks with limited connectivity. They didn't fall prey to referring to such viruses as "worms".

Zou, Towsley and Gong [3] modeled email virus propagation using numerical simulations. They also make a distinction between "network worm" and "email virus".

NWS function

NWS provides a Perl programmer with a framework in which to write a complete network worm simulation. The programmer writes a Perl program using NWS objects and their methods. Running the program executes the simulation. NWS does require a degree of Perl knowledge, especially to write the worm code. It does not allow a user to eschew all intelligence by limiting him or her to actions or categories of actions I've already thought of.

Download

You can download the NWS simulator and try the simulations described in this document or use NWS to write your own. I developed NWS in a Linux command line environment, so the simulations print to STDOUT. Windows users may have a hard time getting these simulations to work, but they should work unchanged on all modern Unix, Linux or BSD systems that have Perl installed.

Licensing

The NWS network worm simulator is licensed under the GNU General Public License[16] (GPL).

Framework Design

I wanted to write a system that allowed simulation of many different real-life worms. When I began, I did not understand what this entailed. At a minimum, I believed that I needed flexible configuration and setup, to allow trying different population sizes (including number and "species" of worms) and address space sizes. I also chose to have NWS actually execute "worm code" in each entity a worm infects. Actually executing code allows a worm to perform arbitrary actions just like in real life. I wanted to make any given simulation as believable as possible.

Between setup and configuration and executing code for a worm, any simulator needed to include some kind of interpreted language. I chose Perl, mainly because I had an easy time writing early versions of an SIS epidemic in Perl. I wrote the NWS framework itself in the same language that NWS users write worms.

I also chose to write NWS in a simplistic object oriented fashion. Objects in an NWS program directly correspond to real life entities that comprise a real life network: hosts, software, and messages. This should make simulations much more believable, since a user can directly count uninfected hosts or worm hosts or messages passed.

Simulated worms contain actual code

Petri nets, or finite state machines or numerical simulations could simulate any given worm, but not allow the curious investigator to make simulations of some other type of worm, or more than one strain of worm in a simulation.

Numerical simulations don't easily allow discontinuities like dormant stages in a worm's life cycle, or introductions of counter-worms after worm infections reach epidemic proportions.

One can write a simulator for any given worm without executing "worm code", but that simulator wouldn't have any kind of general purpose capability.

Real life network worms execute arbitrary code once they've replicated themselves on a new host. The only way to model this generality allows simulated worms the same ability: simulated worms must execute some form of code. The NWS framework allows and encourages this.

Simulators that operate without executing worm code will end up corroborating specific cases written for and executed in a general purpose worm simulator.

Fully-connected network

NWS simulations model fully-connected networks, using "network" in a more graph theoretic sense rather than a TCP/IP sense. Every network-connected entity can send a message to any other entity in the simulation. Each network-connected entity has an address consisting of a single number. Writing a simulation requires an NWS user to set the address space size, the largest number that gets used as an address.

In contrast, mass-mailing Outlook viruses don't operate in a fully-connected network. Many mass-mailing Outlook viruses use Outlook's address book or SMTP client, while a few look through the infected machine's file system to find email addresses. Modeling a mass-mailer virus would involve deciding how many email addresses to allot to each instance of the virus, and deciding on a probability that represents how often a virus recipient clicks on the virus attachment. Not every instance of a mass-mailer virus would send itself to all the email addresses in the simulation, so the network of email addresses would not qualify as fully-connected.

Assuming fully-connected networks seems overly optimistic relative to the real Internet, given work such as Arbor Networks paper on dark address space [17] and the prevalence of corporate intranets connected to the Internet via gateways and firewalls that do network address translation.

Object oriented design mimics real networks

Naively, a network links entities called hosts. A host can contain instances of software that the host executes periodically. The network allows hosts to communicate with each other via messages or streams of messages.

The NWS framework demands that its users create objects that correspond directly to entities in a real network. NWS users have to create an instance of class NWS::Network, and populate the instance of NWS::Network with instances of class NWS::Host. NWS::Host instances contain NWS::Software instances.

Due to CPU architecture or operating system executing on the host, or even due to particular versions of software executing under the operating system, real life worms can only replicate themselves on certain hosts. Users of NWS can create instances of class NWS::Host that specify CPU architecture or operating system name. Instances of NWS::Host can contain instances of class NWS::Software that specifically name the "software" that the Host executes.

When NWS::Host instances receive NWS::Message instances, the NWS::Host delivers NWS::Message instances to NWS::Software instances. NWS::Host instances have numerical addresses in a network, analogous to Internet Protocol addresses. Inside an NWS::Host, names of NWS::Software instances constitute the analog to TCP or UDP port numbers. An NWS::Message gets delivered to a host by numerical address, but the Message gets ignored if the Message instance does not contain an identifier that matches the identifier of the Host instance receiving it. After a match on Host identifier, the Host will ignore the Message instance if the Message does not name a Software instance the Host contains.

This style probably has some detrimental effect on execution speed. I hope that this style promotes more believable simulations.

Use of Perl for both Framework and Worms

Users of NWS write simulation programs in Perl, incorporating and using NWS as a set of library modules. Users also write worm code for their simulations in Perl. The same programming language gets used for setup, configuration and for subjects under study in the simulation.

One possible alternative would have the simulator framework (network, network communications, execution scheduling, etc) written in a traditional compiled language like C. The worm code would exist as fragments of a scripting language, probably an easily embeddable scripting language like Tcl[18], Lua[19] or S-lang[20]. Other general simulators, like the ns[21] network simulator, have implemented exactly the "interpreted setup and configuration, compiled back-end" design. I decided against this approach.

Experience developing various worm simulations convinces me that NWS users will modify the simulator framework to suit their purposes. I often modified the simulator framework to serve special purposes like counting message receptions, or testing different message delivery algorithms. Writing NWS with a traditional compiled language "back end", and interpreted user code for setup, configuration and worm code burden one set of users, and handicap another. Some users will want to obtain different information from a simulation than the NWS framework provides by default. These users would have to change the "back end" code to collect the information they want while they simultaneously write the simulation in question. These users would have to write simulations in two vastly different programming languages at the same time - the "back end" and the "front end" languages. This two-language design would unnecessarily limit other users who only understand the interpreted user code.

Perl's Advantages

Many people have experience with Perl: Perl implementations exist for many or all modern systems. Lots of web sites and books offer documentation on Perl and how to effectively write Perl programs. This means that the potential audience of users of NWS ends up vastly greater than an alternative simulator design that requires its users to have familiarity with at least two languages.

Perl's Disadvantages

Using Perl for both worms and the simulator library code can make debugging difficult. NWS users may find it difficult to distinguish errors in worm code from errors in library code. Perl programs can use substantial amounts of memory, and they can execute slower than one would like. Users of NWS will never have the ability to model the entire Internet, as did Staniford, Paxson and Weaver[10].

Deviations from Reality

Any simulation discards elements of the real system it models. This causes simulations to behave differently than real life systems. People using the NWS simulator should realize that their simulations differ from reality in at least these specific ways:

I think that this combination of differences from reality prevents one from making any kind of correspondence between time steps in a simulation, and real-world time. It does not prevent using the NWS framework to compare qualitative differences. For example, an NWS user could check relative efficiencies of two different ways to probe the simulated network's address space.

Validation

Before trying anything difficult or interesting I attempted to validate NWS code and behavior.

Class and method level tests

I wrote simple test harnesses to execute as much NWS code as possible, and verify the code's behavior.

The NWS framework code passes all these tests. Users who modify the framework for their own purposes should consider running these test cases (and maybe others) to ensure they haven't broken the framework.

SIS model

SIS simulator code

The Susceptible-Infected-Susceptible model[22] constitutes one of the simplest mathematical models of an epidemic. Each entity in the population of the model resides in one of two states:

Entities makes transitions from one state to another according to some probability. Entities participating in biological epidemics can make transitions from susceptible to infected, and possibly from infected to susceptible, depending on the illness. Suffering through the common cold only confers a temporary resistance. Realistic models have to account for this.

If no transitions from Infected to Susceptible states take place, this model approximately matches any number of real life worms: the 1988 DECnet "Father Christmas" worm[23], 1998 ADMw0rm[24], the first arrival of Code Red v2[25], 2002 Slapper worm[26] (also here[27]).

Derivation of theoretical analytical solution

To validate the NWS framework, I need some kind of standard against which to measure it. Fortunately, SIS epidemics have an analytical solution. I'll derive most of the solution below in terms of a network worm and a population of exploitable hosts.

I use the following symbols:

NNumber of entities in the population
SAddress space size
PProbability of randomly picking an address that maps to an entity
VNumber of infected entities in population at some time
V0Number of infected entities in population at the start of the simulation

Assume that the worms pick addresses at random. The probability of picking an address that maps to a host in the population under consideration amounts to:

P = (number of hosts)/(address space size) = N/S.

The fraction of uninfected entities amounts to:

1 - V/N

At any given instant, V number of worms try to infect the remaining uninfected entities. Since the worms have a probability of infecting the remaining uninfected proportion of entities, and only so many uninfected entities remain, the change in the number of infected hosts amounts to:

P(1-V/N)V

A real set of infected hosts operating in the real Internet would send out probes continuously, so I get to magically wave my hands and convert the change in number of infected hosts to a differential equation:

dV/dt = P(1-V/N)V

This differential equation, known as the Logistic Equation, turns out to have an exact solution. The solution involves some onerous bookkeeping called integration by partial fractions. [28] The exact, analytical solution:

V = V0/[(1 - V0/N)e-Pt + V0/N]

This equation has some interesting properties. It has time starting at zero and going forwards. It works in terms of the total population, the initial number of infected hosts, and the probability of getting a "hit" on a member of the population.

If the Logistic Equation's solution matches reality, an investigator could reasonably work out when a worm-writer released a worm. One would only have to make a few assumptions about population of exploitable hosts, address space size, and how many worms got released initially.

Staniford, Paxson and Weaver's[10] "Hit-list Scanning" ends up as nothing more than a large value of V0 in this form of the solution to the Logistic Equation.

Simulation Description

I used an address space of 65535 (16 bits of address), and I populated it with seven worm NWS::Host instance, and 9993 possible victim NWS::Host instances, for a total of 10000 entities in the population. All entities reside at randomly assigned addresses in the address space. The simulation runs for 135 time steps or until only 1% of the 9993 possible victims remain uninfected. I will clarify the choice of seven initially infected entities below.

Comparison of NWS with analytical solution

The NWS simulation doesn't match the analytical solution particularly well. I believe the difference arises from stepwise treatment of time, and a discrete population. The NWS simulation takes discrete time steps, while the analytical solution has continuous time. Infections happen in whole numbers of entities per time step. The NWS model can only show 1 or 3 or 97 entities infected at a given time step, not 1.237 or 3.532 or 97.8 entities infected at 10.346 hours. NWS has inefficiencies (it takes longer to get to 99% infection) relative to analytical solution for these reasons.

Discrete Euler's Method solution

The analytical solution to the Logistic Equation provides an upper boundary to the NWS SIS simulation. I tried a cheap-and-dirty variant of Euler's Method[29] of solving differential equations to provide a lower boundary. I made the variant round down to the next lowest integer value of the number of entities infected over a given elapsed time.

Starting with the Logistic Equation:
dV/dt = P(1-V/N)V
I'll make the differential dV/dt into a ratio of finite quantities,
ΔV/Δt = P(1-V/N)V
If I use a small enough time step (numerical value of Δt) the Euler's method approximation should still remain accurate. Multiply both sides of the equation by Δt, and use int() function to mean "next lowest integer value":
ΔV = int(P(1-V/N)VΔt)

Euler's method has you calculate a value of ΔV based on the current value of V, P and Δt. The algorithm adds ΔV to the current value of V to get the next time step's value of V, an iterative solution:

Vi+1 = Vi + int(P(1-Vi/N)ViΔt

This necessarily undershoots the analytical solution for two reason. First, rounding down to the next lowest integer value of Δt provides the biggest loss relative to the analytical solution. Second, the iterative solution uses the slope of the function at the start of the time step to shoot the value of the function at the end of the time step. The analytical solution shows us that the curve only goes up, it never goes down, and the curve is concave for the first half of the simulation's run time. Using the slope at the start of the time step under estimates the value of the function at the end of the time step for the concave part of the curve.

The Euler's method solution graphed below started with values of V = 7, N = 65535, P = 10000/65535 = 0.1525 and Δt = 1 which match the NWS simulation. I chose the initial value of V so that the value of ΔV comes out to 1 during the first time step. The values of P and 1-V/N during the first time step mean that any smaller initial value of V will end up with ΔV = 0. This Euler's method simulation breaks down for initial values of V less than 7.

SIS Simulation Results

I ran the SIS simulation 20 times. I reduced the 20 simulation outputs to an arithmetic mean of the number of infected hosts at each time step. The graph below compares the mean of 20 SIS simulation runs with the analytical solution, and the results of running the discrete Euler's method numerical approximation derived above.

SIS model results

SIS Simulation Conclusions

The mean of 20 runs of the NWS SIS simulation reaches 99% infection slower than the analytical solution would have it, and falls reasonably close to the discrete Euler's Method numerical simulation. I take this to mean that the NWS SIS model correctly simulates a network worm turned loose in a population of computers without any resistance.

For small numbers of infected hosts, the real Internet shares the "discrete population" property with an NWS simulation. This seems to imply that an investigator can't use the analytical solution to the Logistic Equation to accurately find the time of an initial infection of a real Internet worm.

A worm author can take away two, opposing conclusions. Start as many worms as possible (a large hit list) to eliminate statistical effects on when a worm population gets traction, or start a single worm to randomly confuse the issue of when the epidemic started.

Cases of Greater Interest

Writing a slow, hard to use simulator that could only model worms that easier, faster methods can adequately model doesn't make sense. This section illustrates a few cases that more or less match real world worms, and don't have an analytical solution available.

CRclean

NWS simulator code
Simple numerical simulation code

Epidemiologists also consider another model of infection: Susceptible-Infected-Recovered[30]. Observation of biological epidemics like chicken pox or small pox, diseases that confer lifetime immunity on the survivors, motivate this model. In an SIR epidemic, members of the population reside in one of three states:

Members of the population make transitions from one state to another according to some probability. The SIR model limits transitions to

In computer terms, one might describe the states as vulnerable, exploited and patched.

During the 2001 Code Red worm outbreak, Markus Kern posted code for a passively spreading worm[31]. I don't have much knowledge about IIS in particular, and Microsoft products in general, but I gather that when a CRclean-infected instance of IIS receives a Code Red URI, CRclean sends a copy of itself to the Code Red-infected-host. The newly-sent-CRclean infects the Code Red-infected-IIS with itself, stops Code Red and patches the previously-infected host to prevent re-infection by Code Red. CRclean then starts watching for receipt of Code Red URIs.

The interaction between IIS, Code Red and CRclean matches an SIR epidemic pretty well. We would consider machines running IIS (but not infected with Code Red or CRclean) as "susceptible". After infection with Code Red, we would consider a machine "infected", and after infection with CRclean, we would consider the machine "recovered".

Derivation of differential equations

Despite asserting above that this is an interesting case, one that doesn't have an analytical solution, I will derive differential equations to describe the IIS/Code Red/CRclean interaction. The differential equations that I derive don't have an analytical solution, I can only solve them numerically. Since numerical solutions to systems of differential equations are notoriously poorly behaved, deriving a system of differential equations for this case that may or may not even converge on an answer doesn't render it trivial, suitable only for validating the NWS framework. Differential equations, solvable or not, also give insight into the problems they describe that simulations do not.

I'll use the following symbols to derive differential equations that describe an epidemic of Code Red among a population of IIS hosts, and automatic patching of the infected hosts by CRclean.

NNumber of entities in the population
PProbability of picking an address that maps to an entity
SNumber of susceptible (uninfected) entities in population at some time
INumber of infected entities in population at some time
RNumber of recovered entities in population at some time

First, the number of susceptible, infected and recovered entities has to sum up to the total number of entities in the population:

N = S + I + R

Next, the differential equation for the rate of change in recovered entities. Infected entities (Code Red worms in this simulation) probe the address space randomly, and they have a probability P of picking an address mapping to an entity. Since I infected entities exist in the simulation at any given time, PI represents the number of probes that hit any entities in the simulation. The number of CRclean entities equates to the number of recovered entities in this simulation. The fraction of recovered entities in the simulation amounts to R/N. so the number of probes from Code Red entities that hit a CRclean entity is PI(R/N). Each Code Red probe to a CRclean entity results in one less Code Red entity, PI(R/N) amounts to the per-time step change in recovered entities, and -PI(R/N) amounts to the per-time step change in the number of infected entities. Since these equations model a continuous system, we can wave a magic wand over it and convert it to a differential equation for representing the rate of change in number of recovered entities:

dR/dt = PI(R/N)

The change in number of infected entities has a similarity to the SIS model: PI constitutes the number of infectious probes that hit an entity in the address space of the simulation. Since only a fraction of the number of entities that get hit are susceptible, the change in number of infected entities due to conversion of susceptible entities amounts to PI(S/N)

This SIR model differs from the SIS model in that some infected entities change to recovered. That change amounts to -PI(R/N).

We add the two changes in number of infected entities, and once again invoke the magic of continuous time and a large population to get a differential equation:

dI/dt = PI(S/N) - PI(R/N)

That leaves us with three equations for the three unknowns (S, I, R):

Euler's Method.

I don't think that an analytical solution for the three equations above exists. One can write a numerical simulation for them, however. Assuming differentials constitute ratios of finite quantities and that using small (but not infinitesimal) values of dt approximates the analytical solution, you can write an iterative solution.

The values of S, I and R at a given time step get used to calculate the values for the next time step.

Simulation Description

To compare the NWS simulator with an Euler' method solution for an SIR epidemic, I wrote and ran an NWS simulation of CRclean versus Code Red. I used an address space of 65535 (16 bits), and I populated it with seven NWS::Host instances that run NWS worm code, modeling the start of the Code Red 1 v2 worm. I put in 9986 possible victim NWS::Host instances, modeling IIS hosts, all of them infectable by Code Red v2. All worms and victims reside at randomly assigned addresses.

To make this simulation different from the SIS model above, I put in seven NWS::Host instances that contain code for a simulated CRclean worm. This differs from reality. To my knowledge, no one deployed CRclean.

I used seven each of Code Red 1 v2 and CRclean to give the simulated Code Red worms a good probability of hitting a simulated IIS host during early time steps. This should reduce the differences between the Euler's Method numerical simulation and the NWS simulation.

The Euler's method numerical simulation starts with N = 65535, S = 9986, I = 7 and R = 7. It sets the probability of picking a host at random to P = 10000/65535. It runs for 150 time steps.

SIR model results

The NWS simulation of CRclean matches the Euler's Method numerical solution very closely, far closer that the NWS SIS simulation matches an Euler's method numerical simulation. I believe this results from the fact that a single differential equation can describe an SIS epidemic, while an SIR epidemic requires a system of three differential equations. An NWS simulation of an SIS epidemic "leaks" relative to the Euler's Method solution, in that NWS can't infect fractional numbers of entities. Any time the Euler's Method simulation infects something other than a whole number of entities, at the corresponding time step the NWS simulator only infects the whole number of entities. Any fractional number of infected entities leaks out of the system, since the fractional part gets added to no other quantity. The three equations used to model an SIR epidemic don't "leak" fractional parts of entities out of the system: any fraction of an entity not included in one quantity gets included in another.

CRclean conclusions

It looks like CRclean would have worked against Code Red, despite not doing any scanning on its own. More generally, it looks like a "passive" worm, a worm that merely responds to legitimate requests, instead of actively scanning for vulnerable hosts, would work, provided the population of vulnerable hosts act as clients to the passive worm's service.

Competing Worms

Two Code Red worms[25] existed and competed for the same set of vulnerable Windows hosts running IIS. For a period in August and September of 2001, Code Red II out-infected Code Red V2. In this section, I use NWS to explore some possibilities of why one worm might out-compete another worm for the same population of infectable entities.

The case of two or more worms that use the same exploit has happened in the real Internet at least twice. Code Red I v2 and Code Red II in 2001, and the many "Slapper" variants[32] in 2003.

I won't address the different probing strategies used by Code Red I v2 and Code Red II - the NWS framework does not currently allow simulating sub-nets, so simulating Code Red II's preference for probing IP addresses close to its own doesn't make any sense. Some other properties can be addressed.

Variation in probe rates

Simulation code

This set of simulations has an address space of size 65535, in which seven each of two, very similar worms get started. The simulation also contains 10,000 infectable hosts. For verisimilitude, I named the worms "Code Red I v2" and "Code Red II", and I named the susceptible hosts "IIS".

The simulation executes two very similar worms. The worms carry nearly identical code, differing in only two aspects. The "true name" that each worm gives to a newly-infected host differs, one set of worms uses "Code Red I v2", the second set uses "Code Red II". The two worms also differ in that "Code Red II" worms can send an extra probe to a random address some percentage of the time steps in which they execute. The "Code Red II" worm code uses a random number to decide whether to send the extra probe, so the percentage works out only on an aggregate basis - some worm instances may never send the extra probe, some instances may send it every time step. This extra probe simulates a slightly faster worm or a worm that communicates infection slightly faster.

I ran the simulation 20 times each with an extra-probe-percentage of 0, 1, 5 and 10. The following table show the mean number of infected hosts for each of the extra-probe-percentages.

0% 1% 5% 10%
Code Red I v2 5061.9 4408.95 2726.35 1273.5
Code Red II 4951.1 5604.05 7286.65 8739.5

Even a small increase in number of probes per time step gives a population of worms a big advantage against a slightly slower population.

One worm fixes vulnerability

Simulation code

In this variant, "Code Red II" worm code marks its host as not vulnerable just after exploiting that host. This action prevents re-infection from other "Code Red II" hosts, and conversion to "Code Red 1 v2".

Worms with the true name "Code Red 1 v2" don't perform the extra work of marking the newly-infected NWS::Software instance not vulnerable. Other "Code Red 1 v2" worms can re-infect it, and "Code Red II" can infect it.

Both worms probe at the same rate, once per time step.

The simulation has an address space of 65535. I had the simulation place 10000 victim hosts randomly in the address space. I had the victim hosts contain NWS::Software instances using the identifier and true name of "IIS". At the top end of the address space (address 65525) I had the simulation place an NWS::Host instance containing worm code that upon infection, "fixes" the vulnerability it used to replicate itself. I had this worm use the true name of "Code Red II".

At the bottom end of the address space (address 10) I had the simulation place NWS::Host instance containing worm code identical to the first worm code, except that it doesn't fix the vulnerability. Instances of this worm leave the vulnerability unpatched. I had this worm use the true name of "Code Red I v2".

I ran the simulation 20 times, each time executing 100 time steps, producing the data in the table below.

Worm true name mean hosts infected per run median hosts infected per run
Code Red I v2 64.1 35
Code Red II 9888.8 9918

Fixing the vulnerability that initially gave access to the susceptible host confers an enormous advantage on a population of worms competing for a set of susceptible hosts that exhibit a given vulnerability.

Competing Worms Conclusions

For the case of two worms competing for the same population of susceptible hosts, worm authors should strive to write worms as "virulent" as possible.

Worms should patch the vulnerability they used to replicate themselves. It appears that all other things staying the same, having a worm fix the vulnerability it exploited to replicate gives more advantage than a 10% performance boost.

Permutation Scanning worm

Permutation scanner simulator code
Algorithmic permutation code

This simulation models the hypothetical Permutation scanning worm described in How to 0wn the Internet in your spare time[10], in a section titled "Permutation Scanning". Staniford, Paxson and Weaver wrote about a simulation of a permutation scanning worm, but I believe they missed some key points, developed below.

Permutation scanning worms don't select addresses to probe randomly: they select addresses algorithmically. Each worm has an index, and an algorithm that maps the index value to an address. By incrementing the index, they calculate different, non-sequential addresses to probe. If the worms use a good quality algorithm to calculate address, the addresses generated from successive indexes are quite different: the series of addresses probed looks very close to random. All worms use the same algorithm to generate addresses, but individual worms pick random initial index values at infection time.

A worm that starts late will likely pick an index in a range that an earlier worm has already probed. The late-created-worm will end up probing a series of addresses that some other worm has already probed. To avoid re-probing, worms that try to re-infect previously infected hosts go inactive in the Staniford/Paxson/Weaver scenario. The worm code in the permutation scanning simulation accomplishes this by replying to any re-infection attempts. Individual re-infecting worm instances go inactive when they receive a reply.

One can spoof worms that use this type of scanning. An immune spoofer host could respond that the worm has already infected it, causing permutation scanning worms that probe the spoofer to go inactive. This would protect all hosts further in the permutation than the spoofing host. Real life worms should require two or more consecutive "already infected" replies to go inactive to prevent spoofing. The simulated worm represents the best case of a permutation scanner, with respect to optimism about reinfection.

The permutation scanning worm of this simulation uses an algorithmic permutation to probe address space. A DES-inspired algorithm combines a 32-bit key with a 16-bit index to determine the address to probe. Worms start with a random index, then generate an address to probe by using the permutation function on the index. Worms increase the index sequentially. Hopefully, the series of addresses calculated by permuting sequential indexes look random. All worms use the same permutation by using the same 32-bit key.

The simulation runs in an address space of size 65535 (16 bits). It starts with 7 permutation scanning worm and 9993 victims. The victim NWS::Host instances reside at randomly assigned addresses, as does the single initial worm instance. The simulation runs for 135 time steps or until only 1% of the 9993 possible victims remain uninfected. This allows for comparison to the SIS model random probing worm described above. I ran the permutation scanning worm simulation 20 times, and calculated mean counts of infected hosts at every fifth time step, shown in the graph below.

Random probing vs Permutation scanning

Permutation scanning worms and random probing worms take exactly the same number of time steps to reach 100% infection.

Permutation scanning produces fewer messages to get to 100% infection than a population of random probing worms does, and a smaller mean number of probes per host. Only active permutation scanning worms send messages, and the proportion of active permutation scanning worms rises, then drops to nearly zero by 100% infection. To contrast, every member a population of random probing worms produces messages all through the course of a simulation.

One area of difference exists: the distribution of probes over the population of entities in the simulation. When you count the probes per host, the population of permutation scanning worms produces a Zipfian distribution - almost 45% of the hosts in the simulation receive 2 probes, abut 26% receive 3 probes, 13% receive 4 probes and so on. The number of hosts receiving a certain number of probes amounts to almost exactly half the number of hosts that receive one more probe. The typical host receives far fewer scans for the optimistic permutation scanning worm, than for a random scanning worm.

The random probing worms produce a Poisson distribution of probes per host, exactly what you'd expect for a random process.

The property of not probing most hosts very often might make discerning a permutation scanning worm more difficult,

Random probes and permutation probes histogram

Permutation scanning simulation conclusions

I conclude that permutation scanning doesn't give a worm an overall advantage relative to random probing. A permutation scanning worm doesn't have any speed to 100%-infected advantage. It has a lot of disadvantages. I found it difficult to write an algorithmic permutation of a 16-bit address space that gives a random look to the permuted indexes. Depending on the vulnerability and the communication protocol involved, firewalls or NAT can prevent infected hosts from responding to repeat infection attempts, negating any advantage conferred by lowering traffic - worms that should go inactive instead keep probing. Permutation scanning has trouble with NAT-ed subnets, too. Since a given set of worms all use the same permutation, a worm instance that obtains access to a NAT-ed subnet host will almost certainly skip over most or all of the other hosts on the NAT-ed subnet, at least until an address in the subnet turns up as a permuted index. I conclude that the Internet will probably never see a permutation scanning worm that becomes epidemic.

Cheese vs 1i0n

The 1i0n[33] Linux worm of 2001 prompted someone to write an anti-worm: Cheese[34]. Beyond obvious differences like CRclean attacking IIS and Cheese exploiting a back door TCP server left behind by the 1i0n worm, significant differences exist between them. One significant difference between Cheese and CRclean consists of the method of finding hosts infected with the original worm. Cheese actively sought out back door TCP servers left running by 1i0n worms. CRclean waits for a Code Red URI to arrive, and attempts to infect the sender of the Code Red URI. Does active searching for hosts infected by some other worm make a difference?

The same differential equations as describe the CRclean model apply here. The derivation of the differential equations ends up with different semantics for the change in recovered entities equation. The Cheese worm actively probed for 1i0n back door servers. The probability of a Cheese worm probing an address that maps to a host remains P, but the number of Cheese worms is I, and the fraction of the population infected with 1i0n is I/N. For Cheese, the differential equation for change in number of recovered entities is:

dR/dt = PR(I/N)

The PRI/N term equates algebraically to the PIR/N in the CRclean system of differential equations. The same numerical simulation can describe both CRclean/Code Red/IIS interaction and the Cheese/1i0n/BIND interaction. The differential equations give the insight that CRclean and Cheese should observe fundamentally similar limits on spreading.

Simulation Description

I used an address space of 65535 (16 bits), and I populated it with seven worm NWS::Host instances containing code representing the initial number of 1i0n worms. I also populated the address space with seven NWS::Host instances containing code representing an initial number of Cheese worms. I put in 9986 possible victim NWS::Host instances modeling hosts running BIND, the software exploited by 1i0n worms. All NWS::Host instances reside at randomly assigned addresses in the simulation's address space.

For validation purposes, the simulation runs for 150 time steps and all Cheese worms and 1i0n worms get their start at time step 1.

Cheese worm simulation compared with Euler's method

This Cheese worm simulation matches the Euler's method numerical solution about as well as the CRclean worm simulation did.

Network Traffic Comparison

The set of Cheese worms send out many more probes than the set of CRclean worms. CRclean worms only respond to Code Red probes, while Cheese worms actively probe for 1i0n back door software. The set of CRclean worms produces fewer and fewer response messages as the number of Code Red worms decreases, while the set of Cheese worms produce more and more probe messages as their population increases.

Cheese Simulation Conclusions

If you must write an anti-worm, write it in the "passive" style of CRclean, rather than the "active" style of Cheese. Passive worms produce less network traffic of their own.

msblast - effect of banded address space

msblast simulator code
Random probing worm simulator code

The msblast[35] worm appeared in August of 2003. It took advantage of a buffer overflow in an RPC system that apparently existed in nearly every Windows NT-based operating system.

The msblast worm had an unusual method[36] for picking addresses to probe for susceptible hosts: each worm picked a random IP address upon infecting a host, then sent probes sequentially up from the random address.

Given that network administrators have allocated addresses inside the Internet Protocol address space in bands[37], perhaps the author of msblast thought that sequential probing would allow faster infection rates. I wrote an NWS simulation to test msblast's probing strategy against victim hosts placed in bands in the address space, and against victim hosts placed randomly throughout the address space.

Simulation Description

The real question amounts to "what difference does an address probing strategy make if the addresses of susceptible hosts are arranged in groups rather than scattered randomly?" To test this, I wrote 4 different simulations:

In each of the four simulations, I used an address space of 65535 (16 bits of address), and I populated it with seven worm NWS::Host instance, and 9993 possible victim NWS::Host instances, for a total of 10000 entities in the population. In all four simulations, the initial seven worm entities get randomly selected addresses.

In two of the simulations, the victims also get randomly selected addresses. In the other two simulations, the victims get placed sequentially in nine "bands", all bands one-ninth of the address space apart. Each band contains one-ninth of the victim hosts.

For one simulation each of random and banded placement of victim hosts, I modeled a random address probing worm, like Code Red v2, or Slapper. Each time it gets executed, a random address probing worm picks an address in the simulation's address space at random. It tries to infect that address.

In the second set of two simulations (banded placement of victims and random placement of victims), I modeled the msblast worm. The msblast worm code picks an address at random when it infects a previously susceptible host. Each time the worm code executes after that, it probes the address, then increments the address. Written an alternate way, it probes sequentially up the address space. I had the code wrap around to address 1 if and when it gets to the maximum address in the simulation.

The four simulations run for 135 time steps or until only 1% of the 9993 possible victims remain uninfected.

I ran the simulations 20 times each. I reduced the 20 simulation outputs to an arithmetic mean of the number of infected hosts at each time step. The graph below compares the mean of 20 simulation runs for each of the four simulations

msblast model results

msblast probing conclusions

A worm that probes an address space randomly reaches 100% infection just as fast for both victim host layouts, random and banded. The msblast-style probing worm performed differently for the two victim host layouts.

For a set of victims with randomly-allocated addresses, the random probing worm dramatically outperforms msblast-style sequential probing.

For a set of victims in nine bands, the random probing worm performs identically to the random probing worm with a set of randomly positioned victims. The msblast-style sequential probing performed almost as well.

I hypothesize that the author of the real msblast had access to a network with sequentially-allocated addresses for testing. The sequential probing msblast worm worked fairly well on the network used for testing, maybe even better than random probing, so the msblast author chose an address selection algorithm that didn't work very well in the real Internet.

I made the choice of nine bands fairly arbitrarily. Some possibility exists that more or fewer bands, or even a more densely-populated address space make the msblast address selection algorithm more efficient than random probing.

Programming guide

Installation

I assume that users of NWS will make modifications to the library modules that implement the NWS framework. It seems useless to install NWS as an official Perl module when the users will almost certainly have customized copies of the NWS modules in the directories of specific simulations.

To install NWS, you create a directory into which you unpack the distribution with a command similar to gunzip -c nws.tar.gz | tar xf -. All the example simulations from this document unpack as well. Every example simulation gains the use of the NWS framework by including four Perl "use" directives:


use NWS::Network;
use NWS::Message;
use NWS::Host;
use NWS::Software;

If you create your own simulation, include the same four Perl "use" statements.

This boils down to having a directory named NWS located in the directory in which simulation code resides. The NWS subdirectory must contain the four Perl modules (Network.pm, Message.pm, Host.pm, Software.pm) that implement the NWS framework.

Simple Worm Example

Simple worm code

An example of a very simple worm will help the reader understand the NWS simulator. The following 28 lines of Perl illustrate how to write a simulator for a simple Susceptible-Infected-Susceptible model.


     1	#!/usr/bin/perl
     2	use NWS::Network;
     3	use NWS::Message;
     4	use NWS::Host;
     5	use NWS::Software;
     6	
     7	my $worm_code = q{
     8		my ($host, $software, $phase) = @_;
     9		$host->SendMsg(
    10			Message->new(
    11				$host->RandAddress,
    12				'OS-CPU',
    13				'Victim',
    14				$software->{code})
    15		) if $phase == $Host::Run;
    16		$software->{true_name} = 'Worm' if $phase == $Host::Init2;
    17	};
    18	
    19	my $network = Network->new(65535, 0);
    20	$network->AddHost(Host->new('OS-CPU', 0, Software->new('Worm', $worm_code, 0)));
    21	$network->AddHost(Host->new('OS-CPU', 0, Software->new('Victim', '', 1)))
    22		for (1..10000);
    23	
    24	for (my $i = 1; $i < 135; $i += 5) {
    25		$network->Run($i, 5)->PrintCounts;
    26	}
    27	
    28	exit 0;

Simple worm output

When invoked from a Linux command line prompt, the simple worm simulator produces output that looks like this:


 4:16PM 642 % ./simple.pl
#step, msg count,       Victim, Worm
5       9       9999    2
#step, msg count,       Victim, Worm
10      10      9999    2
#step, msg count,       Victim, Worm
15      15      9998    3
#step, msg count,       Victim, Worm
20      23      9994    7
#step, msg count,       Victim, Worm
25      52      9989    12

The NWS::Network::PrintCounts method produces two lines of output each time that a program calls it. The first line begins with the '#' (pound sign) character, and lists the last time step number, the number of messages sent since the last call to NWS::Network::PrintCounts, followed by true names of the NWS::Software instances in the simulation.

The second line contains a tab-separated list of numbers, counts of the same true names of NWS::Software instances that appear on the first line.

This output format served me well, allowing interactive and batch graphing using Gnuplot. It may not serve others as well. NWS::Network::PrintCounts does not use any information other than that contained in NWS::Network instance variables, so any NWS user can re-write PrintCounts to suit him or her self.

Classes

NWS::Network

Instances of NWS::Network model all the wiring and routers and gateways that compose a single real network.

A program using the NWS simulation framework should only bother to create a single instance of class NWS::Network. The instance of Network contains and connects instances of class Host. The Network instance calls NWS::Host::RunProcess method on each contained instance of NWS::Host, which can cause a NWS::Host to send a NWS::Message instance to some other NWS::Host contained by the NWS::Network instance. NWS::Network instances deliver NWS::Message instances around the simulated network.

NWS::Network Methods

Letting the AddHost method prefer to place entities at random addresses in a simulation's address space may constitute a flaw. This preference may lead NWS users to assume that the real Internet, or even a corporate intranet, has a randomly-addressed population. Simulations that don't match reality closely may result.

NWS::Network Instance variables

The last two instance variables mainly have use in determining if a given simulation correctly sends and receives enough messages. For example, over a large enough number of time steps, randomly addressed messages should hit enough hosts to make the ratio of total_hit_count to total_msg_count equal the ratio of number of hosts to address space size.

NWS::Host

Instances of class NWS::Host represent computers attached to some form of a network. Each NWS::Host object has a numerical address associated with it by the NWS::Network object in a simulation, and can contain zero or more NWS::Software instances.

NWS::Host Methods
NWS::Host Instance variables

Using msg_cnt correctly can allow you to keep track of the efficiency of a given address probing scheme.

NWS::Software

Instances of class NWS::Host can contain instances of class NWS::Software. NWS::Software instances have a name, so that an NWS::Host can contain more than one instance of NWS::Software. This situation models real computers, which often run hundreds of different processes at any given instant.

Instances of NWS::Software can contain actual code - fragments of Perl code that get actually compiled and executed when the simulation's NWS::Network tells the NWS::Host objects to run their processes.

NWS::Software Methods

Example NWS::Software constructor use:


	my $annunciator_code = qw{
		my ($host, $software, $phase, $msg) = @_;
		print "Host at ", $host->{address}, " called during phase ",
			$phase, "\n";
	};
	my $sw = Software->new(
		"Call phase Annunciator",
		$annunciator_code,
		0
	);

The string naming the Software gets used to decide whether or not to accept a given NWS::Message object sent to a Host. The fragment of Perl code actually gets compiled (via an eval BLOCK; construct) and runs during simulation execution. The number indicates whether the Software is vulnerable (1) or not (0).

The constructor method compiles any Perl fragment passed as an argument, and it sets identifier and true_name instance variables to the value of the first argument of the constructor. After that, the constructor calls the compiled Perl fragment with the $Host::Init phase argument. Worm code should change the true_name value at this point.

NWS::Software Instance variables

NWS::Message

Instances of class NWS::Message model frames or packets, and constitute the hunks of information that NWS::Host objects can pass between each other. The NWS simulations act as if the Internet mainly communicated using UDP/IP instead of TCP/IP.

NWS::Message Methods
NWS::Message Instance variables

An entity sending an NWS::Message only has to fill in destination, identifier and software for the Message to pass through the delivery algorithm.

The code instance variable is the means by which a worm passes executable content to another entity in the simulation. Usually, a programmer uses the NWS::Software::function contents to fill in code, but this isn't enforced.

Worm Code

Users of the NWS simulator see worm code as text, more specifically as fragments of the Perl programming language. At runtime, an NWS simulation creates instances of class NWS::Software. If a programmer passes a Perl code fragment to the constructor of NWS::Software, the code fragment actually gets compiled via an eval statement. The instance of NWS::Software so constructed carries around a reference to the compiled code fragment, analogous to a function pointer in other programming languages.

Example Worm Code

The following 26-line fragment of Perl constitutes a fairly sophisticated worm. When executed in an NWS simulation, it sends infectious messages to random addresses. If it receives an infectious message from another worm instance, it replies back. If it receives a reply to an infectious message, it marks itself inactive, and sends no further messages.


     1	my $worm_code = q{
     2		my ($h, $sw, $c, $m) = @_;
     3		if ($c == $Host::Run and $sw->{active}) {
     4			$h->SendMsg(
     5				new Message
     6					$h->RandAddress,
     7					"OS-CPU",
     8					"Victim",
     9					$sw->{function}
    10			);
    11		} elsif ($c == $Host::Recv) {
    12			if (!$m->{already_infected}) {
    13				$m->{destination} = $m->{source};
    14				$m->{already_infected} = $h->{address};
    15				$m->{code} = 0;
    16				$h->SendMsg($m);
    17			} else {
    18				$sw->{true_name} = 'Worm - inactive';
    19				$sw->{active} = 0;
    20			}
    21		} elsif ($c == $Host::Init2 or $c == $Host::Init) {
    22			$sw->{true_name} = 'Worm - active';
    23			$sw->{exploitable} = 0;
    24			$sw->{active} = 1;
    25		}
    26	};

If there is a trick to writing worm code, it's to realize that you have to write code that's not only thread-safe, but also re-entrant.

When worm code runs

The code contained by instances of class NWS::Software gets executed in four different situations.

  1. Just after an NWS::Host adds the software, in method NWS::Host::AddSoftware
  2. Once each time step during normal network operation. An NWS::Network instance calls NWS::Host::RunProcess on each NWS::Host instance it contains at every time step.
  3. When a NWS::Message object makes its way from one NWS::Host to another. If the receiving NWS::Host finds the message appropriate to its "operating system" and "CPU" it passes the incoming message to the appropriately named NWS::Software instance. The NWS::Software's code gets executed, and the NWS::Message instance gets passed to the code.
  4. Just after exploiting an NWS::Software instance.

NWS::Software instances execute the code once on construction to allow the code to initialize run-time constants rather than recalculating the constants at every execution. Similarly, an NWS::Host causes the execution of any code an exploiting NWS::Message might contain just after the exploitation. This allows worm code to set up and initialize. Running code in NWS::Software instances that receive a NWS::Message allows NWS::Hosts to cooperate, or it allows writing worms like CRclean, that style "strike backs" when they receive a particular NWS::Message.

Code for a worm doesn't have to execute anything in any of the four situations. In the simple worm example section, line 15 constitutes the test for execution of the worm code during normal network operation. Line 16 tests for execution just after exploitation. The simple worm code in the example gets executed in the two other situations (on message reception and at installation), but does nothing.

Worm code calling convention

When an NWS simulation executes the NWS::Network::Run method, any compiled code fragments get called. The arguments to the compiled code fragment look like this:

  1. NWS::Host instance currently causing the code fragment to execute
  2. NWS::Software instance that contains the code fragment
  3. A numerical code indicating the "phase" of execution. The four phases symbolically:
  4. For phases Host::Recv and Host::Init2, the relevant NWS::Message object

A Message object doesn't always appear in the worm code's argument list. When a Message appears, the phase of execution determines what the Message contains. If the phase argument equates to $Host::Init2, the Message is the infectious communication that caused a new replicated worm instance to start. If the phase argument equates to $Host::Recv, the Message argument came from a worm instance executing in some other Host in the network.

Worm code history

I believe that recounting what I did to arrive at the current arrangement of worm code and its calling conventions would help the reader to understand how to write better worm code. Initially, I had every NWS::Software instance carrying around worm code as a string. Every time a NWS::Software instance got called upon to execute its code, an eval of the string took place. I did this because I didn't have a good idea of what information (including what objects) worm code would need access to in order to act realistically. This technique worked, but it did cost a lot of execution time to eval a string in every worm during every time step.

My second effort involved having NWS::Software instances carry around worm code as a string, and additionally carry the worm code around as a pointer to a closure (a dynamically compiled anonymous function). Well-known names for information important to a worm's execution ($host, $software) existed when the code-as-string got compiled to a closure. Worms would send the code-as-string to potential targets. Upon exploitation, targets compiled the code-as-string to their own private closures. This provided a fair performance increase, since eval didn't happen for every worm for every time step, and it also narrowed the information available to any given worm to well-known variable names present when strings got evaled to closures.

To allow for simulation of different worm behavior, I had added arguments to calls to the worm code: a number indicating the "phase" of execution, and the message just received. I noticed that only a few well-know variables got used in even the most elaborate simulations: references to the NWS::Host and NWS::Software that contained the worm code.

I also realized that the simulations spent large amounts of time compiling the worm code to a closure ever time a worm exploited a vulnerability. This caused me to change from compiling a closure for every infected NWS::Software instance to compiling a single closure. Instead of passing worm-code-as-text, the NWS::Message instances now contain a pointer to the appropriate closure. Instead of having well-known variables available to the closure at compile time, I changed the worm code calling convention to include exactly those well-known variables as arguments. This brought down execution time and overall memory usage of any given simulation.

Further Work

Gatewayed network simulation

Simulate Network Address Translation gateways and firewalls by including code in class NWS::Network that allows one instance of NWS::Network to contain a second instance of NWS::Network. The second instance should pass messages appropriately by translating addresses, and contain a second set of NWS::Host instances.

One could use such an enhanced NWS::Network class to simulate Code Red II's spread behind firewalls and gateways.

Rate limited devices

The current NWS simulation framework includes nothing with a performance problem. Real life networks have devices like firewalls or routers or specific software that suffers during a network worm epidemic. Rate limited devices or software end up getting bogged down by worm-related traffic, and either take a long time to deal with legitimate traffic, or drop legitimate traffic entirely.

Adding rate-limiting bottlenecks to a network simulation would allow the use of NWS as more than a worm design tool. Users could develop a worm and a network simulation without bottlenecks, than add or turn on bottleneck objects to see where the worm causes network failures.

Rate limiting could take the form of a limit on how many NWS::Message instances can pass in a time step, whether the remaining messages get queued or discard, or it could take the form of shutting down if the number of Messages to pass gets too large.

A combination of rate limited devices and gatewayed networks could help to confirm Cliff C Zou's two-parameter Code Red model [12].

Bibliography

Development Environment

I did not use large numbers of tools, or an expensive hardware setup to develop NWS.

The computer running SuSE 7.3 linux had a 1 GHz AMD Duron processor on a Micro ATX MS-6340M motherboard, with 256 Mb of memory. This computer ran 20, 135-time step NWS simulations in about 15 minutes wall clock time.