Tuesday, March 30, 2010

Should You Trust Your SSL Certificate?

Is your SSL secure? I'm not sure that the answer is quite so binary after reading the paper Certified Lies: Detecting and Defeating Government Interception Attacks Against SSL by Christopher Soghoian and Sid Stamm. In a so-called compelled certificate creation attack, a government agency forces a CA to issue a fake certificate to intercept an SSL-encrypted (or more precisely: TLS-encrypted) session without triggering a warning in the victim's browser. This works seamlessly because it does not matter who issued a specific certificate, as long as the browser sees a valid chain of trust that terminates in a known root certificate. Although the authors claim that this attack is used in practice, data-driven evidence is still lacking.

Proactive Protection

In order to assess whether or not one's current TLS session is being intercepted, the authors have developed a Firefox extension called CertLock, which extends the browser history by collecting additional certificate information such as its hash, the name and country of the issuing CA and the website, and the trust chain up to the root CA. Each time the user revisits a TLS-protected site for which a certificate in the history exists, CertLock compares their two hash values. If a mismatch is detected, CertLock next compares the issuing CAs' country. In the event that they differ, CertLock presents a warning to the user; otherwise the page loads without any warnings.
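The decision procedure described above can be sketched in a few lines of Python. This is only an illustration of the logic as summarized in this post, not CertLock's actual code: the record fields, the function name, and the choice to store a record on the first visit are assumptions.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class CertRecord:
    """Hypothetical subset of the per-site data kept in the history."""
    cert_hash: str
    ca_name: str
    ca_country: str


def should_warn(history: Dict[str, CertRecord], site: str, seen: CertRecord) -> bool:
    """Return True if the user should see a warning for this visit."""
    previous = history.get(site)
    if previous is None:
        # First visit: nothing to compare against, so just remember the certificate.
        history[site] = seen
        return False
    if previous.cert_hash == seen.cert_hash:
        # Same certificate as last time.
        return False
    # The certificate changed: warn only if the issuing CA's country changed too.
    return previous.ca_country != seen.ca_country
```

Note that a changed certificate issued by a CA from the same country passes silently, and that the very first certificate is accepted without any check.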

Although CertLock alleviates some problems with the compelled certificate creation attack, it is still limited in the following ways.

Trust-On-First-Use

When visiting a TLS-protected website for the first time, the user immediately faces a dilemma: how to trust the certificate chain when there is nothing to compare to. Because CertLock operates by comparing the current certificate with one from the history, it cannot detect if the first encountered certificate chain is authentic. That is, CertLock can only protect the user for sites that have already been visited in the past and deemed secure.

When planning a trip to a potentially hostile environment (e.g., DEF CON or China), a common recommendation is to use a fresh laptop, or at least to replace the laptop's hard drive and start over with a fresh system installation. This is precisely the scenario where CertLock would be necessary, but it cannot function due to the lack of browsing history.

Ground truth

A well-known issue with anomaly detection is the presence of an adversary during the training phase, who can later conduct unnoticed attacks when the system is in live operation. CertLock cannot distinguish counterfeit from real certificate trust chains when trained maliciously. In fact, the opposite of the intended behavior may occur: a benign certificate may be misclassified as untrustworthy, and a forged certificate may be blindly accepted. This problem highlights the need for a trustworthy past certificate history, otherwise it is impossible to make an accurate decision.

False Negatives

CertLock suffers from false negatives in two cases: (i) when the actual and the compelled CA are from the same country, and (ii) when the certificate differs from the one in the history but the issuing CA has not changed. At the same time, this design greatly reduces the number of false positives, which is vital for keeping the user's attention when a warning does appear.

User Failure

When I asked two of my friends what their impression was of CertLock's displayed warning, I was surprised to hear "it looks like spam - what does Russia have to do with the site I am visiting?". Granted, the authors acknowledge that the user interface has room for improvement and state their intention to brush up the warning's design. However, given most users' limited understanding of geography and international relations, this may be a challenging task.

Retrospective Analysis

In addition to the aspects above, which highlight the need for better proactive defenses, opportunities for forensic analysis are equally important to appreciate. Network traffic traces may contain evidence of the compelled certificate creation attack. Bro already supports the extraction of certificates, which minimizes the programming effort required to implement a detector. To address the problem of ground truth, the traces should ideally span long time frames and be recorded from multiple vantage points. In the future, I plan to write a Bro script to detect this type of TLS tampering, which could be particularly useful to unveil targeted attacks, as these tend to stay under the radar of a standard network intrusion detection system.

Sunday, August 16, 2009

Email Attachment Processing in Bro

Malware that spreads via email is nothing new. Targeted attacks against politically sensitive institutions or individuals in particular consist of well socially engineered mails and often ship with custom 0-day malware in the form of email attachments. In order to extract such malicious attachments, I wrote a Bro policy script which records suspicious attachments to disk for later analysis. A possible application scenario would be to scan office documents for malicious JavaScript or executables for viruses. Another option would be to hash the attachment directly in Bro and compare it against a publicly available registry, as Seth Hall illustrates for HTTP traffic.

The script hooks into the mime_header_handler table and adds a callback for CONTENT-TYPE. If a sensitive attachment is seen, a NOTICE is generated. Sensitive attachments are detected either by MIME type or by file extension. The user can customize the analyzer's behavior in many ways. For example, to change the directory where the attachments are stored on disk, one can load the analyzer and redefine the attachment_dir variable:

@load email
redef Email::attachment_dir = "foo";

It is also possible to restrict or extend the regular expression used to determine whether an attachment is sensitive or not. Below is the full source of the Bro email module, which currently only consists of the attachment analyzer.

@load mime

#
# An email attachment analyzer.
#

module Email;

export {
    redef enum Notice += {
        SensitiveMIMEType,      # Sensitive MIME type.
        SensitiveExtension,     # Sensitive file extension.
    };

    # Directory in which email attachments are stored.
    const attachment_dir = "mime-attachments" &redef;

    # Whether attachments with sensitive MIME types should be stored.
    const store_sensitive_mime_types = T &redef;

    # Whether attachments with sensitive file extensions should be stored.
    const store_sensitive_extensions = T &redef;

    # Sensitive MIME types that raise a notice.
    const sensitive_mime_types =
        /application\/.*/   #/application\/(x-dosexec|msword|pdf)/
      | /document\/.*/
      | /image\/.*/
      | /video\/.*/
      &redef;

    # The list of sensitive file extensions that raise a notice.
    const sensitive_extensions =
      # Office documents.
        /[pP][dD][fF]$/
      | /[dD][oO][cC][xX]?$/
      | /[xX][lL][sS]$/
      | /[pP][pP][sStT]$/
      # Executables.
      | /[eE][xX][eE]$/
      | /[cC][oO][mM]$/
      | /[bB][aA][tT]$/
      # Comprehensive list of archive and compression extensions.
      | /\?(\?_|[qQ]\?)$/
      | /[aA]([cC][eE]|[rR][cC]|[lL][zZ]|[rR][jJ])?$/
      | /[bB][zZ]2$/
      | /[cC][pP][iI][oO]$/
      | /[dD]([dD]|[gG][zZ]|[mM][gG])$/
      | /[cC]([aA][bB]|[pP][tT])$/
      | /[fF]$/
      | /[gG][cChH][aA]$/
      | /[gG]?[zZ]$/
      | /[hH]([aA]|[kK][iI])$/
      | /[iI][cC][eE]$/
      | /[jJ]$/
      | /[kK][gG][bB]$/
      | /[lL][bB][rR]$/
      | /[lL][zZ]([mMzZ][aA]|[aA][hH]|[oO]|[xX])?$/
      | /[pP][aA]([rR][tT][iI][mM][gG]|[qQ]([0-9a-zA-Z])*)$/
      | /[pP]([eE][aA]|[iI][mMtT])$/
      | /[qQ][dD][aA]$/
      | /[rR]([aA][rR]|[kK])$/
      | /[sS][fF][aA][rR][kK]$/
      | /[sS]([dD][aA]|[eE][aAnN]|[fF][xX]|[iI][tT][xX]?|[qQ][xX])$/
      | /[sS]?7[zZ]$/
      | /([tT]|[sS][hH])?[aA][rR]$/
      | /[tT][gGlL][zZ]$/
      | /[uU][hH][aA]$/
      | /[wW][iI][mM]$/
      | /[xX][aA][rR]$/
      | /[zZ]([iI][pP]|[oO][oO]|[zZ])$/
      &redef;

    # Email attachment type.
    type attachment: record
    {
        id: count;
        mime_session: count;
        mime_type: string;
        filename: string;
    };
}

# Unique attachment identifier.
global attachment_id: count = 0;

# Global flag indicating if we store the attachment of the current MIME entity.
global store_attachment = F;

# Since attachments are processed sequentially, we only need one attachment.
global a: attachment;

# The filehandle for the current attachment.
global fh: file;

function mime_header_content_type(session: MIME::mime_session_info,
    name: string, arg: string)
{
    local mime_type = sub(arg, /;.*$/, "");
    local filename = sub(arg, /^.*[nN][aA][mM][eE]=/, "");

    if (sensitive_mime_types in mime_type)
    {
        if (store_sensitive_mime_types)
            store_attachment = T;

        NOTICE([$note=SensitiveMIMEType,
                $id=session$connection_id,
                $msg=fmt("sensitive MIME type in email attachment, %s",
                    mime_type),
                $tag=fmt("%d", session$id)
                ]);
    }

    if (/[nN][aA][mM][eE]=/ in arg && sensitive_extensions in filename)
    {
        if (store_sensitive_extensions)
            store_attachment = T;

        NOTICE([$note=SensitiveExtension,
                $id=session$connection_id,
                $msg="sensitive file extension in email attachment",
                $filename=filename,
                $tag=fmt("%d", session$id)
                ]);
    }

    if (store_attachment)
    {
        a$id = ++attachment_id;
        a$mime_session = session$id;
        a$mime_type = mime_type;
        a$filename = filename;

        # Sanitize the file name and open it for writing.
        local disk_file = gsub(a$filename, /\//, "_");     # Translate / to _.
        disk_file = gsub(disk_file, /\"/, "");  # Strip ".

        if (disk_file == "")
            disk_file = "NONAME";

        disk_file =
            fmt("%s/%d-%d-%s", attachment_dir, a$id, a$mime_session, disk_file);

        fh = open(disk_file);
    }
}
redef MIME::mime_header_handler += {
    ["CONTENT-TYPE"] = mime_header_content_type
};

event mime_end_entity(c: connection)
{
    if (! store_attachment)
        return;

    close(fh);

    store_attachment = F;
    a$filename = "";
}

# Dispatch mime segment data to the corresponding attachment.
event mime_segment_data(c: connection, length: count, data: string)
{
    if (store_attachment)
        write_file(fh, data);
}

event bro_init()
{
    mkdir(attachment_dir);
}
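
To make the header handling easier to follow outside of Bro, here is a rough Python sketch of what the content-type callback does: split off the MIME type, extract the file name, classify both, and build the on-disk name. It is not a port of the script. The extension pattern is a shortened, hypothetical subset, the function names are made up, and, unlike the script, the sketch strips surrounding quotes before testing the extension and returns an empty file name when no name parameter is present.

```python
import re

# Shortened, illustrative patterns; the Bro script above uses a much longer list.
SENSITIVE_MIME = re.compile(r"(application|document|image|video)/.*")
SENSITIVE_EXT = re.compile(r"(pdf|docx?|xls|exe|zip)$", re.IGNORECASE)
NAME_PARAM = re.compile(r"name=", re.IGNORECASE)


def classify(content_type: str):
    """Return (mime_type, filename, mime_is_sensitive, extension_is_sensitive)."""
    # Everything before the first ';' is the MIME type.
    mime_type = re.sub(r";.*$", "", content_type, count=1)
    has_name = NAME_PARAM.search(content_type) is not None
    filename = ""
    if has_name:
        # Everything after the last 'name=' is the file name.
        filename = re.sub(r"^.*name=", "", content_type, count=1, flags=re.IGNORECASE)
        filename = filename.strip('"')
    mime_hit = SENSITIVE_MIME.search(mime_type) is not None
    ext_hit = has_name and SENSITIVE_EXT.search(filename) is not None
    return mime_type, filename, mime_hit, ext_hit


def disk_path(attachment_dir: str, attachment_id: int, session_id: int, filename: str) -> str:
    """Build the output path the same way the script does: '/' becomes '_', quotes are dropped."""
    safe = filename.replace("/", "_").replace('"', "") or "NONAME"
    return f"{attachment_dir}/{attachment_id}-{session_id}-{safe}"
```

Replacing the path separator is what keeps an attachment called ../etc/passwd inside the attachment directory.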

Sunday, March 22, 2009

spiel: Word Juggling Made Easy

Introducing spiel, a lightweight wordlist generator for people who need a simple yet flexible way to create a torrent of words. If you are looking for a straightforward word juggler that can be used from the command line, this is the right tool for you. At the same time, spiel is blatantly dumb. If you feed it too large an input alphabet, the combinatorial explosion makes it intractable to use. A more promising direction is the use of context-free grammars for this task.

Examples


Display usage:
% spiel.rb -h
spiel 0.1, available options:
    -l, --list-charsets              list built-in charsets
    -c, --charset cs1,cs2,...        use built-in charset(s)
    -C, --custom-charset STRING      use custom charset
    -n, --no-duplicates              do not allow duplicate word
    -r, --range min[-max]            word length interval
    -s, --size                       print # of charset permutations
    -w, --write file                 write words to file
    -h, --help                       display this help and exit

List built-in character sets:
% spiel.rb -l
numeric: 0123456789
full:    abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789! @#$%^&*()-=[]\;',./_+{}|:"<>?`~
special: ! @#$%^&*()-=[]\;',./_+{}|:"<>?`~
lower:   abcdefghijklmnopqrstuvwxyz
alpha:   abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
upper:   ABCDEFGHIJKLMNOPQRSTUVWXYZ
alnum:   abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789

Generate all lower-case alphabetic words of length 1 to 4:
% spiel.rb -c lower -r 1-4
a
b
c
...
zzzx
zzzy
zzzz

How many 1-8 letter words exist in the alpha-numeric charset?
% spiel.rb -c alnum -s
221919451578090

How many permutations exist in the alpha-numeric charset for 8-letter words?
% spiel.rb -c alnum -r 8 -sn
136325893334400

Is my wireless network secure?
% spiel.rb -c alnum -r 1-16 \
    | aircrack-ng -w - -b FE:ED:DE:AD:BE:EF trace.cap
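
The two counts printed above are easy to reproduce, which also shows how quickly the word space explodes. The Python sketch below is not spiel's code, and its function names are made up. It assumes that the first count covers word lengths 1 to 8 and that -n forbids repeated characters within a word; both assumptions are inferred from the numbers, which match exactly.

```python
from itertools import product
from math import perm


def count_words(alphabet_size: int, min_len: int, max_len: int) -> int:
    """Number of words when characters may repeat: sum of n^k over all lengths k."""
    return sum(alphabet_size ** k for k in range(min_len, max_len + 1))


def count_words_no_duplicates(alphabet_size: int, length: int) -> int:
    """Number of words of one length when no character may repeat: n! / (n - k)!."""
    return perm(alphabet_size, length)


def words(alphabet: str, min_len: int, max_len: int):
    """Lazily enumerate all words, shortest first."""
    for k in range(min_len, max_len + 1):
        for letters in product(alphabet, repeat=k):
            yield "".join(letters)


print(count_words(62, 1, 8))             # alnum has 62 characters
print(count_words_no_duplicates(62, 8))
print(list(words("ab", 1, 2)))
```

With 62 characters, each additional position multiplies the number of words by 62, so the 1-16 range used in the last example is far beyond what can be enumerated in practice.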

Saturday, May 10, 2008

The Doom of Client-Side Wireless Network Security

HD Moore recently announced the integration of the KARMA tools into the Metasploit framework. The implications of this fusion are devastating. In his interview with Patrick Gray, HD presents the powerful new capabilities that take client-side wireless exploitation to a new level. Technically, HD rewrote parts of the original KARMA driver, included some patches, and integrated the KARMA user-land daemons into the Metasploit framework.

To illustrate the potent new features of Metasploit, consider the following scenario. A user opens his laptop on the plane to watch a DVD. If he has ever connected to an insecure access point, it will be in his list of preferred wireless networks. Since the operating system attempts to connect to all known wireless networks at boot time or when waking up from hibernation, it sends out probes to look for known networks. An attacker, a couple of rows behind, responds to the probes, provides an IP address to the victim via DHCP, and is now rigged up to launch a multitude of client-side attacks.

While the victim is unaware of being owned, his mail client periodically tries to re-send emails lying around in the outbox. The DNS request for the SMTP server is intercepted by the attacker, who returns his own address. Further, he mimics the entire SMTP connection handshake when the victim connects. Thus the victim sends his emails directly to the attacker through a fake SMTP channel. This scenario extends of course to any other plain-text protocol (HTTP, FTP, POP3, etc.). Clearly, the dominant position of the attacker yields ample opportunity for more sophisticated client-side wireless attacks, as the next examples by HD show.

Massive cookie stealing.
Traditional cookie stealing presupposes that the victim actively transmits a cookie from a particular web site in order to be captured by the attacker. In contrast, this attack only requires a single HTTP request to originate from the victim to hijack all cookies from the victim's browser. In general, only the requested site is allowed to read that particular cookie. With a malicious server responding to all client requests, the attacker can bypass this restriction. When a victim sends an HTTP request, the attacker returns a chosen list of web sites (say the current top 500 sites) and the browser then tries to connect to each site with the corresponding cookie. Because all sites resolve back to the same attacker's hostname, all cookies arrive in the hands of the attacker. Thus, by merely trying to access an arbitrary page on the Internet, the victim exposes all his cookies that correspond to entries in the attacker's list of sites.

Browser credential theft.
The next interesting step from the attacker's perspective is to hunt for usernames and passwords. To this end, HD wrote a little script storing all form information of the top 500 websites, e.g. forms asking for personal data, SSN, bank account number, MySpace and Facebook logins, and so on. When the victim visits any arbitrary website, the attacker returns a page full of frames that open the pages from the list and contain the saved form snippets. If the victim enabled automatic form fill-out in his browser preferences, the forms are auto-populated with sensitive user data. In addition to the form snippets, the attacker delivers a malicious piece of JavaScript that grabs the form contents after they have been filled out by the browser and sends them back to the attacker. Hence a single page visit results in a complete compromise of the victim's login credentials and personal data.

Web-based SMB relay exploitation.
Worse, if the victim happens to use Internet Explorer, a weakness in Microsoft's SMB file sharing authentication protocol can be exploited to own the victim's machine completely. By including a link pointing to a network file share, the victim is forced to authenticate to the attacker's fake SMB server. This exposes the challenge key that can in turn be fed back to the client. Essentially, the victim now authenticates against himself. Once connected, the incoming connection is disconnected and the new session serves as a vehicle to execute arbitrary shellcode.

Who knows what HD's new toy features beyond the sketched scenarios? In any case, these attack vectors show how broken the current model of wireless security is on the client side. While the industry tries to fix wireless encryption schemes, the actual targets, the users themselves, are not considered in the equation. These new techniques essentially render networking in any wireless environment tremendously insecure.

Monday, April 2, 2007

Writing a Linux kernel driver for an unknown USB device

This article explains the creation process of a Linux kernel device driver for an undocumented USB device. After having reverse-engineered the USB communication protocol, I present the architecture of the USB device driver. In addition to the kernel driver I introduce a simple user-space tool that can be used to control the device. Although I have to delve into the specifics of a particular device, the process can be applied to other USB devices as well.

Introduction

Recently, I found a fancy device while searching eBay: the DreamCheeky USB missile launcher. The manufacturer neither provides a Linux driver nor publishes the USB protocol. Only a binary Windows driver is available, turning the missile launcher into a complete "black box" for Linux users. What a challenge! Let's get the damn gadget working under Linux.

To facilitate USB programming, the USB interface is accessible from user-space with libusb, a programming API concealing low-level kernel interaction. The proper way to write a device driver for the missile launcher would hence be to leverage this API and ignore any kernel specifics. Nevertheless, I wanted to get involved with kernel programming and decided thus to write a kernel module despite the increased complexity and higher effort.

The remainder of this article is structured as follows. After pointing to some related work, I give a quick USB overview. Thereafter, I present the reverse-engineering process to gather the unknown USB commands steering the missile launcher. To come up with a full-featured kernel device driver, I describe the kernel module architecture which incorporates the derived control commands. Finally, I demonstrate a simple tool in user-space that makes use of the driver.

Related Work

Apparently I have not been the only one who played with this gadget. However, none of the existing approaches I have encountered pursue the creation of a Linux device driver for the kernel. The Launcher Library provides a user-space library based on libusb. AHmissile is a GTK+ control tool; an ncurses application is available, too. Apple users are served by the USB missile launcher NZ project. Moreover, the Python implementation pymissile supports a missile launcher of a different manufacturer. The author combined the missile launcher with a webcam in order to create an automated sentry guard reacting to motion. I will return to these funky ideas later.

USB primer

The universal serial bus (USB) connects a host computer with numerous peripheral devices. It was designed to unify a wide range of slow and old buses (parallel, serial, and keyboard connections) into a single bus type. Topologically, it is not constructed as a bus but rather as a tree of several point-to-point links. The USB host controller periodically polls each device to ask whether it has data to send. With this design, no device can send before it has been asked to do so, resulting in a plug-and-play-friendly architecture.

Linux supports two main types of drivers: host and device drivers. We ignore the host component and have a deeper look at the USB device. As shown on the right side, a USB device consists of one or more configurations which in turn have one or more interfaces. These interfaces contain zero or more endpoints which make up the basic form of USB communication. An endpoint is always uni-directional, either from the host to the device (OUT endpoint) or from the device to the host (IN endpoint). There are four types of endpoints and each transmits data in a different way:

  • Control
  • Interrupt
  • Bulk
  • Isochronous

Control endpoints are generally used to control the USB device asynchronously, i.e. sending commands to it or retrieving status information about it. Every device possesses a control "endpoint 0" which is used by the USB core to initialize the device. Interrupt endpoints are polled periodically and transfer small fixed-size portions of data every time the USB host asks the device. They are commonly used by mice and keyboards as the primary transport method. As bulk and isochronous endpoints are not relevant for our missile launcher, I skip their discussion. The Linux Device Drivers book gives an excellent introduction from a programming perspective. Below is some output from lsusb -v providing detailed information about the missile launcher.


Bus 005 Device 004: ID 1941:8021
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               1.10
  bDeviceClass            0 (Defined at Interface level)
  bDeviceSubClass         0
  bDeviceProtocol         0
  bMaxPacketSize0         8
  idVendor           0x1941
  idProduct          0x8021
  bcdDevice            1.00
  iManufacturer           0
  iProduct                0
  iSerial                 0
  bNumConfigurations      1
  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength           34
    bNumInterfaces          1
    bConfigurationValue     1
    iConfiguration          0
    bmAttributes         0xa0
      Remote Wakeup
    MaxPower              100mA
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       0
      bNumEndpoints           1
      bInterfaceClass         3 Human Interface Devices
      bInterfaceSubClass      0 No Subclass
      bInterfaceProtocol      0 None
      iInterface              0
        HID Device Descriptor:
          bLength                 9
          bDescriptorType        33
          bcdHID               1.00
          bCountryCode            0 Not supported
          bNumDescriptors         1
          bDescriptorType        34 Report
          wDescriptorLength      52
         Report Descriptors:
           ** UNAVAILABLE **
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x81  EP 1 IN
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0008  1x 8 bytes
        bInterval              10

The output is structured and indented like a typical USB device. First, vendor and product ID uniquely identify this USB gadget. These IDs are used by the USB core to decide which driver to give a device to. Moreover, hotplug scripts can decide which driver to load when a particular device is plugged in. Next, we can read off the maximum power usage (100 mA) in the configuration section. The subordinate interface contains apparently one interrupt IN endpoint (besides the control endpoint 0) that can be accessed at address 0x81. Because it is an IN endpoint, it returns status information from the device. To handle the incoming data we first need to understand the missile launcher control protocol.
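
The endpoint fields in the listing are bit fields, which is why address 0x81 reads as "EP 1 IN". The following Python sketch decodes them according to the USB descriptor layout: the top bit of bEndpointAddress is the direction, the low four bits are the endpoint number, and the low two bits of bmAttributes give the transfer type. The function name is made up for illustration.

```python
USB_ENDPOINT_DIR_MASK = 0x80       # bit 7 set means IN (device to host)
USB_ENDPOINT_NUMBER_MASK = 0x0F    # bits 0..3 hold the endpoint number
USB_ENDPOINT_XFERTYPE_MASK = 0x03  # bits 0..1 of bmAttributes

TRANSFER_TYPES = ("Control", "Isochronous", "Bulk", "Interrupt")


def decode_endpoint(b_endpoint_address: int, bm_attributes: int):
    """Return (endpoint number, direction, transfer type) for one endpoint descriptor."""
    direction = "IN" if b_endpoint_address & USB_ENDPOINT_DIR_MASK else "OUT"
    number = b_endpoint_address & USB_ENDPOINT_NUMBER_MASK
    transfer_type = TRANSFER_TYPES[bm_attributes & USB_ENDPOINT_XFERTYPE_MASK]
    return number, direction, transfer_type


# Values taken from the lsusb output above.
print(decode_endpoint(0x81, 3))
```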

Reverse-engineering the USB protocol

The first step involves reverse-engineering (or "snooping") the USB communication protocol spoken by the binary Windows driver. One approach would be to run the device inside a VMware virtual machine and capture the exchanged data on the host system. But since several tools to analyze USB traffic already exist, the easier solution is to rely on one of those. The most popular free application appears to be SnoopyPro. Surprisingly, I do not have a Windows box at hand, so I had to install the binary driver together with SnoopyPro in a VMware virtual machine.

In order to capture all relevant USB data and intercept all device control commands, the missile launcher has to perform every possible action while being monitored: moving the two axes alone and together, shooting, and moving to the axis limits (which will trigger a notification that the axes cannot be moved further in one direction). While analyzing the SnoopyPro dump, one can easily discover the control commands sent to the missile launcher. As an example, the picture on the right side shows an 8-byte transfer buffer. When moving the missile launcher to the right, the buffer holds 0x00000008. Moving the launcher up changes the buffer contents to 0x00000001. It is apparently very easy to deduce the control bytes used to control the missile launcher. Unless a "stop" command (0x00000000) is sent to the device, it keeps the state of the last command. This means that if the "down" command is issued, the device continues to turn until it receives a new command. If it is not possible to move further, the motor keeps running and the gears crack with an unbearably painful sound. Upon closer examination, the interrupt IN endpoint buffer varies depending on the current device position. Whenever an axis reaches its boundary (and creates the maddening sound), the device detects it and changes the interrupt buffer contents accordingly. This means of notification can be leveraged by the kernel developer to implement a boundary checking mechanism that sends a stop command as soon as the missile launcher runs against a wall.

Here is an excerpt of the driver source showing the complete list of control commands that can be sent to the device.

#define ML_STOP         0x00
#define ML_UP           0x01
#define ML_DOWN         0x02
#define ML_LEFT         0x04
#define ML_RIGHT        0x08
#define ML_UP_LEFT      (ML_UP | ML_LEFT)
#define ML_DOWN_LEFT    (ML_DOWN | ML_LEFT)
#define ML_UP_RIGHT     (ML_UP | ML_RIGHT)
#define ML_DOWN_RIGHT   (ML_DOWN | ML_RIGHT)
#define ML_FIRE         0x10

The following bytes appear in the buffer of the interrupt IN endpoint (shown as comment) and indicate that a boundary has been reached.

#define ML_MAX_UP       0x80        /* 80 00 00 00 00 00 00 00 */
#define ML_MAX_DOWN     0x40        /* 40 00 00 00 00 00 00 00 */
#define ML_MAX_LEFT     0x04        /* 00 04 00 00 00 00 00 00 */
#define ML_MAX_RIGHT    0x08        /* 00 08 00 00 00 00 00 00 */
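
The boundary check described earlier combines the two tables: a movement command must be stopped when the matching limit bit shows up in the interrupt buffer. The Python sketch below models that decision in user space. It is not the driver's code. The function names are hypothetical, and the byte positions (up/down limits in byte 0, left/right limits in byte 1) are read off the comments above.

```python
ML_STOP, ML_UP, ML_DOWN, ML_LEFT, ML_RIGHT = 0x00, 0x01, 0x02, 0x04, 0x08
ML_MAX_UP, ML_MAX_DOWN = 0x80, 0x40     # reported in byte 0 of the interrupt buffer
ML_MAX_LEFT, ML_MAX_RIGHT = 0x04, 0x08  # reported in byte 1 of the interrupt buffer


def hits_boundary(command: int, int_buffer: bytes) -> bool:
    """True if the command moves an axis that has already reached its limit."""
    vertical, horizontal = int_buffer[0], int_buffer[1]
    return bool(
        (command & ML_UP and vertical & ML_MAX_UP)
        or (command & ML_DOWN and vertical & ML_MAX_DOWN)
        or (command & ML_LEFT and horizontal & ML_MAX_LEFT)
        or (command & ML_RIGHT and horizontal & ML_MAX_RIGHT)
    )


def next_command(command: int, int_buffer: bytes) -> int:
    """Replace the current command with ML_STOP once a limit is reached."""
    return ML_STOP if hits_boundary(command, int_buffer) else command
```

Stopping all movement is the simplest policy; clearing only the blocked direction of a diagonal command would be a possible refinement.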

With all required control information in place, we now adopt the programmer's perspective and delve into the land of kernel programming.

The device driver

Writing code for the kernel is an art by itself and I will only touch the tip of the iceberg. To get a deeper understanding I recommend the books Linux Device Drivers and Understanding the Linux Kernel.

As in many other disciplines, the separation of mechanism and policy is a fundamental paradigm a programmer should follow. The mechanism provides the capabilities, whereas the policy expresses rules for how to use those capabilities. Different environments generally access the hardware in different ways. It is hence imperative to write policy-neutral code: a driver should make the hardware available without imposing constraints.

A nice feature of Linux is the ability to dynamically link object code to the running kernel. That piece of object code is called a kernel module. Linux distinguishes between three basic device types that a module can implement:

  • Character devices
  • Block devices
  • Network interfaces

A character (char) device transfers a stream of bytes from and to the user process. The module therefore implements system calls such as open, close, read, write and ioctl. A char device looks like a file, except that a file is "seekable" whereas most devices operate sequentially. Examples of char devices are the text console (/dev/console) and serial ports (/dev/ttyS0). Most simple hardware devices are driven by char drivers. Discussing block devices and network interfaces goes beyond the scope of this article; please refer to the specified literature for details.
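
Because a char device looks like a file, a user-space program can drive it with ordinary file system calls. The Python sketch below illustrates the idea under two assumptions that are not stated in this article: the driver exposes a device node (the path /dev/ml0 is hypothetical) and a command is sent by writing a single byte.

```python
import os


def send_command(device_path: str, command: int) -> int:
    """Write one command byte to a char device node and return the number of bytes written."""
    fd = os.open(device_path, os.O_WRONLY)    # ends up in the driver's open handler
    try:
        return os.write(fd, bytes([command]))  # ends up in the driver's write handler
    finally:
        os.close(fd)                          # ends up in the driver's release handler


# Hypothetical usage with the launcher attached:
# send_command("/dev/ml0", 0x08)  # 0x08 is the "right" command
```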

Besides this classification, other orthogonal ways exist. As an example, USB devices are implemented as USB modules but can show up as char devices (like our missile launcher), block devices (USB sticks, say), or network interfaces (a USB Ethernet interface). We now look at the rough structure of a USB kernel module and then turn to particularities of the missile launcher.

struct usb_ml {
    /* One structure for each connected device */
};

static struct usb_device_id ml_table [] = {
    { USB_DEVICE(ML_VENDOR_ID, ML_PRODUCT_ID) },
    { }
};

static int ml_open(struct inode *inode, struct file *file)
{
    /* open syscall */
}
static int ml_release(struct inode *inode, struct file *file)
{
    /* close syscall */
}

static ssize_t ml_write(struct file *file, const char __user *user_buf, size_t
        count, loff_t *ppos)
{
    /* write syscall */
}

static struct file_operations ml_fops = {
    .owner =    THIS_MODULE,
    .write =    ml_write,
    .open =     ml_open,
    .release =  ml_release,
};

static int ml_probe(struct usb_interface *interface, const struct usb_device_id
        *id)
{
    /* called when a USB device is connected to the computer. */
}

static void ml_disconnect(struct usb_interface *interface)
{
    /* called when unplugging a USB device. */
}

static struct usb_driver ml_driver = {
    .name = "missile_launcher",
    .id_table = ml_table,
    .probe = ml_probe,
    .disconnect = ml_disconnect,
};

static int __init usb_ml_init(void)
{
    /* called on module loading */
}

static void __exit usb_ml_exit(void)
{
    /* called on module unloading */
}

module_init(usb_ml_init);
module_exit(usb_ml_exit);

MODULE_AUTHOR("Matthias Vallentin");
MODULE_LICENSE("GPL");

Apart from some global variables, helper functions, and interrupt handlers, this is already the entire kernel module! But let's proceed step by step. The USB driver is represented by a struct usb_driver containing some function callbacks and variables identifying the USB driver. When the module is loaded via the insmod program, the __init usb_ml_init(void) function is executed, which registers the driver with the USB subsystem. When the module is unloaded, __exit usb_ml_exit(void) is called, which deregisters the driver from the USB subsystem. The __init and __exit tokens indicate that these functions are only called at initialization and exit time. Once the module is loaded, the probe and disconnect function callbacks are set up. In the probe function callback, which is called when the device is plugged in, the driver initializes any local data structures used to manage the USB device. For example, it allocates memory for the struct usb_ml, which contains run-time status information about the connected device. Here is an excerpt from the beginning of the function:

static int ml_probe(struct usb_interface *interface,
                    const struct usb_device_id *id)
{
    struct usb_device *udev = interface_to_usbdev(interface);
    struct usb_ml *dev = NULL;
    struct usb_host_interface *iface_desc;
    struct usb_endpoint_descriptor *endpoint;
    int i, int_end_size;
    int retval = -ENODEV;
 
    if (! udev) {
        DBG_ERR("udev is NULL");
        goto exit;
    }
 
    dev = kzalloc(sizeof(struct usb_ml), GFP_KERNEL);
    if (! dev) {
        DBG_ERR("cannot allocate memory for struct usb_ml");
        retval = -ENOMEM;
        goto exit;
    }
 
    dev->command = ML_STOP;
 
    init_MUTEX(&dev->sem);
    spin_lock_init(&dev->cmd_spinlock);
 
    dev->udev = udev;
    dev->interface = interface;
    iface_desc = interface->cur_altsetting;
 
    /* Set up interrupt endpoint information. */
    for (i = 0; i < iface_desc->desc.bNumEndpoints; ++i) {
        endpoint = &iface_desc->endpoint[i].desc;
 
        if (((endpoint->bEndpointAddress & USB_ENDPOINT_DIR_MASK) == USB_DIR_IN)
                && ((endpoint->bmAttributes & USB_ENDPOINT_XFERTYPE_MASK) ==
                    USB_ENDPOINT_XFER_INT))
            dev->int_in_endpoint = endpoint;
 
    }
    if (! dev->int_in_endpoint) {
        DBG_ERR("could not find interrupt in endpoint");
        goto error;
    }
 
    [...]
 
    /* We can register the device now, as it is ready. */
    retval = usb_register_dev(interface, &ml_class);

    [...]
}

You might have noticed the use of goto statements in this code snippet. Although goto statements are generally considered harmful, kernel programmers employ them to bundle error handling in a central place, eliminating complex, deeply indented logic. The probe function allocates memory for the internal device structure (the kzalloc call), initializes the semaphore and the spin-lock (init_MUTEX and spin_lock_init), and sets up endpoint information (the loop over the interface's endpoints). Somewhat later in the function, the device is registered (usb_register_dev). The device is now ready to be accessed from user space via system calls. I will discuss the simple user-space tool accessing the missile launcher shortly. Before that, I present the communication primitives used to send data to the device.
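To illustrate the error-handling idiom outside the kernel, here is a minimal user-space sketch. It is not taken from the driver: struct demo_dev, demo_setup, and demo_teardown are hypothetical names, and malloc/fopen merely stand in for kernel resources. The point is only that each failure jumps to a label that releases what has been acquired so far.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a device structure; not the driver's usb_ml. */
struct demo_dev {
    char *buffer;
    FILE *log;
};

/*
 * Acquire two resources in order. On failure, jump to the label that
 * releases everything acquired so far. Returns 0 on success, -1 on error.
 */
int demo_setup(struct demo_dev *dev, size_t buf_size, const char *log_path)
{
    int retval = -1;

    dev->buffer = NULL;
    dev->log = NULL;

    dev->buffer = malloc(buf_size);
    if (!dev->buffer)
        goto exit;

    dev->log = fopen(log_path, "r");
    if (!dev->log)
        goto error_free_buffer;

    retval = 0;
    goto exit;

error_free_buffer:
    /* Undo the allocation so the caller never sees a half-built device. */
    free(dev->buffer);
    dev->buffer = NULL;
exit:
    return retval;
}

/* Release everything demo_setup acquired. */
void demo_teardown(struct demo_dev *dev)
{
    if (dev->log)
        fclose(dev->log);
    free(dev->buffer);
    dev->log = NULL;
    dev->buffer = NULL;
}
```

With more resources, each additional acquisition gets its own label, and the labels are ordered so that falling through them unwinds in reverse order of acquisition.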

The Linux USB implementation uses a USB request block (URB) as "data carrier" to communicate with USB devices. URBs are like data messages that are sent asynchronously from and to endpoints. Remember that the USB standard includes four types of endpoints. Likewise, four different types of URBs exist, namely control, interrupt, bulk, and isochronous URBs. Once a URB has been allocated and initialized by the driver, it is submitted to the USB core, which forwards it to the device. If the URB was successfully delivered to the USB core, a completion handler is executed. Then the USB core returns control to the device driver.

As our missile launcher features two endpoints (endpoint 0 and the interrupt endpoint), we have to deal with both control and interrupt URBs. The reverse-engineered commands are basically packed into a control URB and then sent out to the device. Also, we continuously receive status information from the periodic interrupt URBs. For example, to send simple data to the missile launcher, the function usb_control_msg is used:

    memset(&buf, 0, sizeof(buf));
    buf[0] = cmd;
 
    /* The interrupt-in-endpoint handler also modifies dev->command. */
    spin_lock(&dev->cmd_spinlock);
    dev->command = cmd;
    spin_unlock(&dev->cmd_spinlock);
 
    retval = usb_control_msg(dev->udev,
            usb_sndctrlpipe(dev->udev, 0),
            ML_CTRL_REQUEST,
            ML_CTRL_REQEUST_TYPE,
            ML_CTRL_VALUE,
            ML_CTRL_INDEX,
            &buf,
            sizeof(buf),
            HZ*5);
 
    if (retval < 0) {
        DBG_ERR("usb_control_msg failed (%d)", retval);
        goto unlock_exit;
    }

The command cmd is inserted into the buffer buf containing the data to be sent to the device. If the URB completes successfully, the corresponding handler is executed. It performs nothing fancy, except telling the driver that we launched a (yet uncorrected) command via the write syscall:

static void ml_ctrl_callback(struct urb *urb, struct pt_regs *regs)
{
    struct usb_ml *dev = urb->context;
    dev->correction_required = 0;
}

We do not want the missile launcher hardware to be damaged by either sending improper commands or sending any commands after it has reached an axis boundary. Ideally, whenever an axis boundary is reached (meaning that the missile launcher cannot turn further in one direction), the device should stop the movement in that particular direction. The completion handler of the interrupt URB turns out to be the right place to implement this idea:

static void ml_int_in_callback(struct urb *urb, struct pt_regs *regs)
{
        [...]
 
        if (dev->int_in_buffer[0] & ML_MAX_UP && dev->command & ML_UP) {
            dev->command &= ~ML_UP;
            dev->correction_required = 1;
        } else if (dev->int_in_buffer[0] & ML_MAX_DOWN &&
                dev->command & ML_DOWN) {
            dev->command &= ~ML_DOWN;
            dev->correction_required = 1;
        }
 
        if (dev->int_in_buffer[1] & ML_MAX_LEFT && dev->command & ML_LEFT) {
            dev->command &= ~ML_LEFT;
            dev->correction_required = 1;
        } else if (dev->int_in_buffer[1] & ML_MAX_RIGHT &&
                dev->command & ML_RIGHT) {
            dev->command &= ~ML_RIGHT;
            dev->correction_required = 1;
        }
 
        [...]

The above code sets the correction_required variable, which triggers a "correction" control URB: this URB simply contains the last command without the harmful bit. Remember that the URB callback functions run in interrupt context and thus should not perform any memory allocations, hold semaphores, or do anything else that could put the process to sleep. With this automatic correction mechanism, the missile launcher is shielded from improper use. Again, it does not impose policy constraints; it only protects the device.

This concludes the discussion of the device driver internals. The entire missile launcher driver (ML-driver) can be downloaded in the code section. What remains is controlling the beast from user-space.
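The masking logic can be tried out in isolation. The following is a user-space sketch of the same idea, not the driver code itself: the function ml_correct_command is a name I made up, and the ML_MAX_* status bit values are assumptions chosen for illustration (the command bits match the user-space tool).

```c
#include <stdint.h>

/* Command bits, as used by the user-space tool. */
#define ML_UP     0x01
#define ML_DOWN   0x02
#define ML_LEFT   0x04
#define ML_RIGHT  0x08

/* Status bits: assumed values for this sketch only. */
#define ML_MAX_UP     0x80
#define ML_MAX_DOWN   0x40
#define ML_MAX_LEFT   0x04
#define ML_MAX_RIGHT  0x08

/*
 * Strip every direction bit that would push the launcher past a boundary
 * reported in the two status bytes. Returns 1 if the command was changed,
 * i.e. a correction would have to be sent, and 0 otherwise.
 */
int ml_correct_command(uint8_t *command, const uint8_t status[2])
{
    uint8_t before = *command;

    /* Vertical axis: status[0] reports the upper and lower limit. */
    if ((status[0] & ML_MAX_UP) && (*command & ML_UP))
        *command &= (uint8_t)~ML_UP;
    else if ((status[0] & ML_MAX_DOWN) && (*command & ML_DOWN))
        *command &= (uint8_t)~ML_DOWN;

    /* Horizontal axis: status[1] reports the left and right limit. */
    if ((status[1] & ML_MAX_LEFT) && (*command & ML_LEFT))
        *command &= (uint8_t)~ML_LEFT;
    else if ((status[1] & ML_MAX_RIGHT) && (*command & ML_RIGHT))
        *command &= (uint8_t)~ML_RIGHT;

    return *command != before;
}
```

For instance, a command of ML_UP | ML_LEFT combined with a status reporting the upper limit is reduced to ML_LEFT, so the launcher keeps turning left but stops pushing upwards.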

Controlling the missile launcher from user-space

For most people, the fun starts here. One doesn't kick the bucket when dereferencing NULL pointers, and the good old libc is available, too. After the kernel module has been loaded, the missile launcher is accessible via /dev/ml0. A second missile launcher would show up as /dev/ml1 and so on. Here is a very simple application to control the device:

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
 
#define DEFAULT_DEVICE      "/dev/ml0"
#define DEFAULT_DURATION    800
 
#define ML_STOP         0x00
#define ML_UP           0x01
#define ML_DOWN         0x02
#define ML_LEFT         0x04
#define ML_RIGHT        0x08
#define ML_FIRE         0x10
 
#define ML_FIRE_DELAY   5000
 
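void send_cmd(int fd, int cmd)
{
    /* The driver expects a single command byte; copy it out of the int so
     * that the write does not depend on the host's byte order. */
    unsigned char byte = (unsigned char) cmd;

    if (write(fd, &byte, 1) < 0)
        perror("write");
}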
 
static void usage(char *name)
{
    fprintf(stderr,
            "\nusage: %s [-slrudfh] [-m device] [-t msecs]\n\n"
            "  -m      missile launcher device [/dev/ml0]\n"
            "  -s      stop\n"
            "  -l      turn left\n"
            "  -r      turn right\n"
            "  -u      turn up\n"
            "  -d      turn down\n"
            "  -f      fire\n"
            "  -t      specify duration in milliseconds\n"
            "  -h      display this help\n\n"
            "notes:\n"
            "* it is possible to combine the directions of the two axes, e.g.\n"
            "  '-lu' sends the missile launcher up and left at the same time.\n"
            "" , name);
    exit(1);
}
 
 
int main(int argc, char *argv[])
{
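    int c;  /* getopt() returns an int; a plain char may never equal -1 */
    int fd;
    int cmd = ML_STOP;
    int duration = DEFAULT_DURATION;
    char *dev = DEFAULT_DEVICE;
 
    if (argc < 2)
        usage(argv[0]);
 
    /* 'm' and 't' take an argument, hence the trailing colons. */
    while ((c = getopt(argc, argv, "m:slrudfht:")) != -1) {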
        switch (c) {
            case 'm': dev = optarg;
                      break;
            case 'l': cmd |= ML_LEFT;
                      break;
            case 'r': cmd |= ML_RIGHT;
                      break;
            case 'u': cmd |= ML_UP;
                      break;
            case 'd': cmd |= ML_DOWN;
                      break;
            case 'f': cmd = ML_FIRE;
                      break;
            case 's': cmd = ML_STOP;
                      break;
            case 't': duration = atoi(optarg);
                      break;
            default: usage(argv[0]);
        }
    }
 
    fd = open(dev, O_RDWR);
    if (fd == -1) {
        perror("open");
        exit(1);
    }
 
    send_cmd(fd, cmd);
 
    if (cmd & ML_FIRE)
        duration = ML_FIRE_DELAY;
    else if (cmd == ML_UP || cmd == ML_DOWN)
        duration /= 2;
    usleep(duration * 1000);
 
    send_cmd(fd, ML_STOP);
 
    close(fd);
 
    return EXIT_SUCCESS;
}

This tool, let's name it ml_control, allows the user to send data to the device via the write syscall. For example, the device moves up and left for three seconds with ./ml_control -ul -t 3000, shoots with ./ml_control -f, or stops with ./ml_control -s. Consider the code a proof of concept; more sophisticated applications are of course imaginable.

Just for fun, I mounted an external iSight camera on top of the missile launcher. As the author of pymissile suggests, creating an automated sentry based on motion detection is a funky next step. Whenever a movement in the current view is detected, the missile launcher should automatically align itself and fire a missile. Due to lack of time, I could not pursue this project. Maybe someday, in the unlikely event of getting bored, I will return to this idea. Nevertheless, my friend Thorsten Röder quickly hacked together a Qt GUI. It somehow resembles an early version of Quake...

Summary

In this article, I outline the creation of a USB device driver for the Linux kernel. First, the unknown USB protocol is reverse-engineered by intercepting all USB traffic to and from the device with the Windows driver. Having captured the complete communication primitives, I explain how to build a USB kernel driver. Finally, a proof-of-concept user-space tool is presented that lays the foundation for further fancy ideas. Future work touches topics like augmenting the missile launcher with a video camera or mounting it on arbitrary devices.

Saturday, January 6, 2007

Examining and dissecting tcpdump/libpcap traces

Almost all network research involves trace-based analysis of a captured stream of packets. This article presents a set of analysis tools which extract detailed information from tcpdump/libpcap traces.

With a pcap trace in place, the next step is to gather further details from it, e.g., the traffic peak rate or which protocol accounts for the largest share. Many tools exist to accomplish this task. Below I will only sketch a few, highlighting their key features and pointing out suitable application scenarios:

tcpdstat

Written by Kenjiro Cho, tcpdstat is a powerful tool that performs an in-depth protocol breakdown by bytes and packets. It further displays average and maximum transfer rates, IP flow information, and packet size distribution. Dave Dittrich applied several tweaks to the tool to support a broader range of protocols and services, and to report more details about flow rates.

Here is an example output (of Dave's enhanced version):

DumpFile:  trace.pcap
FileSize: 98876.89MB
Id: 200703011241
StartTime: (anonymized)
EndTime:   (anonymized)
TotalTime: 7216.13 seconds
TotalCapSize: 96826.91MB  CapLen: 1514 bytes
# of packets: 134347439 (96826.91MB)
AvgRate: 113.10Mbps  stddev:47.96M   PeakRate: 260.92Mbps

### IP flow (unique src/dst pair) Information ###
# of flows: 1612801  (avg. 83.30 pkts/flow)
Top 10 big flow size (bytes/total in %):
 33.6%  3.2%  2.2%  1.5%  1.4%  1.0%  1.0%  0.9%  0.8%  0.8%

### IP address Information ###
# of IPv4 addresses: 480065
Top 10 bandwidth usage (bytes/total in %):
 34.4% 34.4%  3.3%  3.3%  3.0%  2.7%  2.3%  1.8%  1.5%  1.5%

### Packet Size Distribution (including MAC headers) ###
<<<<
 [   32-   63]:   20839652
 [   64-  127]:   38798140
 [  128-  255]:    3947049
 [  256-  511]:    3746280
 [  512- 1023]:    5675556
 [ 1024- 2047]:   61340762
>>>>


### Protocol Breakdown ###
<<<<
     protocol           packets                 bytes           bytes/pkt
------------------------------------------------------------------------
[0] total        134347439 (100.00%)     101530372750 (100.00%)    755.73
[1] ip           134347439 (100.00%)     101530372750 (100.00%)    755.73
[2]  tcp         118172509 ( 87.96%)      97361936181 ( 95.89%)    823.90
[3]   ftpdata        18640 (  0.01%)         16529412 (  0.02%)    886.77
[3]   ftp            72372 (  0.05%)          4697330 (  0.00%)     64.91
[3]   ssh         13849679 ( 10.31%)      11113777353 ( 10.95%)    802.46
[3]   telnet          9007 (  0.01%)          1526445 (  0.00%)    169.47
[3]   smtp         2133471 (  1.59%)       1447293494 (  1.43%)    678.38
[3]   name              23 (  0.00%)             1426 (  0.00%)     62.00
[3]   dns            35071 (  0.03%)          7071657 (  0.01%)    201.64
[3]   http(s)     25043480 ( 18.64%)      30677552254 ( 30.22%)   1224.97
[3]   http(c)     16165378 ( 12.03%)       2182851897 (  2.15%)    135.03
[3]   kerb5            370 (  0.00%)            30610 (  0.00%)     82.73
[3]   pop3           82382 (  0.06%)         26718043 (  0.03%)    324.32
[3]   sunrpc            30 (  0.00%)             3002 (  0.00%)    100.07
[3]   ident           5107 (  0.00%)           322074 (  0.00%)     63.07
[3]   nntp            1262 (  0.00%)           292679 (  0.00%)    231.92
[3]   epmap         209144 (  0.16%)         12909976 (  0.01%)     61.73
[3]   netb-se       404237 (  0.30%)         47178014 (  0.05%)    116.71
[3]   imap          125983 (  0.09%)        100889454 (  0.10%)    800.82
[3]   bgp              482 (  0.00%)            43139 (  0.00%)     89.50
[3]   ldap            7131 (  0.01%)          1434769 (  0.00%)    201.20
[3]   https        2941177 (  2.19%)       1802114169 (  1.77%)    612.72
[3]   ms-ds         245214 (  0.18%)         24263111 (  0.02%)     98.95
[3]   rtsp         1023246 (  0.76%)        691696863 (  0.68%)    675.98
[3]   ldaps           2828 (  0.00%)           209272 (  0.00%)     74.00
[3]   socks           7883 (  0.01%)          1340672 (  0.00%)    170.07
[3]   kasaa          13348 (  0.01%)          1124944 (  0.00%)     84.28
[3]   mssql-s       309786 (  0.23%)         20411848 (  0.02%)     65.89
[3]   squid          51381 (  0.04%)         14079861 (  0.01%)    274.03
[3]   ms-gc           1865 (  0.00%)           493682 (  0.00%)    264.71
[3]   ms-gcs          2034 (  0.00%)           481178 (  0.00%)    236.57
[3]   hotline            6 (  0.00%)              682 (  0.00%)    113.67
[3]   realaud        19784 (  0.01%)         13197979 (  0.01%)    667.10
[3]   icecast       390203 (  0.29%)        291651836 (  0.29%)    747.44
[3]   gnu6346         6324 (  0.00%)          1048473 (  0.00%)    165.79
[3]   gnu6348          342 (  0.00%)            26047 (  0.00%)     76.16
[3]   gnu6349           14 (  0.00%)             2767 (  0.00%)    197.64
[3]   gnu6350            4 (  0.00%)              732 (  0.00%)    183.00
[3]   irc6666            7 (  0.00%)              434 (  0.00%)     62.00
[3]   irc6667         1379 (  0.00%)           196155 (  0.00%)    142.24
[3]   irc6668            2 (  0.00%)              124 (  0.00%)     62.00
[3]   irc6669            9 (  0.00%)              666 (  0.00%)     74.00
[3]   napster           21 (  0.00%)             1344 (  0.00%)     64.00
[3]   irc7000            7 (  0.00%)              824 (  0.00%)    117.71
[3]   http-a        129807 (  0.10%)         71136838 (  0.07%)    548.02
[3]   other       54862568 ( 40.84%)      48787331392 ( 48.05%)    889.26
[2]  udp          13069221 (  9.73%)       3895596348 (  3.84%)    298.07
[3]   name              18 (  0.00%)             1989 (  0.00%)    110.50
[3]   dns          1799081 (  1.34%)        264263480 (  0.26%)    146.89
[3]   kerb5            100 (  0.00%)            25812 (  0.00%)    258.12
[3]   sunrpc           581 (  0.00%)            57157 (  0.00%)     98.38
[3]   ntp            50387 (  0.04%)          4534933 (  0.00%)     90.00
[3]   epmap             17 (  0.00%)             1824 (  0.00%)    107.29
[3]   netb-ns       148619 (  0.11%)         14736588 (  0.01%)     99.16
[3]   netb-se         1272 (  0.00%)           328673 (  0.00%)    258.39
[3]   ms-ds              8 (  0.00%)              883 (  0.00%)    110.38
[3]   kazaa             29 (  0.00%)             3546 (  0.00%)    122.28
[3]   mssql-s           44 (  0.00%)             3832 (  0.00%)     87.09
[3]   mcast        7216682 (  5.37%)       1943012688 (  1.91%)    269.24
[3]   realaud       459195 (  0.34%)        273532235 (  0.27%)    595.68
[3]   halflif           81 (  0.00%)             5890 (  0.00%)     72.72
[3]   starcra           45 (  0.00%)             6367 (  0.00%)    141.49
[3]   everque            9 (  0.00%)             1351 (  0.00%)    150.11
[3]   unreal          1066 (  0.00%)            93951 (  0.00%)     88.13
[3]   quake             20 (  0.00%)             1860 (  0.00%)     93.00
[3]   other        3384119 (  2.52%)       1394472416 (  1.37%)    412.06
[2]  icmp          3105709 (  2.31%)        272840221 (  0.27%)     87.85
[2]  frag            30903 (  0.02%)         25672129 (  0.03%)    830.73
>>>>

First, tcpdstat shows a summary including the trace size, time, and number of packets. Most interesting here are the average and peak traffic rates in Mbps and the standard deviation, which measures how bursty the traffic was. The second block shows the number of IP flows, i.e., unique pairs of source and destination address, and the 10 biggest flows in percent. Next, the total number of distinct IP addresses is displayed along with the top 10 bandwidth usage in percent. Unfortunately, only the percent values are shown and not the addresses themselves. The next block displays the packet size distribution including MAC headers. The last big block is very informative, illustrating a per-packet and per-byte protocol breakdown. IP traffic is partitioned into TCP, UDP, ICMP, and fragmented packets. TCP and UDP packets are further broken down into their application layer protocols.

Tcpdstat is great for getting a bunch of information out of a trace and is very easy to use, but it lacks some flexibility. The tool is your perfect choice if the above output is enough for you. Otherwise, you might incorporate further tools into your analysis.
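To make the columns of the protocol breakdown concrete, here is a small sketch of how the percentage and bytes/pkt values relate to the raw counts. The helper names share_percent and bytes_per_packet are my own for illustration and are not taken from the tcpdstat source.

```c
#include <stdint.h>

/* Share of `part` in `total`, in percent (0 if total is 0). */
double share_percent(uint64_t part, uint64_t total)
{
    if (total == 0)
        return 0.0;
    return 100.0 * (double)part / (double)total;
}

/* Average packet size in bytes (0 if there are no packets). */
double bytes_per_packet(uint64_t bytes, uint64_t packets)
{
    if (packets == 0)
        return 0.0;
    return (double)bytes / (double)packets;
}
```

Plugging in the numbers from the example output: the total row has 101530372750 bytes in 134347439 packets, which gives roughly 755.73 bytes per packet, and the 118172509 TCP packets account for roughly 87.96% of all packets, matching the [0] and [2] rows above.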

ipsumdump

The ipsumdump utility from Eddie Kohler is the "swiss-army knife" for trace analysis. It summarizes TCP/IP dump files into a nice ASCII format that can be intuitively understood and easily parsed.

Ipsumdump can read packets from network interfaces, from tcpdump files, and from existing ipsumdump files. It will transparently uncompress tcpdump or ipsumdump files when necessary. It can randomly sample traffic, filter traffic based on its contents, anonymize IP addresses, and sort packets from multiple dumps by timestamp. Also, it can optionally create a tcpdump file containing actual packet data.

To demonstrate its versatility, I provide the output of ipsumdump -h below:

'Ipsumdump' reads IP packets from tcpdump(1) files, or network interfaces,
and summarizes their contents in an ASCII log.

Usage: ipsumdump [CONTENT OPTIONS] [-i DEVNAMES | FILES] > LOGFILE

Options that determine summary dump contents (can give multiple options):
  -t, --timestamp            Include packet timestamps.
  -T, --first-timestamp      Include flow-begin timestamps.
  -s, --src                  Include IP source addresses.
  -d, --dst                  Include IP destination addresses.
  -S, --sport                Include TCP/UDP source ports.
  -D, --dport                Include TCP/UDP destination ports.
  -l, --length               Include IP lengths.
  -p, --protocol             Include IP protocols.
      --id                   Include IP IDs.
  -g, --fragment             Include IP fragment flags ('F' or '.').
  -G, --fragment-offset      Include IP fragment offsets.
      --ip-sum               Include IP checksums.
      --ip-opt               Include IP options.
  -F, --tcp-flags            Include TCP flags word.
  -Q, --tcp-seq              Include TCP sequence numbers.
  -K, --tcp-ack              Include TCP acknowledgement numbers.
  -W, --tcp-window           Include TCP receive window (unscaled).
  -O, --tcp-opt              Include TCP options.
      --tcp-sack             Include TCP selective acknowledgement options.
      --udp-length           Include UDP lengths.
  -L, --payload-length       Include payload lengths (no IP/UDP/TCP headers).
      --payload              Include packet payloads as quoted strings.
      --payload-md5          Include MD5 checksum of packet payloads.
      --capture-length       Include lengths of captured IP data.
  -c, --packet-count         Include packet counts (usually 1).
      --link                 Include link numbers (NLANR/NetFlow).

Data source options (give exactly one):
  -r, --tcpdump              Read tcpdump(1) FILES (default).
  -i, --interface            Read network devices DEVNAMES until interrupted.
      --ipsumdump            Read existing ipsumdump FILES.
      --format FORMAT        Read ipsumdump FILES with format FORMAT.
      --dag[=ENCAP]          Read DAG-format FILES.
      --nlanr                Read NLANR-format FILES (fr/fr+/tsh).
      --netflow-summary      Read summarized NetFlow FILES.
      --tcpdump-text         Read tcpdump(1) text output FILES.

Other options:
  -o, --output FILE          Write summary dump to FILE (default stdout).
  -b, --binary               Create binary output file.
  -w, --write-tcpdump FILE   Also dump packets to FILE in tcpdump(1) format.
  -f, --filter FILTER        Apply tcpdump(1) filter FILTER to data.
  -A, --anonymize            Anonymize IP addresses (preserves prefix & class).
      --no-promiscuous       Do not put interfaces into promiscuous mode.
      --bad-packets          Print '!bad' messages for bad headers.
      --sample PROB          Sample packets with PROB probability.
      --multipacket          Produce multiple entries for a flow identifier
                             representing multiple packets (NetFlow only).
      --collate              Collate packets from data sources by timestamp.
      --interval TIME        Stop after TIME has elapsed in trace time.
      --limit-packets N      Stop after processing N packets.
      --map-address ADDRS    When done, print to stderr the anonymized IP
                             addresses and/or prefixes corresponding to ADDRS.
      --record-counts TIME   Record packet counts every TIME seconds in output.
      --random-seed SEED     Set random seed to SEED (default is random).
      --no-mmap              Don't memory-map input files.
  -q, --quiet                Do not print progress bar.
      --config               Output Click configuration and exit.
  -V, --verbose              Report errors verbosely.
  -h, --help                 Print this message and exit.
  -v, --version              Print version number and exit.

Report bugs to <kohler@cs.ucla.edu>.

Among the above options, some deserve more attention. For example, the --payload-md5 option includes an MD5 checksum (e.g. i4CxGSojVHB2XcZw97ZpQb) of the packet payload in the dump. This option comes in handy when you want to check for packet duplicates. Another nifty option is --anonymize. Since traces can contain sensitive data, it is possible to anonymize the IP addresses in the output in order to prevent information leakage. The anonymization preserves prefix and class. In high-volume networks, the --sample=p option might be interesting. It samples packets with probability p. That is, p is the chance that a packet will cause output to be generated. The actual probability may differ from the specified probability due to fixed-point arithmetic. If you want to merge several trace files while retaining the temporal order, --collate together with --write-tcpdump sorts your packets by increasing timestamp.
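Why fixed-point arithmetic makes the effective sampling probability deviate slightly can be seen in a small sketch. This is not ipsumdump's implementation: the 28 fraction bits, the xorshift32 generator, and all function names below are assumptions picked for illustration.

```c
#include <stdint.h>

/* Number of fixed-point fraction bits; an assumption for this sketch. */
#define SAMPLE_SHIFT 28

/* Convert a probability in [0, 1] into a fixed-point threshold.
 * The fractional part is truncated, which is where the deviation
 * from the requested probability comes from. */
uint32_t sample_threshold(double prob)
{
    if (prob <= 0.0)
        return 0;
    if (prob >= 1.0)
        return 1u << SAMPLE_SHIFT;
    return (uint32_t)(prob * (double)(1u << SAMPLE_SHIFT));
}

/* Small xorshift32 generator; *state must be nonzero. */
uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Returns 1 if the current packet should be kept, 0 if it is dropped. */
int sample_packet(uint32_t threshold, uint32_t *rng_state)
{
    /* Use the upper SAMPLE_SHIFT bits as a uniform value in [0, 2^28). */
    uint32_t r = xorshift32(rng_state) >> (32 - SAMPLE_SHIFT);
    return r < threshold;
}
```

In this sketch, a requested probability of 0.1 becomes the threshold 26843545, because 0.1 * 2^28 = 26843545.6 is truncated. The effective probability is therefore 26843545 / 2^28, marginally below 0.1.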

Summing up, ipsumdump is a flexible trace analysis tool complementing tcpdump. However, it does not feature predefined evaluation methods like tcpdstat (which is actually not a design goal). Nevertheless, ipsumdump is a valuable tool to quickly generate easily readable ASCII output or to obtain well-formatted output for subsequent processing/scripting. It unveils its real power when used for trace manipulation such as merging, modifying, or anonymizing tcpdump traces.
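As an example of such subsequent processing, here is a sketch that sums up packets and bytes from ipsumdump's ASCII output. It assumes the dump was created with the -t, -s, -d, and -l options, that each data line then holds timestamp, source address, destination address, and IP length separated by whitespace, and that lines starting with '!' carry metadata. Please verify this layout against your own output before relying on it. The struct and function names are mine.

```c
#include <stdio.h>

/* Running totals over a dump. */
struct dump_totals {
    unsigned long packets;
    unsigned long long bytes;
};

/*
 * Account for one line of the dump. Returns 1 if the line was a data
 * line and has been counted, 0 if it was skipped (metadata, empty, or
 * not in the assumed "timestamp src dst length" layout).
 */
int account_line(const char *line, struct dump_totals *totals)
{
    double timestamp;
    char src[64], dst[64];
    unsigned long length;

    if (line[0] == '!' || line[0] == '\0' || line[0] == '\n')
        return 0;
    if (sscanf(line, "%lf %63s %63s %lu",
               &timestamp, src, dst, &length) != 4)
        return 0;

    totals->packets += 1;
    totals->bytes += length;
    return 1;
}

/* Read a whole dump from a stream, e.g. stdin. */
void account_stream(FILE *in, struct dump_totals *totals)
{
    char line[512];

    while (fgets(line, sizeof line, in))
        account_line(line, totals);
}
```

Wrapped in a small main that calls account_stream(stdin, &totals) and prints the result, this could be fed with something like ipsumdump -tsdl trace.pcap.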

Netdude

Netdude (Network dump data displayer and editor) is a graphical tool to edit tcpdump trace files, written by Christian Kreibich. In fact, it is a front-end to the libnetdude packet manipulation library. Since complex trace manipulation is non-trivial and often requires custom coding, Netdude provides a GUI enabling users to

  • Edit traces of arbitrary size in a scalable fashion. Netdude never loads more than a configurable maximum number of packets into memory at any time.
  • Edit multiple traces at the same time, making it easy to move packets from one trace to a different one.
  • Modify every field in protocol headers for which a protocol plugin provides support. These modifications can be applied to either only individually selected packets, packets currently in memory, or all packets in the trace, including the ones not currently loaded.
  • Filter packets by using filter plugins. Netdude 0.4.6 ships with a BPF filter plugin that allows you to use the standard BPF filter language to define your filters.
  • Inspect and edit raw packet content using Netdude's payload editor in either hex or ASCII mode, whichever is more convenient for the payload you are editing.
  • Move packets around, duplicate them, remove them from traces.
  • See the tcpdump output updating instantly according to the modifications you're making.
  • Conveniently use the clipboard to select lines from the tcpdump output for situations when you need the tcpdump output only (e.g., when writing documentation, papers or emails).

As soon as packet editing is part of the game, Netdude offers an interactive and user-friendly GUI to perform even sophisticated manipulations.

Conclusion

Nowadays, captured network traffic is mostly available as a tcpdump/libpcap trace. In this article, I introduce three tools enabling in-depth trace examination. First, tcpdstat provides a high-level view of the trace ingredients with a detailed protocol breakdown. Second, ipsumdump offers a flexible means to generate nicely formatted ASCII dumps. It can be used to quickly extract a desired piece of information or as a multi-purpose output generator. Finally, Netdude features a comfortable GUI to selectively manipulate packet details. Equipped with this arsenal, trace-based network analysis feels like a hot knife through butter.