dpkt Tutorial #2: Parsing a PCAP File

Wednesday, October 15, 2008

As we showed in the first dpkt tutorial, dpkt makes it simple to construct packets. dpkt is equally useful for parsing packets and files, so in this second tutorial we will demonstrate parsing a PCAP file and the packets contained within it.

dpkt is a sweet framework for creating and parsing packets. While dpkt doesn't have much documentation, once you get the hang of using one module, the rest fall into place fairly easily. I'll be doing a number of dpkt tutorials with simple tasks in hopes of providing some "documentation by example". If you have any tasks you'd like to see done in dpkt, drop me a line.

In this tutorial, we'll not only show how to parse the raw PCAP file format, but also how to parse the packets contained in the PCAP file. Let's get started!

Let's parse a previously captured PCAP file, test.pcap, that contains some HTTP sessions. If we look at the dpkt/pcap.py module, we see that it offers a Reader class that takes a file object and exposes a similar interface to pypcap for reading records from the file. Let's first open test.pcap with the Reader class:

# PCAP is a binary format, so open the file in binary mode
f = open('test.pcap', 'rb')
pcap = dpkt.pcap.Reader(f)
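To get a feel for what a reader has to do with the raw file, here is a minimal standard-library sketch of the classic PCAP layout: a 24-byte global header followed by records that each carry a 16-byte header (seconds, microseconds, captured length, original length) and the packet bytes. This is not dpkt's implementation; the function name read_pcap_records and the in-memory sample capture are made up for illustration, and the sketch ignores pcapng and nanosecond-resolution files.

```python
import io
import struct

PCAP_MAGIC = 0xa1b2c3d4  # classic pcap magic number (microsecond timestamps)


def read_pcap_records(fileobj):
    """Yield (timestamp, packet_bytes) for each record in a classic pcap stream."""
    header = fileobj.read(24)
    if len(header) < 24:
        raise ValueError('truncated pcap global header')
    # The magic number reveals the byte order the file was written in.
    if struct.unpack('<I', header[:4])[0] == PCAP_MAGIC:
        endian = '<'
    elif struct.unpack('>I', header[:4])[0] == PCAP_MAGIC:
        endian = '>'
    else:
        raise ValueError('not a classic pcap file')
    while True:
        rec = fileobj.read(16)
        if len(rec) < 16:
            break  # end of file, or a truncated trailing record header
        ts_sec, ts_usec, incl_len, orig_len = struct.unpack(endian + 'IIII', rec)
        data = fileobj.read(incl_len)
        yield ts_sec + ts_usec / 1e6, data


# Build a tiny one-record capture in memory to exercise the reader.
# Fields: magic, version 2.4, timezone offset, sigfigs, snaplen, link type 1 (Ethernet).
global_header = struct.pack('<IHHiIII', PCAP_MAGIC, 2, 4, 0, 0, 65535, 1)
record = struct.pack('<IIII', 1220901348, 610000, 4, 4) + b'\x01\x02\x03\x04'
records = list(read_pcap_records(io.BytesIO(global_header + record)))
print(len(records), len(records[0][1]))
```

The generator yields the same kind of (timestamp, buffer) pairs that we get when iterating over the Reader object, which is why the loop code in the rest of this tutorial is so short.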

We can now iterate through the pcap object and access each packet contained in the dump. For example, we can print out the timestamp and packet data length of each record:

>>> for ts, buf in pcap:
...     print ts, len(buf)
...
1220901348.61 66
1220901348.68 66
...
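The timestamps are seconds since the Unix epoch, which isn't very readable. A small sketch using only the standard library turns one into a UTC string; format_ts is a hypothetical helper name, not part of dpkt, and it drops the fractional seconds for simplicity.

```python
import datetime


def format_ts(ts):
    """Render a pcap timestamp (seconds since the Unix epoch) as a UTC string."""
    epoch = datetime.datetime(1970, 1, 1)
    # Truncate to whole seconds to keep the output stable.
    moment = epoch + datetime.timedelta(seconds=int(ts))
    return moment.strftime('%Y-%m-%d %H:%M:%S')


print(format_ts(1220901348.61))  # 2008-09-08 19:15:48
```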

Of course, it would be a lot more useful to parse the packet data into a more friendly, usable form. Using dpkt, we can simply pass a raw buffer to the appropriate dpkt class and have its contents automatically parsed and decoded into friendly python objects:

for ts, buf in pcap:
    eth = dpkt.ethernet.Ethernet(buf)

Passing the packet data to dpkt's Ethernet class will parse and decode it into the eth object. Since dpkt's Ethernet class also contains some extra magic to parse higher layer protocols that are recognized, we see that both the IP and TCP layer information has been decoded as well:

>>> print eth
Ethernet(src='\x00\x1a\xa0kUf', dst='\x00\x13I\xae\x84,', data=IP(src='\xc0\xa8\n\n',
off=16384, dst='C\x17\x030', sum=25129, len=52, p=6, id=51105, data=TCP(seq=9632694,
off_x2=128, ack=3382015884, win=54, sum=65372, flags=17, dport=80, sport=56145)))
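Note that the src and dst fields are shown as raw packed bytes rather than in their familiar printed forms. A short standard-library sketch can make them readable; mac_to_str and ip_to_str are hypothetical helper names of our own, and the byte strings are copied from the output above.

```python
import socket


def mac_to_str(raw):
    """Format a 6-byte hardware address as colon-separated hex."""
    # bytearray() yields integers on both Python 2 and Python 3.
    return ':'.join('%02x' % b for b in bytearray(raw))


def ip_to_str(raw):
    """Format a 4-byte packed IPv4 address as a dotted quad."""
    return socket.inet_ntoa(raw)


print(mac_to_str(b'\x00\x1a\xa0kUf'))  # 00:1a:a0:6b:55:66
print(ip_to_str(b'\xc0\xa8\n\n'))      # 192.168.10.10
print(ip_to_str(b'C\x17\x030'))        # 67.23.3.48
```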

As we can see from the output, eth is the Ethernet object, eth.data is the IP object, and eth.data.data is the TCP object. We can assign references to these objects in a more friendly manner:

ip = eth.data
tcp = ip.data

We can then examine the attributes of the various objects as usual. For example, we can look at the source and destination ports of the TCP header:

>>> print tcp.sport
56145
>>> print tcp.dport
80
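Some of the other TCP fields in the output are packed values that take a little decoding. As a rough sketch, the flags field is a bitmask and the upper four bits of off_x2 hold the header length in 32-bit words. The helper names and the flag table below are our own, written out from the standard TCP flag bits rather than taken from dpkt.

```python
# Standard TCP flag bits, lowest bit first.
TCP_FLAGS = [
    (0x01, 'FIN'),
    (0x02, 'SYN'),
    (0x04, 'RST'),
    (0x08, 'PSH'),
    (0x10, 'ACK'),
    (0x20, 'URG'),
]


def tcp_flag_names(flags):
    """Return the names of the flag bits set in a TCP flags byte."""
    return [name for bit, name in TCP_FLAGS if flags & bit]


def tcp_header_len(off_x2):
    """Return the TCP header length in bytes from the data-offset byte."""
    return (off_x2 >> 4) * 4


print(tcp_flag_names(17))   # ['FIN', 'ACK']
print(tcp_header_len(128))  # 32
```

So the packet printed earlier is a FIN/ACK with a 32-byte TCP header. Assuming a plain 20-byte IP header with no options, the IP length of 52 leaves no room for a payload, so this particular packet carries no application data.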

Of course, since we know that this packet dump contains HTTP sessions, we may also want to parse beyond the TCP layer and decode the HTTP requests. To do so, we'll ensure that our destination port is 80 (indicating a request as opposed to a response) and that there is data beyond the TCP layer available for parsing. We'll use dpkt's HTTP decoder to parse the data:

if tcp.dport == 80 and len(tcp.data) > 0:
    http = dpkt.http.Request(tcp.data)

Once the HTTP payload has been parsed, we can examine its various attributes:

>>> print http.method
GET
>>> print http.uri
/testurl
>>> print http.version
1.1
>>> print http.headers['user-agent']
Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.5)
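To see roughly where those attributes come from, here is a minimal sketch of splitting an HTTP request into its request line and headers using only the standard library. This is not how dpkt implements its decoder; parse_http_request is a hypothetical helper, and the sample request (including the Host value) is invented to resemble the one above. Header names are lowercased so they can be looked up the same way as in the session above.

```python
def parse_http_request(payload):
    """Split raw HTTP request bytes into method, URI, version, headers, and body."""
    head, _, body = payload.partition(b'\r\n\r\n')
    lines = head.decode('latin-1').split('\r\n')
    # Request line: METHOD SP URI SP HTTP/VERSION
    method, uri, proto = lines[0].split(' ', 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(':')
        headers[name.strip().lower()] = value.strip()
    return {
        'method': method,
        'uri': uri,
        'version': proto.split('/', 1)[1],
        'headers': headers,
        'body': body,
    }


# Invented sample request for illustration only.
raw = (b'GET /testurl HTTP/1.1\r\n'
       b'Host: example.com\r\n'
       b'User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.5)\r\n'
       b'\r\n')
req = parse_http_request(raw)
print(req['method'], req['uri'], req['version'])
```

A real decoder also has to cope with requests split across several TCP segments and with malformed input, which this sketch does not attempt.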

For the purposes of our tutorial program, we'll just output the http.uri attribute.

And that concludes our tutorial for parsing a PCAP file and the packets within it. In just 10 simple lines of Python, we've created a powerful tool that reads the raw PCAP file, parses and decodes the Ethernet, IP, TCP, and HTTP layers, and prints out the URI of each HTTP request.

The full Python script for this tutorial follows:

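#!/usr/bin/env python

import dpkt

# PCAP is a binary format, so open the file in binary mode
f = open('test.pcap', 'rb')
pcap = dpkt.pcap.Reader(f)

for ts, buf in pcap:
    # Decode the Ethernet frame; dpkt also decodes the IP and TCP layers inside it
    eth = dpkt.ethernet.Ethernet(buf)
    ip = eth.data
    tcp = ip.data

    # Destination port 80 with a non-empty payload indicates an HTTP request
    if tcp.dport == 80 and len(tcp.data) > 0:
        http = dpkt.http.Request(tcp.data)
        print(http.uri)

f.close()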