dpkt Tutorial #4: AS Paths from MRT/BGP

Wednesday, March 25, 2009

Previously we looked at creating ICMP echo requests, parsing a PCAP file, and doing DNS spoofing with the dpkt framework. Today I will show how to parse the AS paths of BGP messages out of MRT routing dumps.

Parsing BGP routing information is fun. However, before projects like RouteViews were around, getting a global view of Internet routing in real time simply wasn't possible. Thanks to RouteViews, we can now extract useful routing information from MRT dumps collected from a number of providers' perspectives.

For our example today, we'll be parsing out the AS path from BGP update messages within MRT dumps. Perhaps you're interested in what or how many AS hops are traversed between a source and destination. Maybe you're inferring ISP peering relationships based on the ASNs in the path. Or maybe you're trying to identify anomalous AS paths that may indicate misconfiguration or malicious activity. Whatever the case, we will show how dpkt can make the parsing easy.

The BGP module is by far the most complex and comprehensive of all the dpkt modules, weighing in at almost 800 lines. It supports the message formats of the core BGP RFC as well as 8 RFCs that extend the protocol. On top of dpkt's BGP module, we also have the pybgpdump library, which abstracts away a few of the formalities of iterating through an MRT dump.

Before we begin, we need a sample MRT dump. A sample dump that comes distributed with pybgpdump is available here.

pybgpdump defines a class called BGPDump that takes a filename as an argument (as can be seen in pybgpdump.py). BGPDump transparently handles gzip'ed, bzip'ed, or uncompressed MRT dumps. An instance of the BGPDump class can be iterated over to step through each MRT record in the dump. If the MRT record is determined to be of the BGP4 type, it will be handed to the user for processing.
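To make the idea of transparent decompression and record iteration concrete, here is a minimal sketch using only the standard library. It is not pybgpdump's actual implementation: the function names open_dump and iter_mrt_records are made up for illustration, and the header layout (timestamp, type, subtype, length) and the BGP4MP type code 16 are taken from the MRT specification (RFC 6396).

```python
import bz2
import gzip
import struct

# MRT common header: 4-byte timestamp, 2-byte type, 2-byte subtype, 4-byte length.
MRT_HEADER = struct.Struct('>IHHI')
MRT_BGP4MP = 16  # MRT type code for BGP4MP records (RFC 6396)

def open_dump(filename):
    """Open a dump for binary reading, sniffing gzip/bzip2 magic bytes."""
    with open(filename, 'rb') as f:
        magic = f.read(3)
    if magic[:2] == b'\x1f\x8b':      # gzip magic number
        return gzip.open(filename, 'rb')
    if magic == b'BZh':               # bzip2 magic number
        return bz2.BZ2File(filename, 'rb')
    return open(filename, 'rb')       # assume an uncompressed dump

def iter_mrt_records(fileobj):
    """Yield (timestamp, type, subtype, payload) for each complete MRT record."""
    while True:
        header = fileobj.read(MRT_HEADER.size)
        if len(header) < MRT_HEADER.size:
            return                    # end of file or truncated header
        ts, mrt_type, subtype, length = MRT_HEADER.unpack(header)
        payload = fileobj.read(length)
        if len(payload) < length:
            return                    # truncated record, stop quietly
        yield ts, mrt_type, subtype, payload
```

A caller would keep only the records whose type equals MRT_BGP4MP and hand each payload to a BGP parser; pybgpdump does this kind of filtering for you.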

For example, a simple snippet to count the number of BGP messages in an MRT dump:

import pybgpdump

cnt = 0
dump = pybgpdump.BGPDump('sample.dump.gz')
for mrt_h, bgp_h, bgp_m in dump:
    cnt += 1
print('%d BGP messages in the MRT dump' % cnt)

However, for our tutorial, we would like to print out the AS path for each BGP update message. So instead of incrementing a counter within our for loop, we'll explore our bgp_m object, an instance of the BGP class in dpkt's bgp.py. A BGP instance contains a few miscellaneous attributes (len, type, etc.) and a data attribute that is an instance of the Open, Update, Notification, Keepalive, or RouteRefresh class, depending on the message type. Since pybgpdump will only hand us update messages, we know that bgp_m.data is an instance of the Update class.
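For readers curious about where len and type come from, the sketch below decodes the fixed 19-byte BGP message header by hand: a 16-byte marker, a 2-byte length, and a 1-byte type. This is a standalone illustration using struct, not dpkt's code; the name parse_bgp_header is hypothetical, and the type codes come from the BGP specifications.

```python
import struct

BGP_HEADER_LEN = 19  # 16-byte marker + 2-byte length + 1-byte type
BGP_TYPES = {
    1: 'OPEN',
    2: 'UPDATE',
    3: 'NOTIFICATION',
    4: 'KEEPALIVE',
    5: 'ROUTE-REFRESH',
}

def parse_bgp_header(buf):
    """Return (length, type, body) for the BGP message at the start of buf."""
    if len(buf) < BGP_HEADER_LEN:
        raise ValueError('need at least 19 bytes for a BGP header')
    marker, length, msg_type = struct.unpack('>16sHB', buf[:BGP_HEADER_LEN])
    # The length field covers the header too, so the body ends at `length`.
    return length, msg_type, buf[BGP_HEADER_LEN:length]

# A KEEPALIVE is just a header: all-ones marker, length 19, type 4.
keepalive = b'\xff' * 16 + struct.pack('>HB', 19, 4)
length, msg_type, body = parse_bgp_header(keepalive)
print('%d %s %d' % (length, BGP_TYPES[msg_type], len(body)))
```

With dpkt you never need to do this yourself; the point is only that the message type in the header decides which class ends up in the data attribute.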

Update messages in BGP contain NLRI (network layer reachability information), including routes that are withdrawn or announced. In addition, they contain a wide variety of Attributes that specify additional information about the update message. One of these attributes is the AS path, which itself is made up of segments of different types. To actually access the AS path information, we have to reach down deep into the bgp_m object:

bgp_m: instance of BGP
  .type: type of BGP message
  .len: length of BGP message
  .update/.data: instance of Update()
    .withdrawn: list of withdrawn routes
    .announced: list of announced routes
    .attributes: list of instances of Attribute
      .flags: attribute flags
      .type: attribute type
      .data/.as_path: instance of ASPath
        .segments: list of instances of ASPathSegment
          .type: type of path segment (set, sequence, confed)
          .len: number of ASNs in the segment
          .data/.path: list of ASN integers in the segment
While things may look pretty crazy from that object hierarchy, the code to access the AS path is fairly simple:

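from dpkt import bgp
from pybgpdump import BGPDump

dump = BGPDump('sample.dump.gz')
for mrt_h, bgp_h, bgp_m in dump:
    # Scan the update's attributes and stop at the first AS path found.
    for attr in bgp_m.update.attributes:
        if attr.type == bgp.AS_PATH:
            print(path_to_str(attr.as_path))
            break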

We simply loop through the attributes of the update message until we find the AS path attribute. However, as seen in the object hierarchy, attr.as_path is still a list of segments that needs to be decoded. We use the path_to_str() function to pretty-print this list of segments in the standard AS path form that identifies sets, sequences, and confederations:

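# Opening and closing delimiters, indexed by segment type code.
DELIMS = ( ('', ''),
           ('{', '}'),  # AS_SET
           ('', ''),    # AS_SEQUENCE
           ('(', ')'),  # AS_CONFED_SEQUENCE
           ('[', ']') ) # AS_CONFED_SET

def path_to_str(path):
    # Render each segment as its space-separated ASNs wrapped in the
    # delimiters for its type, then join the segments with spaces.
    parts = []
    for seg in path.segments:
        opener, closer = DELIMS[seg.type]
        parts.append(opener + ' '.join('%d' % asn for asn in seg.path) + closer)
    return ' '.join(parts)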
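Beyond pretty-printing, the same list of segments can answer the "how many AS hops" question raised at the start. The sketch below is an illustration under stated assumptions: ASPath and ASPathSegment here are namedtuple stand-ins for dpkt's classes so the example runs without dpkt, as_path_length is a hypothetical helper, and the counting rule (a sequence counts every ASN, a set counts once, confederation segments count as zero) follows BGP route selection as described in RFC 4271 and RFC 5065.

```python
from collections import namedtuple

# Stand-ins for dpkt's ASPath and ASPathSegment, for illustration only.
ASPath = namedtuple('ASPath', ['segments'])
ASPathSegment = namedtuple('ASPathSegment', ['type', 'path'])

# Segment type codes, in the same order as the DELIMS table.
AS_SET, AS_SEQUENCE, AS_CONFED_SEQUENCE, AS_CONFED_SET = 1, 2, 3, 4

def as_path_length(path):
    """Count AS hops the way BGP route selection does."""
    hops = 0
    for seg in path.segments:
        if seg.type == AS_SEQUENCE:
            hops += len(seg.path)  # every ASN in a sequence is one hop
        elif seg.type == AS_SET:
            hops += 1              # a whole set counts as a single hop
        # Confederation segments are internal and contribute nothing.
    return hops

example = ASPath([ASPathSegment(AS_SEQUENCE, [3333, 1103, 3549]),
                  ASPathSegment(AS_SET, [4755, 8866])])
print(as_path_length(example))  # prints 4
```

Because the function only touches .segments, .type, and .path, it should work unchanged on the attr.as_path objects from the loop above, though that has not been checked against every dpkt version.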

Finally, putting it all together, we can extract the AS paths from BGP update messages within an MRT dump in only a few lines of code:

jonojono@dionysus ~/pybgpdump/samples $ python aspath.py
3333 1103 3549 4755
3333 286 6762 17557
3333 1103 3549 8866 39163
3333 1103 3549 6762 17557
...
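As a small follow-on sketch, the printed paths can feed the peering-inference idea mentioned earlier. The code below counts how often two ASNs appear side by side in the four sample paths above. It assumes plain sequences with no set or confederation delimiters, and the helper name adjacencies is made up for illustration. An adjacency in a path suggests a relationship between two ASes but does not by itself say what kind.

```python
from collections import Counter

# The four sample paths printed above, copied verbatim.
SAMPLE_OUTPUT = """\
3333 1103 3549 4755
3333 286 6762 17557
3333 1103 3549 8866 39163
3333 1103 3549 6762 17557
"""

def adjacencies(lines):
    """Count each (left ASN, right ASN) pair that appears side by side."""
    pairs = Counter()
    for line in lines:
        asns = [int(tok) for tok in line.split()]
        pairs.update(zip(asns, asns[1:]))  # consecutive ASNs in one path
    return pairs

pairs = adjacencies(SAMPLE_OUTPUT.splitlines())
for (left, right), seen in pairs.most_common(3):
    print('%d -> %d seen %d times' % (left, right, seen))
```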

The full source for the example in this tutorial is available here:

http://pybgpdump.googlecode.com/svn/trunk/samples/aspath.py

Copyright © 2021 - Jon Oberheide