Examples of Python parsing for pcap files

Source: Internet
Author: User
Tags unpack python script

I have been analyzing packet captures recently, and I have always wanted to learn Python.

So I combined the two and got started.

Body
First, I know that Python has plenty of libraries for parsing pcap files. I am not using them here because the point is to understand the details of the pcap file format.
With tcpdump you can easily capture a series of packets, but tcpdump does not analyze them. If you want to dig clues out of a capture file, such as retransmissions, you need software like Wireshark. Open the pcap file captured by tcpdump in Wireshark: if you see a pile of dark red packets (about the color of venous blood), those packets have been flagged as anomalous at the protocol layer, including but not limited to retransmissions and out-of-order segments. For details, type "tcp.analysis" into the Wireshark filter box and it will autocomplete the rest, which is about as convenient as it gets. If you also want some global statistics, click "Capture File Properties", the first entry in the "Statistics" menu, for more information.
The packets themselves are long gone, but as the saying goes, a passing goose leaves its call behind: a capture still tells us a lot. Thanks to Wireshark, tshark (a text-mode pcap analysis tool similar to Wireshark, better suited to command-line fans with mechanical keyboards) and similar tools, we can actually analyze pcap files for information.
However, I don't think that's enough.
I have a simple requirement: within one TCP connection, I want to know how many bytes of TCP payload one endpoint sent, counting both normal transmissions and retransmissions. I could not find this figure in Wireshark, so I could not wait to write something myself. Does a chef worry about running out of meat?
There is a prerequisite, though. Because I want to parse the pcap file by hand, working out the TCP payload length (excluding the TCP and IP headers) of each packet of interest and then adding the lengths up, I have to know the details of the pcap file format.
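The payload arithmetic described above can be sketched in a few lines. This is only an illustration, assuming an IPv4 packet carrying TCP whose bytes start at the IP header; the function name tcp_payload_len is made up for this example.

```python
import struct


def tcp_payload_len(ip_packet):
    """TCP payload length of an IPv4/TCP packet (hypothetical helper).

    ip_packet is a bytes object that starts at the IP header.
    """
    # Byte 0: version (high nibble) and header length in 32-bit words
    # (low nibble); bytes 2-3: total length of the IP packet.
    ver_ihl, _tos, tot_len = struct.unpack_from(">BBH", ip_packet, 0)
    ip_hdr_len = (ver_ihl & 0x0F) * 4
    # Byte 12 of the TCP header: data offset in 32-bit words (high nibble).
    tcp_hdr_len = (ip_packet[ip_hdr_len + 12] >> 4) * 4
    return tot_len - ip_hdr_len - tcp_hdr_len


# Synthetic packet: 20-byte IP header, 32-byte TCP header, 100 payload bytes
ip_hdr = struct.pack(">BBH", 0x45, 0, 20 + 32 + 100) + bytes(16)
tcp_hdr = bytes(12) + bytes([0x80]) + bytes(19)
print(tcp_payload_len(ip_hdr + tcp_hdr + bytes(100)))  # 100
```

Both header lengths are read from the packet itself, because IP and TCP options make either header longer than the minimum 20 bytes.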
Fortunately, the pcap format is very simple. I analyzed Windows PE files almost 10 years ago, and today I am doing the same kind of thing again.
If the documentation does not work for you, then as a programmer, reading the libpcap source code is also a good choice. Like almost any file format, pcap is a self-describing format (the self-description is not elegant enough, which is why the pcapng file format came along later; I will write a separate article about it). The file as a whole consists of a file header and the data payload, where the data payload is the captured network packets. In libpcap's pcap.h, struct pcap_file_header describes the file header:
struct pcap_file_header {
    bpf_u_int32 magic;
    u_short version_major;
    u_short version_minor;
    bpf_int32 thiszone;     /* GMT to local correction */
    bpf_u_int32 sigfigs;    /* accuracy of timestamps */
    bpf_u_int32 snaplen;    /* max length saved portion of each pkt */
    bpf_u_int32 linktype;   /* data link type (LINKTYPE_*) */
};
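As a rough illustration of how these fields sit on disk, the 24-byte header can be unpacked with Python's struct module. This is a minimal sketch, assuming a classic pcap file with the 0xA1B2C3D4 magic; the helper name parse_file_header and the dictionary keys are invented for this example.

```python
import struct

PCAP_MAGIC = 0xA1B2C3D4  # classic pcap with microsecond timestamps


def parse_file_header(buf):
    """Decode the 24-byte pcap file header (hypothetical helper)."""
    if len(buf) < 24:
        raise ValueError("a pcap file header is 24 bytes long")
    # The writer stores the header in its own byte order; the magic
    # number reads correctly in only one of the two.
    for endian in ("<", ">"):
        if struct.unpack_from(endian + "L", buf, 0)[0] == PCAP_MAGIC:
            break
    else:
        raise ValueError("magic number does not match classic pcap")
    major, minor, thiszone, sigfigs, snaplen, linktype = struct.unpack_from(
        endian + "HHlLLL", buf, 4)
    return {
        "endian": endian,
        "version": (major, minor),
        "thiszone": thiszone,
        "sigfigs": sigfigs,
        "snaplen": snaplen,
        "linktype": linktype,
    }


# Synthetic little-endian header: version 2.4, snaplen 65535, Ethernet (1)
sample = struct.pack("<LHHlLLL", PCAP_MAGIC, 2, 4, 0, 0, 65535, 1)
print(parse_file_header(sample))
```

Checking the magic in both byte orders lets the same code read files written on little-endian and big-endian machines.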
I will not explain each field here; I will parse a real example later. The packets follow immediately after this file header. To carry the meta-information of each packet, every packet is preceded by a description header:
struct pcap_pkthdr {
    struct timeval ts;      /* time stamp */
    bpf_u_int32 caplen;     /* length of portion present; tcpdump's -s option can limit the capture size, so this is the length actually captured */
    bpf_u_int32 len;        /* length of this packet; the natural (original) length of the packet */
};

This structure records the capture time and the length of a packet, and the packet data follows immediately after it, so a typical pcap file is laid out as follows:

pcap_file_header | pcap_pkthdr 1 | packet 1 | pcap_pkthdr 2 | packet 2 | ...
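The layout just described (one file header, then a description header before every packet) can be walked with a short loop. This is a sketch under two assumptions: the whole capture is already in memory as bytes, and the timestamp is stored on disk as two 32-bit values (seconds and microseconds), as in classic pcap files. The generator name iter_packets is made up for this example.

```python
import struct


def iter_packets(buf, endian="<"):
    """Yield (sec, usec, caplen, origlen, packet_bytes) for each record.

    buf holds a whole classic pcap file; endian is the byte order of
    the file ("<" or ">"). Hypothetical helper for illustration.
    """
    offset = 24  # skip the file header
    while offset + 16 <= len(buf):
        sec, usec, caplen, origlen = struct.unpack_from(
            endian + "LLLL", buf, offset)
        offset += 16
        # Only caplen bytes were saved, even if the packet was longer
        yield sec, usec, caplen, origlen, buf[offset:offset + caplen]
        offset += caplen


# Synthetic capture: an all-zero file header followed by two tiny records
capture = (bytes(24)
           + struct.pack("<LLLL", 10, 5, 3, 60) + b"abc"
           + struct.pack("<LLLL", 11, 6, 2, 2) + b"de")
for record in iter_packets(capture):
    print(record)
```

The loop advances by caplen, not len, because caplen is the number of bytes actually present in the file.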




That could not be clearer. Now back to my requirement: how do I get the two numbers I want?

The number of bytes actually sent on a TCP connection: the sum of the TCP payload lengths of all packets.
The number of bytes a TCP connection should theoretically send: the difference between the final TCP sequence number and the initial sequence number.
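The two quantities above can be illustrated on a toy list of segments. This is a sketch only: stream_stats and its input format are invented for this example, and the masking for sequence-number wraparound is an addition that the definitions above do not mention.

```python
def stream_stats(segments):
    """Return (actual, ideal) byte counts for one direction of a connection.

    segments is a list of (seq, payload_len) tuples in capture order.
    Hypothetical helper for illustration.
    """
    # Actual: every payload byte that hit the wire, retransmissions included
    actual = sum(length for _seq, length in segments)
    # Ideal: distance from the first to the last sequence number seen;
    # sequence numbers are 32-bit, so take the difference modulo 2**32
    first_seq = segments[0][0]
    last_seq = segments[-1][0]
    ideal = (last_seq - first_seq) & 0xFFFFFFFF
    return actual, ideal


# Three 100-byte segments, one of them a retransmission, then a bare ACK
trace = [(1000, 100), (1100, 100), (1100, 100), (1200, 0)]
print(stream_stats(trace))  # (300, 200)
```

The gap between the two numbers (100 bytes here) is the volume of retransmitted data.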
With the discussion above, I think this requirement is very simple to implement. To show off what I have learned of Python so far, here is the code:
#!/usr/bin/env python3
import socket
import struct
import sys

filename = sys.argv[1]
ipaddr = sys.argv[2]
direction = sys.argv[3]

# Convert the dotted-quad address to a 32-bit integer for comparison
packed = socket.inet_aton(ipaddr)
ip32 = struct.unpack("!L", packed)[0]

file = open(filename, "rb")

pcaphdrlen = 24   # pcap file header
pkthdrlen = 16    # per-packet description header
pkthdrlen1 = 14   # link-layer header (Ethernet by default)
stdip = 20        # IP header without options
stdtcp = 20       # TCP header without options

total = 0
pos = 0
start_seq = None  # None rather than 0, since a real initial sequence number can be 0
end_seq = 0
cnt = 0

# Read the 24-byte pcap file header
data = file.read(pcaphdrlen)
(tag, maj, minor, tzone, ts, ppsize, lt) = struct.unpack("=LHHlLLL", data)

# For linktype details, see:
# http://www.winpcap.org/ntar/draft/PCAP-DumpFileFormat.html#appendixBlockCodes
if lt == 0x71:
    pkthdrlen1 = 16   # Linux cooked capture
else:
    pkthdrlen1 = 14   # Ethernet

# Read the 16-byte packet header; every packet is assumed to be IPv4/TCP
data = file.read(pkthdrlen)
while len(data) == pkthdrlen:
    (sec, microsec, iplensave, origlen) = struct.unpack("=LLLL", data)

    # Read the link-layer header
    link = file.read(pkthdrlen1)

    # Read the fixed 20-byte part of the IP header
    data = file.read(stdip)
    (vl, tos, tot_len, ident, frag_off, ttl, protocol, check,
     saddr, daddr) = struct.unpack(">BBHHHBBHLL", data)
    iphdrlen = (vl & 0x0F) * 4

    # Skip IP options, if any, so the TCP header is read from the right offset
    file.read(max(iphdrlen - stdip, 0))

    # Read the standard TCP header
    tcpdata = file.read(stdtcp)
    (sport, dport, seq, ack_seq, pad1, win, check, urgp) = struct.unpack(">HHLLHHHH", tcpdata)
    tcphdrlen = ((pad1 & 0xF000) >> 12) * 4

    if direction == 'out':
        ipcmp = saddr
    else:
        ipcmp = daddr

    if ipcmp == ip32:
        cnt += 1
        total += tot_len - iphdrlen - tcphdrlen
        if start_seq is None:
            start_seq = seq
        end_seq = seq

    # Skip the rest of the captured packet (TCP options and payload)
    file.read(max(iplensave - pkthdrlen1 - iphdrlen - stdtcp, 0))

    # Read the next packet header
    pos += 1
    data = file.read(pkthdrlen)

file.close()

# Print the number of bytes actually transferred and the number that should have been transferred
if start_seq is None:
    start_seq = end_seq
print(pos, cnt, 'Actual: ' + str(total), 'ideal: ' + str(end_seq - start_seq))

It's simple! Anyone who knows Python will laugh at me!
In fact, before I looked at the pcap file format, I had always assumed that pcap files were organized in something like ASN.1, and I was rather disappointed to find that they are not. Disappointed because the pcap format described above cannot describe anything beyond the packets themselves. It is not really self-describing: it is a fixed-length file structure that is fast to process but inflexible and hard to extend. A thoroughly self-describing structure is ASN.1!
......
Let's look at an example. Capture one TCP packet into a file named test.pcap and open it in UE (UltraEdit); you will have to picture the hex dump yourself. If you really understand how a pcap file is organized, analyze it carefully; if not, go back and understand the format thoroughly rather than guessing.


Running the Python script pcap-parser.py on it gives us nothing, because this pcap contains only a pure ACK packet with no data, and the script is designed to measure the amount of data actually transmitted by a TCP stream. So we need to capture a pcap file that carries TCP traffic, which is easy.
Take two interconnected virtual machines, A and B. A runs httpd, B runs wget to download a file, and a packet loss rate is configured so that extra retransmitted data shows up. Run:
pcap-parser.py ./testtcp.pcap 192.168.44.129 out
We get the following result:
... (please run it yourself)
The result agrees with what I see by eye, so I consider the script usable. However...
However, when I used my Python script to analyze a capture taken with tshark, the parse came out wrong. This is where the magic word earns its keep: I opened the file in UE, and the result?

The magic word was completely different! On the Wireshark website I learned that this is the pcapng file format, and that pcapng is not backward compatible with pcap. That is a pity, but fortunately the pcapng file format is much simpler than pcap, and it is organized in a basically ASN.1-like way.
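Checking the magic word before parsing would have caught this early. Below is a minimal sketch that classifies a capture by its first four bytes. The function name capture_format and the returned labels are invented for this example; the constants are the classic pcap magic, its nanosecond-resolution variant, and the block type that opens a pcapng file.

```python
import struct


def capture_format(first4):
    """Classify a capture file by its first four bytes (hypothetical helper)."""
    # A pcapng file starts with a Section Header Block, type 0x0A0D0D0A,
    # which reads the same in either byte order.
    if first4 == b"\x0a\x0d\x0d\x0a":
        return "pcapng"
    magic = struct.unpack("<L", first4)[0]
    # Accept the magic as written by little-endian or big-endian machines
    if magic in (0xA1B2C3D4, 0xD4C3B2A1):
        return "pcap"
    if magic in (0xA1B23C4D, 0x4D3CB2A1):
        return "pcap-ns"  # classic pcap with nanosecond timestamps
    return "unknown"


print(capture_format(bytes.fromhex("d4c3b2a1")))  # pcap
print(capture_format(bytes.fromhex("0a0d0d0a")))  # pcapng
```

A parser like the script above could call such a check on the first four bytes and refuse anything that is not classic pcap instead of producing wrong numbers.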

In the pcap file format, most of the descriptive metadata is fixed in number and length. Take linktype as an example: one capture can specify only a single linktype, which is recorded in the pcap_file_header. That means I cannot capture on an Ethernet card and a non-Ethernet PPP interface at the same time and still keep detailed link-layer information for both. pcapng solves this problem.

For how pcapng does it, see the next article.


Appendix

Detail 1: Cooked capture and Ethernet

If we run tcpdump with the -i any parameter, we do not see standard Ethernet headers; we see cooked capture instead. Crucially, the cooked capture link-layer header is 16 bytes long rather than Ethernet's 14. Here is an example of a cooked capture header:



This can be determined from the linktype. Why does the cooked capture type exist? Because with -i any the capture tool has no uniform way to handle link-layer header lengths: many link-layer protocols have sub-protocols of their own, and the header length can depend on what is carried above, which the kernel cannot sort out at the capture level. A pcap file can specify the linktype in only one place, the file header that precedes the pcap_pkthdr records, so if I specify -i eth0 -i lo -i ppp0 -i tun0, it simply cannot cope. Fortunately, if the capture is stored in pcapng format, each interface can be treated separately: the packets captured on each interface are associated with that interface's own linktype, which makes the link layer easier to handle. That said, most people do not care about the link layer or even about IP; more people care about TCP.
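The 16-byte cooked capture header mentioned above can be unpacked as follows. This is a sketch assuming the Linux cooked capture (SLL) layout: packet type, ARPHRD type, address length, an 8-byte address field and a protocol field, all in network byte order. The names parse_sll_header and LINK_HEADER_LEN are made up for this example.

```python
import struct

# Link-layer header length by linktype: Ethernet (1), Linux cooked (113, 0x71)
LINK_HEADER_LEN = {1: 14, 113: 16}


def parse_sll_header(buf):
    """Decode a 16-byte Linux cooked capture header (hypothetical helper)."""
    pkt_type, arphrd, addr_len, addr, proto = struct.unpack_from(
        ">HHH8sH", buf, 0)
    return {
        "packet_type": pkt_type,     # 0 means the packet was sent to us
        "arphrd_type": arphrd,       # 1 means Ethernet hardware
        "address": addr[:addr_len],  # only addr_len of the 8 bytes are used
        "protocol": proto,           # 0x0800 means IPv4
    }


# Synthetic header: unicast to us, Ethernet hardware, 6-byte MAC, IPv4
mac = bytes.fromhex("001122334455")
sample_sll = struct.pack(">HHH8sH", 0, 1, 6, mac + bytes(2), 0x0800)
print(parse_sll_header(sample_sll))
```

A lookup table keyed by linktype, like LINK_HEADER_LEN here, is one way to pick the number of bytes to skip before the IP header.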

Detail 2: Clock jumps
As an exercise in writing a pcap file, here is an example that reproduces a clock jump.
We ran into a strange phenomenon: in the capture taken on the client there was a gap of several tens of seconds in which no data was received, while the capture taken on the server looked perfectly normal and the whole transfer took just over 10 seconds. What happened?
My immediate guess was that the clock jumped while the client was capturing, for example suddenly jumping forward by 40 seconds. To reproduce this, I took an ordinary, normal TCP download and added 40 seconds to the timestamp field in the description header of packet No. 900 and every packet after it, to see the effect. The code is simple:
        if pos > 900:   # packet No. 900 and later, as described above
                data = struct.pack("=LLLL", sec + 40, microsec, iplensave, origlen)
        file_out.write(data)
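The same rewrite can be sketched end to end on an in-memory capture. This is an illustration only, assuming a classic pcap file held as bytes; the function name shift_timestamps and its parameters are invented for this example and are not the script used above.

```python
import struct


def shift_timestamps(buf, first_index, delta, endian="<"):
    """Return a copy of a pcap file with delta seconds added to the
    timestamp of record first_index and every record after it.

    Hypothetical helper; record indexes start at 0.
    """
    out = bytearray(buf[:24])  # the file header is copied unchanged
    offset, index = 24, 0
    while offset + 16 <= len(buf):
        sec, usec, caplen, origlen = struct.unpack_from(
            endian + "LLLL", buf, offset)
        if index >= first_index:
            sec += delta
        # Rewrite the description header, then copy the packet bytes as-is
        out += struct.pack(endian + "LLLL", sec, usec, caplen, origlen)
        out += buf[offset + 16:offset + 16 + caplen]
        offset += 16 + caplen
        index += 1
    return bytes(out)


# Synthetic capture with three 2-byte packets captured at t = 100, 101, 102
original = bytes(24) + b"".join(
    struct.pack("<LLLL", 100 + i, 0, 2, 2) + b"xx" for i in range(3))
shifted = shift_timestamps(original, 1, 40)
```

Only the seconds field changes; the lengths and the packet bytes stay identical, so the output opens in the same tools as the input.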
The phenomenon is then reproduced:


