Python: extracting a specified TCP stream from a pcap file

While learning the TCP/IP protocol suite, I wrote a program that extracts the TCP streams carried over IPv4 from a pcap file and can also extract one specified TCP stream. Because the goal was learning, I did not use a third-party package to parse the pcap file; instead I parse the byte stream directly. The core idea is this: to extract TCP content, check in the IPv4 layer below TCP whether the protocol is TCP, and check in the Ethernet layer below that whether the type is IPv4 (restricting to IPv4 is specific to my project). To extract a specified stream, match on the client's and the server's [IP, port].

I. Pcap file parsing

A pcap file is laid out as: file header, packet header, packet data, packet header, packet data, and so on. The file header is 24 bytes:

    • Magic (4 bytes): marks the start of the file and identifies both the file format and its byte order (the value 0xa1b2c3d4 written in the writer's native byte order)
    • Major (2 bytes): major version number of the file format
    • Minor (2 bytes): minor version number of the file format
    • Thiszone (4 bytes): offset of local time from GMT; it is zero for GMT and in practice is usually written as 0000 0000
    • Sigfigs (4 bytes): accuracy of the timestamps
    • Snaplen (4 bytes): maximum number of bytes stored for each packet
    • Linktype (4 bytes): link-layer type (1 means Ethernet)
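Reading the 24-byte file header comes down to one `struct.unpack` call once the byte order is known from the magic number. The sketch below is an illustration, not the author's code; `parse_global_header` is a hypothetical name, and it assumes the classic microsecond-resolution format (the nanosecond variant with magic 0xa1b23c4d is not handled).

```python
import struct

PCAP_GLOBAL_HEADER_LEN = 24


def parse_global_header(data: bytes) -> dict:
    """Parse the 24-byte pcap file header and detect the file's byte order."""
    if len(data) < PCAP_GLOBAL_HEADER_LEN:
        raise ValueError("file too short for a pcap global header")
    # Read the magic as little-endian: 0xA1B2C3D4 means a little-endian file,
    # the byte-swapped value 0xD4C3B2A1 means a big-endian file.
    magic_le = struct.unpack("<I", data[:4])[0]
    if magic_le == 0xA1B2C3D4:
        endian = "<"
    elif magic_le == 0xD4C3B2A1:
        endian = ">"
    else:
        raise ValueError(f"unsupported magic number: {data[:4].hex()}")
    # I=magic, H=major, H=minor, i=thiszone, I=sigfigs, I=snaplen, I=linktype
    (_magic, major, minor, thiszone, sigfigs,
     snaplen, linktype) = struct.unpack(endian + "IHHiIII", data[:24])
    return {
        "endian": endian,
        "version": (major, minor),
        "thiszone": thiszone,
        "sigfigs": sigfigs,
        "snaplen": snaplen,
        "linktype": linktype,  # 1 = Ethernet
    }
```

The returned `endian` prefix is what every later pcap record header must be unpacked with; protocol headers inside the packets are always big-endian (network order) regardless.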

Each packet header is 16 bytes:

    • Timestamp (4 bytes): capture time, seconds part (seconds since the Unix epoch)
    • Timestamp (4 bytes): capture time, microseconds part
    • Caplen (4 bytes): length in bytes of the captured data that follows, not counting the packet header itself; it tells you where the next packet header starts.
    • Len (4 bytes): original length of the frame on the network. It is never less than Caplen and in most cases equals it; it is larger only when the frame was truncated to Snaplen.
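Because Caplen gives the exact size of the data after each 16-byte header, walking the records is a simple offset loop. This is a minimal sketch, assuming the whole file is already in memory and the byte-order prefix was taken from the magic number; `iter_packets` is a hypothetical helper name.

```python
import struct
from typing import Iterator, Tuple

GLOBAL_HEADER_LEN = 24
RECORD_HEADER_LEN = 16


def iter_packets(data: bytes, endian: str = "<") -> Iterator[Tuple[int, int, int, bytes]]:
    """Yield (ts_sec, ts_usec, orig_len, packet_bytes) for each pcap record.

    `data` is the whole pcap file; `endian` is "<" or ">" as detected
    from the magic number.
    """
    offset = GLOBAL_HEADER_LEN
    while offset + RECORD_HEADER_LEN <= len(data):
        ts_sec, ts_usec, caplen, orig_len = struct.unpack(
            endian + "IIII", data[offset:offset + RECORD_HEADER_LEN])
        offset += RECORD_HEADER_LEN
        packet = data[offset:offset + caplen]
        if len(packet) < caplen:
            break  # truncated final record: stop instead of yielding partial data
        yield ts_sec, ts_usec, orig_len, packet
        offset += caplen  # Caplen locates the next record header
```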
Packet Data

After the packet header comes the packet data, which is Caplen bytes long. It is followed by the next packet header and its packet data, and so on until the end of the file.

II. Ethernet protocol parsing

The Ethernet header is 14 bytes: a 6-byte Destination MAC address, a 6-byte Source MAC address, and a 2-byte Type field. A Type of 0x0800 means the payload is an IPv4 packet.
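Splitting the 14-byte Ethernet header off a frame is a single fixed-layout unpack in network byte order. A minimal sketch, assuming plain Ethernet II frames without VLAN tags (an 802.1Q tag, Type 0x8100, would shift the real Type by 4 bytes); `parse_ethernet` is a hypothetical name and `bytes.hex(sep)` needs Python 3.8 or later.

```python
import struct

ETH_HEADER_LEN = 14
ETHERTYPE_IPV4 = 0x0800


def parse_ethernet(frame: bytes):
    """Split an Ethernet II frame into (dst_mac, src_mac, ethertype, payload)."""
    if len(frame) < ETH_HEADER_LEN:
        raise ValueError("frame shorter than an Ethernet header")
    # 6-byte destination, 6-byte source, 2-byte type, all in network order
    dst, src, eth_type = struct.unpack("!6s6sH", frame[:ETH_HEADER_LEN])
    return dst.hex(":"), src.hex(":"), eth_type, frame[ETH_HEADER_LEN:]
```

Only frames whose type equals `ETHERTYPE_IPV4` need to be passed on to the IPv4 parser.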

III. IPv4 protocol parsing

Different IP versions use different header layouts; this project handles only TCP streams carried over IPv4. The IPv4 header fields are:

    • Version (4 bits): always 4 for IPv4
    • IHL (4 bits): Internet Header Length, the number of 32-bit words in the header. The minimum value is 5, meaning 5 × 32 bits = 160 bits = 20 bytes. As a 4-bit field, its maximum is 15 words (15 × 32 bits = 480 bits = 60 bytes).
    • DSCP (6 bits): Differentiated Services Code Point
    • ECN (2 bits): Explicit Congestion Notification
    • Total Length (2 bytes): size of the entire IP packet in bytes, header plus data. The minimum is 20 bytes (a header with no data) and the maximum is 65535 bytes.
    • Identification (2 bytes): used mainly to identify the group of fragments that belong to a single IP datagram
    • Flags (3 bits): used to control or identify fragments
    • Fragment offset (13 bits): measured in 8-byte blocks, it gives the offset of a fragment from the beginning of the original, unfragmented IP datagram. The first fragment has offset zero. The largest expressible offset is (2^13 - 1) × 8 = 65528 bytes; adding a 20-byte header gives 65548 bytes, which exceeds the 65535-byte maximum IP packet length.
    • Time to Live (TTL) (1 byte): an 8-bit lifetime that keeps datagrams from circulating on the internet indefinitely
    • Protocol (1 byte): the protocol used in the data portion of the datagram (6 means TCP)
    • Header Checksum (2 bytes): 16-bit checksum used for error checking of the header
    • Source Address (4 bytes): IPv4 address of the sender
    • Destination Address (4 bytes): IPv4 address of the receiver
    • Options (variable, present when IHL > 5): rarely used
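The field list above maps directly onto a 20-byte `struct` format, with the bit-packed fields split by shifting and masking. A minimal sketch for illustration (`parse_ipv4` is a hypothetical name): it skips any options using IHL, converts the fragment offset from 8-byte blocks to bytes, and does not verify the header checksum.

```python
import socket
import struct

IPPROTO_TCP = 6


def parse_ipv4(packet: bytes) -> dict:
    """Decode the fixed IPv4 header fields and return them with the payload."""
    if len(packet) < 20:
        raise ValueError("packet shorter than the minimum IPv4 header")
    (ver_ihl, dscp_ecn, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    version = ver_ihl >> 4
    ihl = ver_ihl & 0x0F                  # header length in 32-bit words
    if version != 4 or ihl < 5:
        raise ValueError("not a valid IPv4 header")
    header_len = ihl * 4                  # 5..15 words -> 20..60 bytes
    return {
        "version": version,
        "header_len": header_len,
        "dscp": dscp_ecn >> 2,
        "ecn": dscp_ecn & 0x03,
        "total_len": total_len,
        "id": ident,
        "flags": flags_frag >> 13,        # top 3 bits
        "frag_offset": (flags_frag & 0x1FFF) * 8,  # stored in 8-byte blocks
        "ttl": ttl,
        "protocol": proto,                # 6 = TCP
        "checksum": checksum,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
        # slice up to Total Length so Ethernet padding is not treated as data
        "payload": packet[header_len:total_len],
    }
```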

IV. TCP protocol parsing

  • Source Port (16 bits): identifies the sending port
  • Destination Port (16 bits): identifies the receiving port
  • Sequence number (32 bits): has a dual role. If the SYN flag is set to 1, this is the initial sequence number. If the SYN flag is 0, this is the accumulated sequence number of the first data byte of this segment for the current session.
  • Acknowledgment number (32 bits): if the ACK flag is set, this field holds the next sequence number that the sender of the ACK expects to receive
  • Data offset (4 bits): size of the TCP header in 32-bit words. The minimum header is 5 words and the maximum is 15 words, i.e. 20 to 60 bytes, which leaves room for up to 40 bytes of options in the header.
  • Reserved (3 bits): for future use; should be set to zero
  • Flags (9 bits) (aka Control Bits): contains 9 one-bit flags
      • NS (1 bit): ECN-nonce concealment protection
      • CWR (1 bit): Congestion Window Reduced; set by the sending host to indicate that it received a TCP segment with the ECE flag set and has responded through its congestion control mechanism
      • ECE (1 bit): ECN-Echo; has a dual role depending on the value of the SYN flag
      • URG (1 bit): indicates that the Urgent pointer field is significant
      • ACK (1 bit): indicates that the Acknowledgment field is significant. All packets after the initial SYN packet sent by the client should have this flag set.
      • PSH (1 bit): push function; asks for buffered data to be pushed to the receiving application
      • RST (1 bit): resets the connection
      • SYN (1 bit): synchronize sequence numbers. Only the first packet sent from each end should have this flag set. Some other flags and fields change meaning depending on this flag: some are valid only when it is 1, others only when it is 0.
      • FIN (1 bit): last packet from the sender
  • Window size (16 bits): size of the receive window
  • Checksum (16 bits): checksum used for error checking of the header, the payload, and a pseudo-header
  • Urgent pointer (16 bits): if the URG flag is set, this field is an offset from the sequence number indicating the last urgent data byte
  • Options (variable, 0 to 320 bits, a multiple of 32): the length of this field is determined by the Data offset field
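The TCP layout above fits a 20-byte `struct` format; the 16-bit word after the acknowledgment number holds the data offset in its top 4 bits and the 9 flags in its low 9 bits (FIN is bit 0, NS is bit 8). A minimal sketch, with `parse_tcp` and `TCP_FLAG_NAMES` as hypothetical names; options are skipped via the data offset rather than decoded, and the checksum is not verified.

```python
import struct

# Flag names ordered from bit 0 (FIN) up to bit 8 (NS)
TCP_FLAG_NAMES = ("FIN", "SYN", "RST", "PSH", "ACK", "URG", "ECE", "CWR", "NS")


def parse_tcp(segment: bytes) -> dict:
    """Decode the fixed TCP header and return its fields plus the payload."""
    if len(segment) < 20:
        raise ValueError("segment shorter than the minimum TCP header")
    (src_port, dst_port, seq, ack, off_flags,
     window, checksum, urgent) = struct.unpack("!HHIIHHHH", segment[:20])
    data_offset = off_flags >> 12         # header length in 32-bit words
    if data_offset < 5:
        raise ValueError("invalid TCP data offset")
    flag_bits = off_flags & 0x01FF        # low 9 bits: NS ... FIN
    flags = {name: bool(flag_bits >> i & 1) for i, name in enumerate(TCP_FLAG_NAMES)}
    header_len = data_offset * 4          # 20..60 bytes, options included
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "header_len": header_len,
        "flags": flags,
        "window": window,
        "checksum": checksum,
        "urgent": urgent,
        "payload": segment[header_len:],  # application data after the options
    }
```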

V. File processing

Some of the core code is as follows:

This part reads the pcap file as bytes and treats each packet's data as a frame. When a frame is IPv4/TCP, the fields [src, dst, src_port, dst_port, seq, ACK, flags, content] are extracted from the TCP segment and stored in Tcp_stream, so that every TCP stream in the pcap file is collected.
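Since the original code is not reproduced here, the following is only a self-contained sketch of that extraction step, chaining the Ethernet, IPv4, and TCP checks from sections I to IV. `extract_tcp_frames` is a hypothetical name; it assumes a microsecond-resolution pcap with Ethernet link type, and it stores the flags as the raw 9-bit value rather than in whatever form the author used.

```python
import socket
import struct
from typing import List


def extract_tcp_frames(data: bytes) -> List[list]:
    """Return [src, dst, src_port, dst_port, seq, ack, flags, content] for
    every IPv4/TCP packet in an Ethernet pcap file given as raw bytes."""
    if len(data) < 24:
        raise ValueError("file too short for a pcap global header")
    # Byte order of the pcap headers comes from the magic number
    magic = struct.unpack("<I", data[:4])[0]
    if magic == 0xA1B2C3D4:
        endian = "<"
    elif magic == 0xD4C3B2A1:
        endian = ">"
    else:
        raise ValueError("not a microsecond-resolution pcap file")
    if struct.unpack(endian + "I", data[20:24])[0] != 1:
        raise ValueError("only Ethernet (linktype 1) captures are handled")

    tcp_stream = []
    offset = 24                                    # skip the file header
    while offset + 16 <= len(data):
        caplen = struct.unpack(endian + "I", data[offset + 8:offset + 12])[0]
        frame = data[offset + 16:offset + 16 + caplen]
        offset += 16 + caplen                      # Caplen locates the next record
        # Ethernet: 14-byte header, Type 0x0800 means IPv4
        if len(frame) < 34 or struct.unpack("!H", frame[12:14])[0] != 0x0800:
            continue
        ip = frame[14:]
        # IPv4 version must be 4 and Protocol must be 6 (TCP)
        if ip[0] >> 4 != 4 or ip[9] != 6:
            continue
        ihl = (ip[0] & 0x0F) * 4                   # IP header length in bytes
        total_len = struct.unpack("!H", ip[2:4])[0]
        src = socket.inet_ntoa(ip[12:16])
        dst = socket.inet_ntoa(ip[16:20])
        tcp = ip[ihl:total_len]                    # drop Ethernet padding
        if len(tcp) < 20:
            continue
        src_port, dst_port, seq, ack, off_flags = struct.unpack("!HHIIH", tcp[:14])
        content = tcp[(off_flags >> 12) * 4:]      # skip TCP header and options
        tcp_stream.append([src, dst, src_port, dst_port, seq, ack,
                           off_flags & 0x01FF, content])
    return tcp_stream
```

Keeping each frame as a flat list mirrors the field order given in the paragraph, which keeps the later stream-filtering step simple.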

This part takes the Tcp_stream built above and extracts the TCP stream we want. When the ACK and PSH flags are both 1, the client or the server is sending data (for example, an HTTP request); such a packet is stored only if the other side acknowledges receiving it, which guards against retransmissions and lost packets. When the FIN flag is 1, the loop ends and the specified TCP stream is returned.
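The filtering rule in that paragraph can be sketched as follows. This is one possible reading, not the author's implementation: it assumes each frame uses the eight-field layout [src, dst, src_port, dst_port, seq, ack, flags, content] with flags as the raw 9-bit value, and it interprets "confirmed" as a later packet from the opposite side whose acknowledgment number reaches past the segment's last byte. `extract_stream` and the flag constants are hypothetical names, and sequence-number wraparound is ignored.

```python
from typing import List, Tuple

FIN, PSH, ACK = 0x001, 0x008, 0x010   # bit masks within the 9-bit TCP flags


def extract_stream(tcp_stream: List[list],
                   client: Tuple[str, int],
                   server: Tuple[str, int]) -> List[list]:
    """Keep the acknowledged PSH+ACK data segments exchanged between
    `client` and `server`, stopping at the first FIN."""
    endpoints = {client, server}
    # Only frames travelling between exactly these two (ip, port) endpoints
    conv = [f for f in tcp_stream if {(f[0], f[2]), (f[1], f[3])} == endpoints]
    result, seen = [], set()
    for i, f in enumerate(conv):
        src, dst, sport, dport, seq, ack, flags, content = f
        if flags & FIN:
            break                         # connection is closing: stop here
        if not (flags & ACK and flags & PSH) or not content:
            continue
        key = (src, sport, seq)
        if key in seen:                   # retransmission of a stored segment
            continue
        end = seq + len(content)          # sequence number after the last byte
        # Acknowledged if a later packet from the other side acks up to `end`
        acked = any(g[0] == dst and g[2] == dport and g[6] & ACK and g[5] >= end
                    for g in conv[i + 1:])
        if acked:
            seen.add(key)
            result.append(f)
    return result
```

Segments that are never acknowledged before the FIN are dropped, and a retransmitted copy of an already stored segment is skipped via the `(src, port, seq)` key.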

VI. Complete code

TCP reference: https://en.wikipedia.org/wiki/Transmission_Control_Protocol#TCP_segment_structure

IPv4 reference: https://en.wikipedia.org/wiki/IPv4#Packet_structure

Complete code: https://github.com/sunpudding/python. The repository contains not only the complete project code but also unit tests. Feel free to download it; let's learn and discuss together.

  
