Use python to implement wireshark's follow tcp stream function

Source: Internet
Author: User

In short, wireshark has a follow tcp stream function, which is very convenient. Its drawback is that the extracted stream data carries no timestamps or other per-packet information, so it is not enough for analyzing delay and packet loss. Here, python is used to implement a simple follow tcp stream function that retains the tcp information.

The principle is very simple. It still relies on wireshark, which can export the packet dissections as an XML 'pdml' file.

The exported file is plain XML, so little more needs to be said: use python to parse the xml file and extract the data.
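As a rough illustration of that parsing step, the sketch below reads a tiny hand-written sample with xml.etree.ElementTree. The sample is an assumption that mimics the usual pdml layout (packet > proto > field, with name, show and value attributes); it is not the article's original export, and the exact field names, in particular tcp.payload, may differ between wireshark versions.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the pdml layout (an assumption, not a real export).
SAMPLE_PDML = """<pdml>
  <packet>
    <proto name="frame">
      <field name="frame.number" show="1"/>
      <field name="frame.time_relative" show="0.000000"/>
    </proto>
    <proto name="tcp">
      <field name="tcp.seq" show="1"/>
      <field name="tcp.nxtseq" show="5"/>
      <field name="tcp.payload" value="31323334"/>
    </proto>
  </packet>
  <packet>
    <proto name="frame">
      <field name="frame.number" show="2"/>
      <field name="frame.time_relative" show="0.012000"/>
    </proto>
    <proto name="tcp">
      <field name="tcp.seq" show="4"/>
      <field name="tcp.nxtseq" show="6"/>
      <field name="tcp.payload" value="3435"/>
    </proto>
  </packet>
</pdml>"""

# Field name -> attribute to read from that field (illustrative choice).
WANTED = {
    'frame.number': 'show',
    'frame.time_relative': 'show',
    'tcp.seq': 'show',
    'tcp.nxtseq': 'show',
    'tcp.payload': 'value',
}

def parse_pdml(text):
    """Return one dict per packet holding the wanted field attributes."""
    frames = []
    for packet in ET.fromstring(text).iter('packet'):
        frame = {}
        for field in packet.iter('field'):
            name = field.get('name')
            if name in WANTED:
                frame[name] = field.get(WANTED[name], '')
        frames.append(frame)
    return frames

for frame in parse_pdml(SAMPLE_PDML):
    payload = bytes.fromhex(frame['tcp.payload'])  # hex string -> raw bytes
    print(frame['frame.time_relative'], frame['tcp.seq'], frame['tcp.nxtseq'], payload)
```

Each packet becomes a plain dict, so the timestamp travels with the payload, which is the information wireshark's own follow tcp stream output drops.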

The remaining question is how to implement the follow tcp stream algorithm. In essence, it is a process of reassembling tcp data. For details, see the analysis of TCP packet reassembly on this blog.

Here, for simplicity, I impose two constraints:

1. Only data in a single direction, such as A --> B, is extracted. To extract the B --> A data, filter the capture again and run the script a second time.
2. The SYN packet at the start of the connection and the FIN packet at disconnection are ignored.

With these two simplifications, the algorithm reduces to sorting the tcp frames by seq in ascending order. For example, here are three tcp packets, sorted by seq:

(seq = 1, nxtseq = 5, data = '1234'), (seq = 4, nxtseq = 6, data = '45'), (seq = 7, nxtseq = 8, data = '7')

The nxtseq of the first packet is greater than the seq of the second packet, which means the two packets carry duplicated data. Indeed, the byte '4' appears in both.

The nxtseq of the second packet is less than the seq of the third packet, which means data was lost between the two packets. Indeed, the byte '6' is missing.
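The comparison described above can be sketched in a few lines. This is a minimal illustration rather than the article's script: the function name classify and the dict layout are made up for the example, and sequence number wraparound is ignored.

```python
def classify(packets):
    """Sort packets by seq and label each one relative to the expected seq."""
    notes = []
    expect = None  # seq we expect next; None until the first packet is seen
    for p in sorted(packets, key=lambda p: p['seq']):
        if expect is None or p['seq'] == expect:
            notes.append('contiguous')
        elif p['seq'] < expect:
            notes.append('overlap of %d' % (expect - p['seq']))
        else:
            notes.append('gap of %d' % (p['seq'] - expect))
        # Never move the expected seq backwards on a retransmission.
        expect = p['nxtseq'] if expect is None else max(expect, p['nxtseq'])
    return notes

# The three packets from the example, deliberately out of order.
packets = [
    {'seq': 7, 'nxtseq': 8},
    {'seq': 1, 'nxtseq': 5},
    {'seq': 4, 'nxtseq': 6},
]
print(classify(packets))  # ['contiguous', 'overlap of 1', 'gap of 1']
```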

That covers the principle.

The rest of this article briefly covers wireshark filter rules and the limitations of this approach.

Filtering data in one direction by ip address: you can first run wireshark's follow tcp stream function, after which the filter bar usually shows an expression such as tcp.stream eq xxx. You can then append an ip filter to it: tcp.stream eq xxx and ip.src == xxx and ip.dst == xxx

Filtering data in one direction by tcp port number (for the case where both ends have the same ip address): as before, first pin down the tcp connection to obtain tcp.stream eq xxx, then append a port filter: tcp.stream eq xxx and tcp.srcport == xxx and tcp.dstport == xxx

Limitations of the tool: because it uses the python element tree to parse the xml file and extract the data, even for a pcap file measured in megabytes the generated pdml file swells to several hundred megabytes, and those hundreds of megabytes are then read into memory in full (a characteristic of the python element tree). Overall, generating the pdml file is somewhat slow (several minutes), and memory consumption is very high (several hundred megabytes).
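One possible way to ease the memory problem, which is not what the original script does, is to stream the file with xml.etree.ElementTree.iterparse and clear each packet element once it has been processed. The sketch below assumes the usual pdml layout (packet > proto > field); the function name iter_tcp_seqs is hypothetical.

```python
import io
import xml.etree.ElementTree as ET

def iter_tcp_seqs(source):
    """Yield (seq, nxtseq) per tcp packet without keeping the whole tree."""
    for event, elem in ET.iterparse(source, events=('end',)):
        if elem.tag != 'packet':
            continue  # wait until a whole <packet> has been read
        fields = {f.get('name'): f.get('show') for f in elem.iter('field')}
        if 'tcp.seq' in fields:
            yield int(fields['tcp.seq']), int(fields.get('tcp.nxtseq') or 0)
        elem.clear()  # drop the packet's children to release most of its memory

# Tiny hand-written input (an assumption, not a real export).
sample = (b'<pdml>'
          b'<packet><proto name="tcp"><field name="tcp.seq" show="1"/>'
          b'<field name="tcp.nxtseq" show="5"/></proto></packet>'
          b'<packet><proto name="tcp"><field name="tcp.seq" show="4"/>'
          b'<field name="tcp.nxtseq" show="6"/></proto></packet>'
          b'</pdml>')
print(list(iter_tcp_seqs(io.BytesIO(sample))))
```

iterparse also accepts a file name, so a real pdml export could be passed directly. This reduces memory use during parsing but does nothing about the time wireshark needs to generate the file.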

Finally, here is some of the key code. The complete script, TCPParser -- follow tcp stream by python, is available as a free download.

This method extracts the required field information from a proto element in the pdml file.

def extract_element(self, proto, elem):
    # elem maps a field name to the attribute to read from that field
    result = dict()
    for key in elem.keys():
        result[key] = ''  # default when the field is absent

    fieldname = ''
    attribname = ''
    for field in proto.findall('field'):
        fieldname = field.get('name')
        if fieldname in elem:
            attribname = elem[fieldname]
            result[fieldname] = field.get(attribname, '')

    return result

def regularize_stream(self, frame_list):
    '''
    Regularize the tcp stream data based on seq and nxtseq: fill in
    missing segments and remove duplicated data. For a filled-in missing
    segment, data is empty and frame.number = 'Lost'.
    When removing duplicate data, try to keep the data received earlier,
    that is, the data of the previous packet.
    '''
    # Timer, TCPFrame and self.reporter are not shown in this excerpt.
    self.reporter.title('TCPParser regularize timestamp')
    timer = Timer('regularize_stream_data').start()

    reg_frame_list = []
    expectseq = -1
    first = True
    for frame in frame_list:
        if first:
            # first packet
            first = False
            expectseq = frame['tcp.nxtseq']
            reg_frame_list.append(frame)
            continue

        seq = frame['tcp.seq']
        nxtseq = frame['tcp.nxtseq']
        if seq == expectseq:
            # data is exactly contiguous, nothing missing or extra
            if nxtseq == 0:
                continue  # an ack packet, carries no data
            expectseq = nxtseq
            reg_frame_list.append(frame)
        elif seq > expectseq:
            # data is missing: a segment was lost
            self.reporter.error('previous tcp segment is lost: ' + str(frame[TCPFrame.KEY_FRAMENo]))
            # newpacket = self.new_lost_packet(frame, str(expectseq), str(seq))
            # reg_frame_list.append(newpacket)
            reg_frame_list.append(frame)
            expectseq = nxtseq
        elif seq < expectseq:
            # data overlaps: the retransmission resent too much
            self.reporter.warning('tcp segment retransmission: ' + str(frame[TCPFrame.KEY_FRAMENo]))
            if expectseq < nxtseq:
                # part of the current packet must be discarded
                # pre_packet[-(expectseq-seq):-1] = frame[0:expectseq-seq]
                frame['tcp.seq'] = expectseq
                frame['data'] = frame['data'][expectseq-nxtseq:]
                frame['datalen'] = len(frame['data'])
                expectseq = nxtseq
                reg_frame_list.append(frame)
            else:
                # the current packet can be discarded entirely
                # expectseq remains unchanged
                # pre_packet[-(nxtseq-seq):] = frame[:nextseq-seq]
                pass

    timer.stop()
    return reg_frame_list
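Because the method above depends on helpers from the complete script, it cannot be run on its own. The following standalone sketch shows the same trimming idea on raw bytes. It is a simplified assumption-laden version, not the article's code: the name reassemble is hypothetical, nxtseq is derived as seq + len(data), input is one direction only, and sequence number wraparound is ignored.

```python
def reassemble(packets):
    """Merge one-direction segments; return (data, gaps).

    gaps is a list of (start_seq, end_seq) ranges that were never seen.
    """
    out = bytearray()
    gaps = []
    expect = None  # next seq we expect; None until the first data packet
    for p in sorted(packets, key=lambda p: p['seq']):
        seq, data = p['seq'], p['data']
        if not data:
            continue  # pure ack, carries no payload
        nxtseq = seq + len(data)
        if expect is not None and seq > expect:
            gaps.append((expect, seq))   # a segment was lost
        elif expect is not None and seq < expect:
            if nxtseq <= expect:
                continue                 # fully duplicated, keep the earlier copy
            data = data[expect - seq:]   # trim the overlapping prefix
        out += data
        expect = nxtseq
    return bytes(out), gaps

# The three packets from the example earlier in the article.
stream, gaps = reassemble([
    {'seq': 1, 'data': b'1234'},
    {'seq': 4, 'data': b'45'},
    {'seq': 7, 'data': b'7'},
])
print(stream, gaps)  # b'123457' [(6, 7)]
```

Keeping the earlier copy of overlapping bytes matches the preference stated in the method's docstring; the lost byte '6' shows up as the gap (6, 7) instead of being silently skipped.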


 

 
