A general method for obtaining HTTP POST request message features through manual analysis
(Using analysis of posting behavior in the Baidu Tieba client as an example)
This is an original article by csdn-蚍蜉 Shake Pine (homepage: HTTP://BLOG.CSDN.NET/HOWEVERPF). Please credit the source when reprinting!
Existing tools for reconstructing HTTP POST request messages are feature-based: they rely on behavior identification features and information extraction features for each network application, and obtaining those features usually depends on manual analysis. During that analysis, several tools are typically needed to capture and analyze network data. Depending on the network environment, the tools that may be used include:
- tcpdump (the most commonly used network packet capture tool on Unix/Linux platforms)
- Aircrack (the most commonly used tool for sniffing and decrypting wireless packets in WiFi environments)
- Wireshark (the best network packet sniffing and analysis tool on the Windows platform)
The following uses posting from the Baidu Tieba client as an example to walk through the general process of obtaining HTTP POST request message features by manual analysis:
First, the analyst constructs a test post and submits it to the server. To make the later analysis easier, the title and content of the test post should be as regular and easy to recognize as possible. For this article, a post with the title "new report" and the content "fresh man, coming!" was created in the Linux bar, as shown in Figure 1.
Figure 1. Posting in the Linux bar using the Baidu Tieba client
The analyst submits the test data to the server, chooses a suitable packet capture tool (Wireshark/tcpdump/Aircrack) for the network environment to collect the packets sent by the client, and saves them in a common format such as a pcap file.
The analyst uses a network protocol analysis tool (Wireshark, etc.) to search the captured packets for the test data they constructed. If the search fails, the analyst needs to apply common encodings to the test data with encoding tools (such as URL encoding, Base64 encoding, QP (quoted-printable) encoding, Unicode encoding, MD5 hash, SHA1 hash) and search again for each encoded variant until the search succeeds.
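The "encode, then search again" loop in this step can be sketched in Python. This is only an illustrative sketch, not the tooling used in the article: the function names `encoded_variants` and `find_test_data` are hypothetical, UTF-8 is assumed as the character set, and the capture is assumed to be available as one byte string (for example, the raw contents of a pcap file).

```python
import base64
import hashlib
import quopri
import re
from urllib.parse import quote, quote_plus


def _lower_hex(s: str) -> str:
    """Lower-case the hex digits of %XX escapes (clients differ in case)."""
    return re.sub(r"%[0-9A-F]{2}", lambda m: m.group(0).lower(), s)


def encoded_variants(text: str, charset: str = "utf-8") -> dict[str, bytes]:
    """Return common encodings of `text`, keyed by a descriptive name."""
    raw = text.encode(charset)
    url = quote(text, encoding=charset)            # space -> %20
    url_plus = quote_plus(text, encoding=charset)  # space -> +
    return {
        "plain": raw,
        "url": url.encode("ascii"),
        "url_lower": _lower_hex(url).encode("ascii"),
        "url_plus": url_plus.encode("ascii"),
        "url_plus_lower": _lower_hex(url_plus).encode("ascii"),
        "base64": base64.b64encode(raw),
        "quoted_printable": quopri.encodestring(raw),
        "unicode_escape": text.encode("unicode_escape"),
        "md5": hashlib.md5(raw).hexdigest().encode("ascii"),
        "sha1": hashlib.sha1(raw).hexdigest().encode("ascii"),
    }


def find_test_data(capture: bytes, text: str) -> list[str]:
    """Names of the variants of `text` that occur somewhere in `capture`."""
    return [name for name, needle in encoded_variants(text).items()
            if needle in capture]
```

For the test content in this article, `find_test_data(capture, "fresh man, coming!")` would report the `url_plus_lower` variant if the capture contains `fresh+man%2c+coming%21`. A hit only tells the analyst which encoding to look for; the matching packet still has to be located and inspected in the protocol analyzer.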
Based on the four-tuple of the packet that was found (source IP address, destination IP address, source TCP port number, destination TCP port number), the analyst uses the network protocol analysis tool (Wireshark, etc.) to reassemble the TCP stream that the packet belongs to. The result of reassembling the TCP stream for the test data constructed above is shown in Figure 2.
Figure 2. HTTP POST request message generated by the client post
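To make the idea of four-tuple based reassembly concrete, here is a deliberately simplified Python sketch. It is not how Wireshark implements stream reassembly: the `Segment` type and `reassemble` function are hypothetical, the segments are assumed to have been parsed out of the capture already, and overlapping segments and sequence-number wraparound are ignored.

```python
from collections import defaultdict
from typing import NamedTuple


class Segment(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    seq: int        # TCP sequence number of the first payload byte
    payload: bytes


def reassemble(segments: list[Segment]) -> dict[tuple, bytes]:
    """Group segments by four-tuple and join payloads in sequence order."""
    streams: dict[tuple, dict[int, bytes]] = defaultdict(dict)
    for s in segments:
        if not s.payload:
            continue  # pure ACKs carry no data
        key = (s.src_ip, s.dst_ip, s.src_port, s.dst_port)
        # Keep the first copy seen for a sequence number (drops retransmissions).
        streams[key].setdefault(s.seq, s.payload)
    return {key: b"".join(parts[seq] for seq in sorted(parts))
            for key, parts in streams.items()}
```

Each direction of a connection has its own four-tuple, so the client's request and the server's response end up in separate entries of the returned dictionary.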
The analyst extracts the request method "POST" from the request line, the path portion of the URI "/c/c/thread/add", and the value "c.tieba.baidu.com" of the Host header field, and hypothesizes that these three features combined should uniquely identify a posting action by the Baidu Tieba client. The preliminary conclusion is that this may be the behavior identification feature being sought.
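As a rough illustration, the three-part behavior identification feature could be checked against a reassembled request like this. The function names `parse_request_head` and `is_tieba_post` are hypothetical, and the parsing is a minimal sketch that assumes a well-formed request head with CRLF line endings.

```python
def parse_request_head(raw: bytes) -> tuple[str, str, dict[str, str]]:
    """Split a raw HTTP request into method, URI path, and lower-cased headers."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    request_line, *header_lines = head.split("\r\n")
    method, uri, _version = request_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    path = uri.split("?", 1)[0]  # drop any query string
    return method, path, headers


def is_tieba_post(raw: bytes) -> bool:
    """Apply the method + path + Host feature described in the text."""
    try:
        method, path, headers = parse_request_head(raw)
    except ValueError:
        return False  # malformed request line
    return (method == "POST"
            and path == "/c/c/thread/add"
            and headers.get("host", "").lower() == "c.tieba.baidu.com")
```

All three conditions must hold together; a POST to the same path on another host, or a GET to the same host, is not counted as a posting action.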
The analyst extracts the value of the Content-Type header field, confirms that the message entity is in application/x-www-form-urlencoded format, and finds the URL-encoded post title "%e6%96%b0%e4%ba%ba%e6%8a%a5%e9%81%93" and post content "fresh+man%2c+coming%21" in the request entity. By examining the data around the post title and content and drawing on experience, the analyst arrives at a first set of features: the extraction prefix for the title is "title=", the extraction prefix for the content is "content=", and the extraction suffix for both is "&". These prefixes and suffixes may be the information extraction features being sought.
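A minimal Python sketch of prefix/suffix extraction follows. The function name `extract_field` is hypothetical. One extra guard is an assumption added here rather than something stated in the article: the prefix is only accepted at the start of the entity or directly after a separator, so that "title=" does not match inside a longer field name.

```python
from typing import Optional
from urllib.parse import unquote_plus


def extract_field(body: bytes, prefix: bytes, suffix: bytes = b"&") -> Optional[str]:
    """Return the URL-decoded text between `prefix` and the next `suffix`."""
    padded = suffix + body  # lets a field at the very start follow a separator too
    start = padded.find(suffix + prefix)
    if start < 0:
        return None
    start += len(suffix) + len(prefix)
    end = padded.find(suffix, start)
    if end < 0:
        end = len(padded)  # last field: no trailing suffix
    encoded = padded[start:end].decode("ascii", errors="replace")
    return unquote_plus(encoded, encoding="utf-8")
```

With the values from this example, `extract_field(body, b"content=")` turns `fresh+man%2c+coming%21` back into `fresh man, coming!`.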
The analyst continues to examine the headers and entity of the request message, uses experience to find other potentially useful information, and obtains the corresponding extraction features. For the example above, this potentially useful information and its possible extraction features are shown in Table 1.
Table 1. Potentially useful information in the client's POST request message and its possible extraction features
The analyst constructs test data for other posting actions and checks whether the behavior identification features and information extraction features obtained earlier miss anything, that is, whether a posting action goes unidentified or valid information fails to be extracted from any of the test data. If so, the features need to be revised according to the missed test data until there are no missed detections.
The analyst constructs test data for some non-posting actions and checks whether the behavior identification features and information extraction features obtained earlier produce false detections, that is, whether a non-posting action is identified as posting or invalid information in the test data is extracted as valid. If so, the features need to be revised according to the falsely detected test data until there are no false detections.
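The two verification passes (missed detections and false detections) can be thought of as running a candidate detector over labelled test data. The sketch below only covers behavior identification, not information extraction, and the name `evaluate` and the labelled-sample format are assumptions made for illustration.

```python
from typing import Callable, Iterable


def evaluate(detector: Callable[[bytes], bool],
             samples: Iterable[tuple[bytes, bool]]) -> tuple[list[bytes], list[bytes]]:
    """Return (missed, false_detections) for samples labelled (raw, is_post)."""
    missed, false_detections = [], []
    for raw, is_post in samples:
        detected = detector(raw)
        if is_post and not detected:
            missed.append(raw)            # posting action not identified
        elif detected and not is_post:
            false_detections.append(raw)  # non-posting action identified as posting
    return missed, false_detections
```

The features are revised and the check repeated until both returned lists are empty for all the test data the analyst has constructed.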
At this point, one complete pass of manual analysis is finished.
It should be added that in the example above, the valid information to be extracted is plain text and has a prefix. Real cases are often more complex, for example:
- Some valid information has no consistent prefix. In that case, try to calculate the position of the valid information relative to certain special markers (such as the start of the request headers, the start of the request entity, or the start of a part's data in multipart/form-data format). If the valid information is found at a fixed offset from one of these positions across different test data, that offset can replace the extraction prefix as part of the information extraction feature.
- Some valid information has no consistent suffix. In that case, try to find out whether the context of the valid information describes its length (for example, the chunks of a chunked-encoded entity), or whether the valid information has a fixed length across different test data (for example, a password hash, latitude and longitude, or a timestamp). If a way to determine or calculate the length can be found, the length can replace the extraction suffix as part of the information extraction feature.
- Some valid information in the request entity uses a complex format such as XML or JSON. Its extraction features cannot simply rely on a prefix, offset, or length; the data must be parsed according to the format itself.
- Some valid information is a binary stream that cannot be read by eye rather than readable plain text. However, this is more common in HTTP response messages and less common in HTTP POST request messages.
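Two of the cases above can be sketched briefly in Python: extraction by a fixed offset and length relative to a marker, and extraction by parsing a structured format (JSON here). The function names, and the choice to measure the offset from the end of the marker, are assumptions made for illustration only.

```python
import json
from typing import Any, Optional


def extract_by_offset(data: bytes, marker: bytes,
                      offset: int, length: int) -> Optional[bytes]:
    """Take `length` bytes starting `offset` bytes after the end of `marker`."""
    pos = data.find(marker)
    if pos < 0:
        return None
    start = pos + len(marker) + offset
    field = data[start:start + length]
    return field if len(field) == length else None  # reject truncated data


def extract_from_json(body: bytes, *path: str) -> Any:
    """Walk nested JSON objects along `path`; return None if a key is absent."""
    try:
        node = json.loads(body)
    except ValueError:
        return None  # entity is not valid JSON
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node
```

For example, using `b"\r\n\r\n"` as the marker treats the start of the request entity as the reference position, so `extract_by_offset(raw, b"\r\n\r\n", 2, 32)` would read a 32-byte field that begins two bytes into the entity.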
This article is for technical sharing only. The author has always believed that technology itself is neither good nor bad; what matters is the heart of the person who uses it!
:) I hope readers who see this article agree with my view.