Simulate communication between browsers and web servers


Recently, for a project, I have been learning how to write programs that talk to web servers.
I now have a basic grasp of it, so I would like to share what I have learned.
(This is probably also the basis of the "farm" game helper plug-ins. If you are interested, feel free to discuss it with me.)

1. Preparations

Required knowledge/tools:
1. HTTP protocol basics
2. A network packet capture tool

To interact with web servers, you need some basic HTTP knowledge.
Here is a brief explanation.
Before you start, you need to know the most important characteristics of the HTTP protocol. Otherwise, you cannot write the program correctly.

1. HTTP works in request/response mode.
That is, you first send a request to the web server, and the web server then sends you a response.
Interacting with a web server is a "one question, one answer" process.
There are two common request methods: GET (used to request a resource) and POST (used to submit data, such as a form).
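
To make the two request types concrete, here is a minimal sketch in Python (not the C++ that httptester is written in) that builds the raw bytes of a GET and a POST request. The function names, the example host and path, and the choice to send "Connection: close" are my own assumptions for illustration, not something taken from httptester.

```python
from urllib.parse import urlencode


def build_get(host, path):
    """Return the raw bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",  # assumption: ask the server to close after replying
        "",                   # the empty line that ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def build_post(host, path, form):
    """Return the raw bytes of a POST request that submits form fields."""
    body = urlencode(form).encode("ascii")
    head = "\r\n".join([
        f"POST {path} HTTP/1.1",
        f"Host: {host}",
        "Content-Type: application/x-www-form-urlencoded",
        f"Content-Length: {len(body)}",  # tells the server how long the body is
        "Connection: close",
        "",
        "",
    ]).encode("ascii")
    return head + body
```

A GET request has no body, so it ends right after the empty line. A POST request carries the form data after the empty line, and Content-Length states its size in bytes.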

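2. HTTP does not depend on a long-lived connection.
What does this mean? Unlike SMTP (the protocol a mail client such as Foxmail uses), which keeps one session open for a whole conversation,
each HTTP "question and answer" is complete in itself, and in the simplest case (HTTP/1.0, or HTTP/1.1 with a "Connection: close" header) the server closes the connection once it has answered.
That is to say, you create a socket, connect to the server, and send a request (the question).
After the server returns the result (the answer), it closes the socket.
What if I want to send a second request? Create a new socket and connect to the server again.
(HTTP/1.1 connections are persistent by default, and a "Connection: keep-alive" header asks for the same, so a server may leave the socket open for further requests. Sending "Connection: close" gives the behavior described here.)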

3. HTTP does not record the visitor's state.
As point 2 says, each question and answer stands alone, and the server does not remember earlier requests on its own.
So how does a website such as a forum remember that we have logged in? The answer is: cookies.
After you submit the login form, if the login succeeds, the server returns one or more cookies.
In later requests, you send these cookies back to the server to prove that you have logged in.
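
The cookie round trip can be sketched as follows. This is an illustrative Python sketch, not httptester's code: the function names are hypothetical, and the "auth" cookie in the usage note is invented for the example (only cdb_sid appears in the captures later in this post).

```python
def collect_cookies(response_head):
    """Collect name=value pairs from the Set-Cookie lines of a response head."""
    cookies = {}
    for line in response_head.split("\r\n"):
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "set-cookie":
            # Only the first "name=value" pair is the cookie itself; what
            # follows (path, expires, ...) are attributes meant for the browser.
            pair = value.split(";", 1)[0].strip()
            key, eq, val = pair.partition("=")
            if eq:
                cookies[key] = val
    return cookies


def cookie_header(cookies):
    """Build the Cookie request header line that sends the cookies back."""
    return "Cookie: " + "; ".join(f"{k}={v}" for k, v in cookies.items())
```

For example, a response head containing "Set-Cookie: cdb_sid=nm5oej; path=/" and "Set-Cookie: auth=xyz; HttpOnly" would be turned into the request line "Cookie: cdb_sid=nm5oej; auth=xyz".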

The three points above are my summary from the past few days. If there are any mistakes, please point them out.
Besides basic HTTP knowledge, we also need a network packet capture tool
to see what the browser sends and receives while it interacts with the web server.
The program we write simply sends the same content that the browser would send to the server.
Whatever the browser sends, we send too, and in this way we simulate the interaction between the browser and the server.

There are many packet capture tools, such as Ethereal and Sniffer.
Here we only care about HTTP data, so we choose httpwatch.
For how to use httpwatch, please search online. I won't go into it here.

2. My httptester Tool

This "tool" is very simple. It sends HTTP request data for testing and receives the response data from the server.
If you want to write an automatic forum posting bot or a helper for some web game, you can test your requests with httptester first.

[Screenshot: the httptester interface]

The httptester executable was attached to the original post as Httptester.rar (135.99 KB).

Httptester source code:
IDE: VS2005
Language: C++

The "receive data delay" setting is how long to wait, after sending a request, before reading the server's response.
Why wait? If you do not wait and call recv immediately after the request is sent (with the send function),
a non-blocking socket will receive nothing, because the response has not had time to arrive.
Even with a blocking socket, you may receive only part of the data. When the server sends a large response
and there is network latency, the host may receive some of the data first, while the rest arrives a few seconds later.
recv on a blocking socket returns as soon as some data has arrived. It does not wait for the whole response.

What about non-blocking sockets?
You can call the select function to wait, for up to a timeout of several seconds, for data to arrive. When data arrives, call recv to receive it.
Note: select returns as soon as any data can be read within the specified timeout.
That is to say, select also returns when only part of the response has arrived, and calling recv at that point again receives only part of the data.

In short, a single recv call, on either a blocking or a non-blocking socket, cannot be relied on to receive the whole response.
My approach is: first use the sleep function to wait for the configured delay, and then call select to check whether data can be read.
If it can, call recv to receive it. Because select is called first,
I think it does not matter here whether the socket is blocking or non-blocking.
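
The "check with select before calling recv" step can be sketched like this in Python, whose select module wraps the same system call. The helper name wait_readable is hypothetical, and this is only an illustration of the idea, not a port of httptester.

```python
import select
import socket


def wait_readable(sock, timeout_seconds):
    """Return True if sock becomes readable within the timeout.

    "Readable" means that recv will not block: either some data has
    arrived (possibly only part of the response) or the peer has closed.
    """
    readable, _, _ = select.select([sock], [], [], timeout_seconds)
    return bool(readable)
```

Note that a True result says nothing about how much of the response has arrived, which is exactly the limitation described above.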

This approach still has a flaw: if the delay is too short, I still cannot receive the complete server response.
Recall the characteristic of HTTP described earlier: the server closes the socket after the response is complete.
So we can try calling recv in a loop, and end the loop when the server has closed the socket (recv then returns 0).
I will leave this experiment to you. If you try it, please let me know the result...
My simple solution already copes with most situations. Hehe.
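
The recv loop suggested above might look like the following Python sketch. It assumes the server really does close the connection after responding, which with HTTP/1.1 usually means the request carried "Connection: close". On a kept-alive connection the loop would block until the server gives up, so a timeout set with sock.settimeout would be a sensible addition. The function name is hypothetical.

```python
import socket


def recv_until_closed(sock, bufsize=4096):
    """Read from a blocking socket until the peer closes the connection."""
    chunks = []
    while True:
        data = sock.recv(bufsize)
        if not data:
            # An empty result means the peer closed its side: nothing more will come.
            break
        chunks.append(data)
    return b"".join(chunks)
```

With this loop no fixed delay is needed: however slowly the pieces of the response arrive, the loop keeps collecting them until the server signals the end by closing.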

3. Using httpwatch + httptester to interact with the server

3.1 Obtain the verification code image on the CSDN member registration page

I will give one example here. Other cases can be handled by analogy.

The address of the CSDN member registration page is: http://passport.csdn.net/UserLogin.aspx?From=%2fpassport.aspx

We want to obtain the verification code image on this page.

[Screenshot: the verification code image on the page]

1. Open httpwatch and analyze the request the browser sends when it fetches the verification code.

[Screenshot: httpwatch showing the captured request]

We can see that the browser sent the following data when requesting this image:

GET /showexpwd.aspx?Temp=ge6nmewq HTTP/1.1
Accept: */*
Referer: http://passport.csdn.net/UserLogin.aspx?From=%2fpassport.aspx
Accept-Language: zh-cn
UA-CPU: x86
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022)
Host: passport.csdn.net
Connection: keep-alive
Cookie: ASP.NET_SessionId=na0bcr0ukuehaamjvzbskgta; clientkey=ab5eb6b5-94b**5a5-99fb-1662df04030a

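2. Open httptester and copy this data into the "requested data" edit box.
Note: the "requested data" must end with an empty line, that is, two line breaks after the last header line. The empty line tells the server that the headers are complete. Without it the server keeps waiting and no response is received.

[Screenshot: httptester with the request filled in]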
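
The rule about the terminating empty line can be enforced in code rather than by hand. The sketch below is a hypothetical Python helper (not part of httptester) that takes pasted request text, converts the line endings to CRLF, and makes sure the text ends with exactly one empty line. It is only suitable for requests without a body, such as the GET requests in this post.

```python
def normalize_request(text):
    """Convert pasted header-only request text into bytes ready to send."""
    # Accept any line-ending style from the edit box or clipboard.
    lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    # Drop trailing blank lines, however many were pasted.
    while lines and lines[-1].strip() == "":
        lines.pop()
    # Every line ends with CRLF, and one extra CRLF marks the end of the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")
```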

In the "server address" edit box, enter the IP address or domain name of the Web server, which is passport.csdn.net.
In the "server port" edit box, enter the web host port. The default http port is 80. Here we enter the default port: 80.
The "receive data delay" edit box is set to 1000 (one second)

Then click "send".
The server's response is shown below:

[Screenshot: the server response in httptester]

Enter a file name and click "save data" to save the image data returned by the server as a file.
Click "Open File" to view the image of the verification code you received.
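
Saving the image means separating the response headers from the body, because only the body is image data. Below is a minimal Python sketch of that split. The function name is hypothetical, and the sketch assumes the whole response has already been received and that the body is neither chunked nor compressed.

```python
def split_response(raw):
    """Split a raw HTTP response into (status_line, headers, body)."""
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("incomplete response: end of headers not found")
    lines = head.decode("iso-8859-1").split("\r\n")
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        # A repeated header (for example Set-Cookie) keeps only its last value here.
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body
```

The body returned here is what would be written to the file, for example with open("code.gif", "wb").write(body), where the file name is just an example.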

3.2 Obtain the logo image of rupeng network

By analyzing with httpwatch, we know that when the browser fetches the logo of rupeng network,
it sends a request similar to the following:

GET /forum/templates/uchome/images/logo.gif HTTP/1.1
Accept: */*
Referer: http://www.rupeng.com/forum/
Accept-Language: zh-cn
UA-CPU: x86
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022)
Host: www.rupeng.com
Connection: keep-alive
Cookie: cdb_sid=nm5oej

With a preliminary understanding of HTTP, we know that if you just want to "simply" get the rupeng network logo image,
you only need to send the following request:
(Note: the request must end with an empty line, that is, two line breaks after the last header.)

GET /forum/templates/uchome/images/logo.gif HTTP/1.1
Host: www.rupeng.com

The result of fetching the rupeng network logo:

[Screenshot: httptester displaying the rupeng network logo]

3.3 Obtain the home page data of rupeng network

The method is the same. Use httpwatch for analysis.
The browser sends the following request for the home page:

GET /forum/ HTTP/1.1
Accept: */*
Accept-Language: zh-cn
UA-CPU: x86
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022)
Host: www.rupeng.com
Connection: keep-alive

My httptester is a bit limited and has no gzip compression/decompression function. (Hey, I can't write the algorithm myself.)
Therefore, to retrieve the home page data with httptester, you need to modify the request above.
The modified request is as follows:

GET /forum/ HTTP/1.1
Accept-Encoding: UTF-8
Host: www.rupeng.com

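Here, the Accept-Encoding header is changed from the original "gzip, deflate" to UTF-8, so the server is no longer told that we accept compressed content and, in practice, sends the page uncompressed. (Strictly speaking, Accept-Encoding lists content codings such as gzip, not character sets. UTF-8 is not a content coding. The conventional value for "no compression" is identity, or the header can simply be left out.)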
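
For readers whose language has a compression library at hand, decompression need not be written by hand. The Python sketch below, using the standard gzip and zlib modules, decodes a response body according to its Content-Encoding header. The function name is hypothetical, and the fallback to raw deflate is an assumption about servers that send deflate data without the zlib wrapper. If the response is also chunked, the chunked decoding has to be undone first.

```python
import gzip
import zlib


def decode_body(body, content_encoding):
    """Undo the Content-Encoding of a response body (after any chunked decoding)."""
    encoding = (content_encoding or "identity").strip().lower()
    if encoding == "gzip":
        return gzip.decompress(body)
    if encoding == "deflate":
        try:
            return zlib.decompress(body)  # zlib-wrapped data, as specified
        except zlib.error:
            # Assumption: some servers send raw deflate data with no zlib header.
            return zlib.decompress(body, -zlib.MAX_WBITS)
    return body  # identity or unknown: return the bytes unchanged
```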

The result of fetching the home page data of rupeng network:

[Screenshot: httptester displaying the home page data]

4. Simulate a browser to log in to a website

My post "view my website user information" is an example:
http://www.rupeng.com/forum/thread-15682-1-1.html

The steps could be explained one by one with httptester, but for reasons of length I will not do so here.
It mainly comes down to extracting the cookies, filling them into later requests, and submitting a request to obtain the page information.

5. Chunked decoding

In step "3.3 Obtain the home page data of rupeng network", after obtaining the home page,
look at the message headers returned by the server. They include this line: Transfer-Encoding: chunked
This header field indicates that the message body (the page data) is sent in chunked encoding.

To save the message body, it must first be chunked-decoded.
(Select the "chunked decoding" check box before saving the data.
Chunked decoding is relatively simple. I wrote a simple decoding function in httptester.
The decoding may have some problems, but it is good enough for now.)

In fact, for text such as the rupeng network home page, you can still read the data without chunked decoding.
But I once came across a website that returned a verification code image, whose image data was only KB in size,
chunked-encoded in 4 KB pieces. -.-
Without chunked decoding, the image cannot be viewed at all...
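
For reference, here is a minimal chunked decoder sketched in Python. It is not the C++ function from httptester, whose source is not shown here, and the function name is hypothetical. It assumes the complete body has already been received. Each chunk consists of a size line in hexadecimal, the chunk data, and a trailing CRLF, and a chunk of size 0 ends the body.

```python
def decode_chunked(data):
    """Decode a complete chunked message body into the original bytes."""
    out = bytearray()
    pos = 0
    while True:
        line_end = data.find(b"\r\n", pos)
        if line_end == -1:
            raise ValueError("truncated chunk size line")
        # The size is hexadecimal; anything after ';' is a chunk extension, ignored here.
        size_text = data[pos:line_end].split(b";", 1)[0].strip()
        size = int(size_text.decode("ascii"), 16)
        pos = line_end + 2
        if size == 0:
            break  # last chunk; any trailer headers after it are ignored
        if pos + size > len(data):
            raise ValueError("truncated chunk data")
        out += data[pos:pos + size]
        pos += size + 2  # skip the chunk data and the CRLF that follows it
    return bytes(out)
```

For example, the body "5\r\nHello\r\n7\r\n, world\r\n0\r\n\r\n" decodes to "Hello, world". This also shows why an undecoded image is unusable: the size lines end up in the middle of the image data.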

 
