Original: HTTP Made Really Easy. My own networking foundation is weak, so translating this article is partly a way to learn networking and partly a way to practice my poor English. If there are errors in the text, corrections are welcome!
Objective
While reading this article, it helps to use the Chrome browser to inspect the parameters of HTTP requests. In Chrome, open the developer tools with 'Alt+Cmd+I', go to the 'Network' tab, and locate the requested URL in the 'Name' column. The 'Headers' tab shows the 'Response Headers' and 'Request Headers', and you can switch between 'view parsed' and 'view source'. Let's take http://www.xuewiehan.com as an example.
HTTP is really simple.
HTTP is a network protocol that is simple but powerful. Once you understand HTTP, you can write a web browser, a web server, a crawler, or other useful tools.
This is an easy-to-read article that explains HTTP and teaches you to write HTTP clients and servers. It assumes you know the basics of socket programming; for a programmer who is used to sockets, HTTP is very simple. So make sure you learn socket programming first (I'm going to write a tutorial on socket programming!), and it helps to understand CGI as well.
The first half of this article covers HTTP 1.0, and the second half explains the new features of HTTP 1.1. It does not cover every detail of HTTP, but it gives you a basic framework, after which you can study further according to your needs.
Before we begin, please read the following two points:
- Writing a network application requires more care than writing a stand-alone program, because there is more to consider. Of course you have to follow the standards (that is, the protocols), otherwise no one can understand you. More importantly, a badly written program that runs only on your machine wastes only your machine's resources (CPU, bandwidth, memory), but a badly written network program wastes other people's resources, and an especially bad one can waste the resources of thousands of people. This is why it is worth everyone's effort to build better, safer network protocols!
- Don't write crawlers blindly unless you know exactly what you are doing. Crawlers are very useful, but badly written crawlers that ignore the rules make the network environment worse and worse. If you write any kind of 'bot', please respect the rules in robots.txt.
What is HTTP?
HTTP is the Hypertext Transfer Protocol, the network protocol used on the World Wide Web to transfer files, whether HTML files, images, query results, and so on. HTTP usually runs over TCP/IP sockets.
HTTP is the protocol that clients and servers use to communicate. A browser is an HTTP client because it sends requests to an HTTP server (that is, a web server), and the web server returns responses to the client. An HTTP server listens on port 80 by default, although it can be configured to use any port.
What is a resource?
HTTP is used to transmit resources, not just files. A resource is a piece of information identified by a URL. The resources we usually see are files, but a resource can also be something else: the output of a CGI script (which can be written in various programming languages) or a dynamically generated query result.
While learning HTTP, it helps to think of a resource as similar to a file. In practice, though, an HTTP resource is often not a static file but the dynamically generated output of a server-side script.
The transport structure of HTTP
Like most network protocols, HTTP uses the client/server model: the client opens a connection and sends a request message to the server, and the server returns a response message, which typically contains the requested resource. After sending the response, the server closes the connection. (HTTP is therefore stateless: no connection information is kept between transactions.)
The format of the request and response messages is similar. Both consist of:
- An initial line
- Zero or more header lines
- A blank line
- An optional message body
The format is as follows:
<initial line, different for request vs. response>
Header1: value1
Header2: value2
Header3: value3

<optional message body goes here, like file contents or query data; it can be many lines long, or even binary data $&*%@!^>
The initial line and the headers must each end with a carriage return and line feed (CRLF).
Initial Request Line
The initial line of a request is different from the initial line of a response. A request line has three parts: the method name, the path of the requested resource (the '/'-delimited path), and the version of HTTP being used. When visiting the homepage of my blog, the request line is as follows:
GET / HTTP/1.1
Note:
- GET is the most commonly used HTTP method; it means 'give me this resource'. Another common method is POST, which is explained in detail later. Method names are always uppercase.
- The part of the URL after the domain name is the path; the default is '/'.
- The HTTP version has the form 'HTTP/x.x', all uppercase.
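The three parts of the request line can be pulled apart with a few lines of Python. This is only a minimal sketch: the names `RequestLine` and `parse_request_line` are made up for illustration, and it assumes the parts are separated by whitespace as described above.

```python
from typing import NamedTuple


class RequestLine(NamedTuple):
    """The three parts of an initial request line (hypothetical helper type)."""
    method: str
    path: str
    version: str


def parse_request_line(line: str) -> RequestLine:
    """Split a line such as 'GET / HTTP/1.1' into method, path and version."""
    parts = line.strip().split()
    if len(parts) != 3:
        raise ValueError(f"expected three parts, got {len(parts)}: {line!r}")
    method, path, version = parts
    if not version.startswith("HTTP/"):
        raise ValueError(f"not an HTTP version: {version!r}")
    return RequestLine(method, path, version)


request = parse_request_line("GET / HTTP/1.1")
print(request.method, request.path, request.version)
```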
Initial Response Line (Status Line)
The initial line of a response is called the 'status line'. It also has three parts: the HTTP version, the status code, and a description of the status code. Taking my blog as an example again, the status line is as follows:
HTTP/1.1 304 Not Modified
Note:
- The HTTP version has the same format as in the request line.
- The status code is a three-digit integer, and its first digit identifies the general category:
- 1xx: the request has been received and processing continues. (Informational)
- 2xx: the request was successfully received, understood, and accepted by the server. (Success)
- 3xx: the client must take further action to complete the request. (Redirection)
- 4xx: the client appears to have made an error that prevents the server from processing the request. (Client error)
- 5xx: the server hit an error or an abnormal state while processing the request. (Server error)
Common Status Codes:
200 OK: The request succeeded, and the resource is returned in the message body.
404 Not Found: The request failed because the resource was not found.
301 Moved Permanently: The resource has moved permanently.
302 Moved Temporarily: The resource has moved temporarily.
303 See Other: The requested resource has moved to a different URL, and the client should fetch it from there automatically. This is often used by CGI scripts to redirect the client to another URL.
500 Server Error: An unexpected server error.
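To make the status line concrete, here is a small Python sketch that splits it into its three parts and classifies the code by its first digit. The function names and the `STATUS_CLASSES` table are assumptions made for this example, not part of any library.

```python
# Categories keyed by the first digit of the status code.
STATUS_CLASSES = {
    "1": "informational",
    "2": "success",
    "3": "redirection",
    "4": "client error",
    "5": "server error",
}


def parse_status_line(line: str) -> tuple[str, int, str]:
    """Split 'HTTP/1.1 304 Not Modified' into (version, code, description)."""
    parts = line.strip().split(None, 2)
    if len(parts) < 2:
        raise ValueError(f"malformed status line: {line!r}")
    version, code = parts[0], int(parts[1])
    description = parts[2] if len(parts) == 3 else ""
    return version, code, description


def status_class(code: int) -> str:
    """Classify a status code by its first digit."""
    return STATUS_CLASSES.get(str(code)[0], "unknown")


version, code, description = parse_status_line("HTTP/1.1 304 Not Modified")
print(code, description, "->", status_class(code))
```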
Header Lines
Header lines provide information about the request or response, or about the object sent in the message body.
Each header occupies one line, in the form "Header-Name: value", ending with CRLF. This is the same format used for email headers. In more detail:
- Header names are not case-sensitive (the values may be).
- Any number of spaces may follow the colon ':'.
The following two forms are equivalent:
Header1: some-long-value-1a, some-long-value-1b
HEADER1:    some-long-value-1a, some-long-value-1b
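One way to treat the two equivalent forms above as the same header is to normalize the name and strip the spaces around the value when parsing. The sketch below does that with a hypothetical `parse_headers` helper; lowercasing the name is just one possible normalization.

```python
def parse_headers(lines: list[str]) -> dict[str, str]:
    """Parse 'Name: value' lines into a dict keyed by lowercased header name."""
    headers: dict[str, str] = {}
    for line in lines:
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a header line; skipped in this simple sketch
        # Lowercase the name and drop the spaces that follow the colon.
        headers[name.strip().lower()] = value.strip()
    return headers


first = parse_headers(["Header1: some-long-value-1a, some-long-value-1b"])
second = parse_headers(["HEADER1:    some-long-value-1a, some-long-value-1b"])
print(first == second)
```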
HTTP 1.0 defines 16 headers, none of which are mandatory. HTTP 1.1 defines 46 headers, one of which (Host:) is required in every request. When making requests, a few headers are a matter of good etiquette (nothing breaks if you leave them out, but it is best to include them):
- From: identifies who is making the request or running the program, given as an email address.
- User-Agent: identifies the program that is making the request, for example:
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.87 Safari/537.36
These headers help network administrators track down problems. They also reveal information about the user (although the information can be forged).
If you are writing a server, consider including the following headers in your responses:
- Server: analogous to User-Agent; it identifies the server software.
- Last-Modified: gives the time the resource was last modified on the server. It is typically used for caching, to save bandwidth. For example:
Last-Modified: Fri, 31 Dec 1999 23:59:59 GMT
The Message Body
An HTTP message may have a body after the header lines. In a response, the body is either the resource requested by the client or an explanatory error message. In a request, the body is either data entered by the user or an uploaded file.
If an HTTP message has a body, there are usually headers that describe it, for example:
- Content-Type: the MIME type of the data in the body, for example 'text/html' or 'image/gif'.
- Content-Length: the size of the message body in bytes.
Examples of HTTP interactions
For example, to request the file http://www.somehost.com/path/file.html:
First, open a socket connection to port 80 of the target host 'www.somehost.com'. Then send something like the following through the socket:
GET /path/file.html HTTP/1.0
From: someuser@jmarshall.com
User-Agent: HTTPTool/1.0
[blank line here]
The server then returns a response in the following format:
HTTP/1.0 200 OK
Date: Fri, 31 Dec 1999 23:59:59 GMT
Content-Type: text/html
Content-Length: 1354

<html><body><h1>Happy New Millennium!</h1>(more file contents) . . .</body></html>
After sending the response, the server closes the socket connection.
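The exchange above can be sketched with Python's standard `socket` module. The helper names (`build_request`, `split_response`, `fetch`) are invented for this example, the `From` header is left out, and error handling is minimal; treat it as an illustration of the steps rather than a finished client.

```python
import socket


def build_request(path: str, user_agent: str = "HTTPTool/1.0") -> bytes:
    """Build an HTTP/1.0 GET request: initial line, headers, blank line."""
    lines = [f"GET {path} HTTP/1.0", f"User-Agent: {user_agent}", "", ""]
    return "\r\n".join(lines).encode("ascii")


def split_response(raw: bytes) -> tuple[str, dict[str, str], bytes]:
    """Split a raw response into status line, headers and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


def fetch(host: str, path: str, port: int = 80) -> tuple[str, dict[str, str], bytes]:
    """Open a socket, send the request, and read until the server closes."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_request(path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # HTTP/1.0: the server closes the connection when done
                break
            chunks.append(data)
    return split_response(b"".join(chunks))
```

Calling `fetch("www.somehost.com", "/path/file.html")` would perform the request described above; nothing is sent until `fetch` is called.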
HTTP Proxy
An HTTP proxy is an intermediary program between the client and the server. It receives requests from the client and forwards them to the server; responses travel back through the proxy in the same way.
Proxies are typically used in firewalls, for LAN security, and so on.
When a client uses a proxy, it sends all of its requests to the proxy instead of to the server. A proxy request differs from a normal request in one way: in the first line, it uses the full URL rather than just the path. For example:
Proxy request: GET http://www.somehost.com/path/file.html HTTP/1.0
Normal request: GET /path/file.html HTTP/1.0
This way, the proxy knows which server the request is for.
Being Tolerant of Others
As the saying goes: "Be strict in what you send and tolerant in what you receive." The other clients and servers you talk to may have flaws in the messages they send, and you should try to anticipate these so that everything still works. Here are some suggestions:
- Even though lines are required to end with CRLF, some programs end them with a bare line feed (LF), so accept both.
- Even though the parts of the initial line should be separated by a single space, other programs may use several spaces, so accept that as well.
There are other cases too; in short, be as accommodating as you can.
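Both suggestions are easy to follow when parsing. The sketch below, using two hypothetical helpers, accepts either CRLF or a bare LF as the line ending and any run of spaces or tabs between the parts of the initial line.

```python
import re


def split_lines(head: str) -> list[str]:
    """Split message text into lines, accepting CRLF or a bare LF."""
    return re.split(r"\r?\n", head)


def split_initial_line(line: str) -> list[str]:
    """Split an initial line on any run of spaces or tabs."""
    return line.split()


print(split_lines("GET /path HTTP/1.0\nUser-Agent: HTTPTool/1.0\r\nFrom: a@b"))
print(split_initial_line("GET    /path \t HTTP/1.0"))
```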
End
That covers the basics of HTTP. If you want to know more, consult the official specifications.
So far we have only covered HTTP 1.0. Next comes HTTP 1.1, so take a break, and then let's upgrade!
HTTP 1.1
Like many protocols, HTTP keeps evolving. HTTP 1.1 fixes some shortcomings of HTTP 1.0. In general, the improvements include:
- Faster responses, by allowing multiple HTTP requests and responses over a single connection (called persistent connections).
- Better cache support, which saves bandwidth and speeds up responses.
- Faster responses for generated pages, thanks to chunked encoding, which lets the data be sent in several parts so that a response can start before its total length is known.
- Thanks to the new Host header, a web server can host multiple virtual websites on a single IP address.
HTTP 1.1 places some extra requirements on both servers and clients. The next two sections explain how to write a client and a server that comply with HTTP 1.1. If you are only writing a client, you only need the client section; read according to your own needs.
HTTP 1.1 Clients
To comply with HTTP 1.1, a client must:
- Include the Host header in every request.
- Accept responses that use chunked transfer encoding.
- Either support persistent connections, or declare in the headers of each request that it does not (with 'Connection: close').
- Handle the '100 Continue' response.
Host Header
Starting with HTTP 1.1, a single IP address can serve multiple virtual hosts. For example, "www.host1.com" and "www.host2.com" can live on the same server (the same IP address).
A server with several domain names is like several people sharing one phone. The caller knows who they are looking for, but the person answering the phone does not, so the caller has to say clearly who they want. In the same way, every HTTP request must name the requested host explicitly in the Host header. For example:
GET /path/file.html HTTP/1.1
Host: www.host1.com:80
[blank line here]
The ':80' does not need to be given explicitly, because port 80 is the default.
Every HTTP 1.1 request must include the Host header. Without it, each domain name would need its own IP address, and the supply of available IP addresses is shrinking rapidly while the number of websites (domain names) is growing explosively. The Host header effectively eases the IP address shortage.
Chunked Transfer Encoding
Chunked transfer encoding is used when the server wants to start sending a response before it knows the total length (for example, a very long response whose size takes a long time to compute). It splits the complete response into a series of chunks and sends them one after another. You can recognize such a response because its headers contain 'Transfer-Encoding: chunked'. All HTTP 1.1 clients must be able to receive chunked messages.
A chunked message body consists of a series of chunks, followed by a line containing '0' that marks the end of the content, followed by optional footers and a blank line. Each chunk has two parts:
- A line giving the size of the chunk data in hexadecimal, optionally followed by a semicolon and extra parameters, and ending with CRLF.
- The data itself, followed by CRLF.
For example:
HTTP/1.1 200 OK
Date: Fri, 31 Dec 1999 23:59:59 GMT
Content-Type: text/plain
Transfer-Encoding: chunked

1a; ignore-stuff-here
abcdefghijklmnopqrstuvwxyz
10
1234567890abcdef
0
some-footer: some-value
another-footer: another-value
[blank line here]
Don't forget the blank line at the end. The body is 42 bytes long (hexadecimal 1a + 10 = 26 + 16 = 42), and its content is 'abcdefghijklmnopqrstuvwxyz1234567890abcdef'.
Chunks can contain arbitrary binary data. The following response carries the same content but does not use chunked transfer encoding:
HTTP/1.1 200 OK
Date: Fri, 31 Dec 1999 23:59:59 GMT
Content-Type: text/plain
Content-Length: 42
some-footer: some-value
another-footer: another-value

abcdefghijklmnopqrstuvwxyz1234567890abcdef
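Decoding a chunked body follows the description directly: read a size line, read that many bytes, and repeat until the size is zero. The Python sketch below assumes the headers have already been read and that the body is available as a binary stream; `read_chunked` is a name made up for this example, and a real socket could be wrapped with `sock.makefile("rb")` to get such a stream.

```python
import io
from typing import BinaryIO


def read_chunked(stream: BinaryIO) -> tuple[bytes, dict[str, str]]:
    """Read a chunked body; return the joined data and any footers."""
    body = bytearray()
    while True:
        size_line = stream.readline().decode("ascii")
        # The size is hexadecimal; anything after ';' is ignored.
        size = int(size_line.split(";", 1)[0].strip(), 16)
        if size == 0:
            break
        body += stream.read(size)
        stream.readline()  # consume the CRLF that follows the chunk data
    footers: dict[str, str] = {}
    while True:
        line = stream.readline().decode("iso-8859-1").strip()
        if not line:  # blank line (or end of stream) ends the footers
            break
        name, _, value = line.partition(":")
        footers[name.strip().lower()] = value.strip()
    return bytes(body), footers


raw = (
    b"1a; ignore-stuff-here\r\n"
    b"abcdefghijklmnopqrstuvwxyz\r\n"
    b"10\r\n"
    b"1234567890abcdef\r\n"
    b"0\r\n"
    b"some-footer: some-value\r\n"
    b"another-footer: another-value\r\n"
    b"\r\n"
)
data, footers = read_chunked(io.BytesIO(raw))
print(len(data), data.decode("ascii"))
```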
Persistent Connections
In HTTP 1.0 and earlier, the TCP connection is closed after each request and response, so every fetch uses a separate connection. Opening and closing TCP connections costs a lot of CPU time, bandwidth, and memory. In practice, most of the files that make up a web page come from the same server, so multiple requests and responses can be sent over one persistent connection.
Connections in HTTP 1.1 are persistent by default, so unless you ask otherwise, you get a persistent connection. You open the connection once, and can then send multiple requests and read the responses that come back. If you do this, be careful to read exactly the length of each response so that you can tell the responses apart correctly.
If a client includes "Connection: close" in the request headers, the connection is closed after the corresponding response. Use this, for example, when you know the request is the last one on the connection. Similarly, if a response contains this header, the server closes the connection after sending that response, and the client cannot send any more requests over it.
The server may close the connection before it has sent all the responses. The client must therefore always check the Connection header and make sure the connection it is about to use is still open.
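Telling responses apart on a persistent connection comes down to reading exactly `Content-Length` bytes of body for each one. Here is a rough Python sketch; the helper names are invented for the example, it only handles bodies delimited by `Content-Length` (not chunked ones), and a byte buffer stands in for the socket.

```python
import io
from typing import BinaryIO


def read_response(stream: BinaryIO) -> tuple[str, dict[str, str], bytes]:
    """Read one response from the stream, using Content-Length for the body."""
    status_line = stream.readline().decode("iso-8859-1").strip()
    headers: dict[str, str] = {}
    while True:
        line = stream.readline().decode("iso-8859-1").strip()
        if not line:  # the blank line ends the headers
            break
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    body = stream.read(int(headers.get("content-length", "0")))
    return status_line, headers, body


def connection_will_close(headers: dict[str, str]) -> bool:
    """True if the sender announced 'Connection: close'."""
    return headers.get("connection", "").lower() == "close"


# Two responses arriving back to back on the same connection.
stream = io.BytesIO(
    b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello"
    b"HTTP/1.1 200 OK\r\nConnection: close\r\nContent-Length: 3\r\n\r\nbye"
)
first = read_response(stream)
second = read_response(stream)
print(first[2], second[2], connection_will_close(second[1]))
```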
The 100 Continue Response
When a client sends a request using HTTP 1.1, the server may return an interim response: '100 Continue'. It means the server has received the first part of the request, which is useful when the data transfer is slow. In any case, an HTTP 1.1 client must be able to handle the '100' response correctly.
A '100 Continue' response has the same format as a normal response such as '200 OK'. The only difference is that it is not a complete response: it carries no content and is followed by the real response. As follows:
HTTP/1.1 100 Continue
[blank line here; this is not a complete response]
HTTP/1.1 200 OK
Date: Fri, 31 Dec 1999 23:59:59 GMT
Content-Type: text/plain
Content-Length: 42
some-footer: some-value
another-footer: another-value

abcdefghijklmnopqrstuvwxyz1234567890abcdef
To handle this case (the 100 response carries no data), a simple HTTP 1.1 client can read a response from the socket and, if the status code is 100, discard it and read the next response instead.
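The "discard and read the next one" rule can be written as a small loop. In this Python sketch the function names are hypothetical and a byte buffer stands in for the socket; only the status line and headers are read, and the body is left in the stream.

```python
import io
from typing import BinaryIO


def read_head(stream: BinaryIO) -> tuple[int, list[str]]:
    """Read one status line and its header lines, up to the blank line."""
    status_line = stream.readline().decode("iso-8859-1")
    code = int(status_line.split()[1])
    header_lines = []
    while True:
        line = stream.readline().decode("iso-8859-1").strip()
        if not line:
            break
        header_lines.append(line)
    return code, header_lines


def read_final_head(stream: BinaryIO) -> tuple[int, list[str]]:
    """Skip any '100 Continue' responses and return the next response head."""
    code, header_lines = read_head(stream)
    while code == 100:
        code, header_lines = read_head(stream)
    return code, header_lines


stream = io.BytesIO(
    b"HTTP/1.1 100 Continue\r\n\r\n"
    b"HTTP/1.1 200 OK\r\nContent-Length: 42\r\n\r\n"
    b"abcdefghijklmnopqrstuvwxyz1234567890abcdef"
)
code, header_lines = read_final_head(stream)
print(code, header_lines)
```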
HTTP 1.1 Server
To comply with HTTP 1.1, a server must:
- Require the Host header in requests from HTTP 1.1 clients.
- Accept absolute URLs in requests.
- Accept requests that use chunked transfer encoding.
- Support persistent connections.
- Use the '100 Continue' response properly.
- Include the 'Date' header in every response.
- Handle the 'If-Modified-Since' and 'If-Unmodified-Since' headers.
- Support at least the 'GET' and 'HEAD' methods.
- Remain compatible with HTTP 1.0 requests.
Host Header Required
Every HTTP 1.1 request must include the Host header; otherwise the server returns a '400 Bad Request' response, as follows:
HTTP/1.1 400 Bad Request
Content-Type: text/html
Content-Length: 111

<html><body>
<h2>No Host: header received</h2>
HTTP 1.1 requests must include the Host: header.
</body></html>
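On the server side, this check might look like the following Python sketch. The function names are made up, the headers are assumed to be already parsed into a dict with lowercased names, and the Content-Length is computed from the body rather than hard-coded.

```python
from typing import Optional


def bad_request_response() -> bytes:
    """Build a '400 Bad Request' response explaining the missing Host header."""
    body = (
        b"<html><body>\r\n"
        b"<h2>No Host: header received</h2>\r\n"
        b"HTTP 1.1 requests must include the Host: header.\r\n"
        b"</body></html>\r\n"
    )
    head = (
        "HTTP/1.1 400 Bad Request\r\n"
        "Content-Type: text/html\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("ascii")
    return head + body


def check_host(version: str, headers: dict[str, str]) -> Optional[bytes]:
    """Return a 400 response if an HTTP/1.1 request lacks Host, else None."""
    if version == "HTTP/1.1" and "host" not in headers:
        return bad_request_response()
    return None  # HTTP/1.0 requests are not required to send Host


print(check_host("HTTP/1.1", {}) is not None)
print(check_host("HTTP/1.0", {}) is None)
```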
Accepting Absolute URLs
The Host header is really a transitional way of identifying the host. The intent was that later versions of HTTP would use an absolute URL in the request instead of a path, for example:
GET http://www.somehost.com/path/file.html HTTP/1.2
An HTTP 1.1 server must accept requests in this form, even though HTTP 1.1 clients do not normally send them. If an HTTP 1.1 client leaves out the Host header, the server must still report an error.
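A server can support both forms by checking whether the request target starts with a scheme. The Python sketch below uses `urllib.parse.urlsplit` from the standard library; `resolve_target` is a hypothetical helper, and letting the host in an absolute URL take precedence over the Host header is a design choice of this example.

```python
from typing import Optional
from urllib.parse import urlsplit


def resolve_target(target: str, host_header: Optional[str]) -> tuple[str, str]:
    """Return (host, path) for a target that is a path or an absolute URL."""
    if host_header is None:
        # Even with an absolute URL, a missing Host header is an error.
        raise ValueError("HTTP/1.1 request without a Host header")
    if target.lower().startswith("http://"):
        parts = urlsplit(target)
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        return parts.netloc, path
    return host_header, target


print(resolve_target("http://www.somehost.com/path/file.html", "www.somehost.com"))
print(resolve_target("/path/file.html", "www.host1.com"))
```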
Chunked Transfer Encoding
Just as HTTP 1.1 clients must accept chunked responses, servers must accept chunked requests. The server is not required to generate chunked messages; it only needs to be able to receive them.
Persistent connections
An HTTP 1.1 client may send multiple requests over a single connection. To support persistent connections, the server must return the responses in the same order in which the requests were received.
If a request contains a 'Connection: close' header, it is the last request on that connection, and the server must close the connection after returning the response. The server should also close idle connections after a timeout (a 10-second timeout is typical).
If you do not want to support persistent connections, include 'Connection: close' in the response headers. This means the connection is closed after the current response is returned. A client that supports HTTP 1.1 correctly will handle this header properly.
100 Continue
As described in the HTTP 1.1 client section, this response is intended to help with slow connections.
When an HTTP 1.1 server receives an HTTP 1.1 request, it responds either with '100 Continue' or with an error code. If it sends a '100 Continue' response, it must then send another, final response. '100 Continue' needs no headers, but it must be followed by a blank line. As follows:
HTTP/1.1 100 Continue
[blank line here]
[another HTTP response will go here]
Do not send '100 Continue' to HTTP 1.0 clients.
Date Header
Caching is a significant improvement in HTTP 1.1, and it cannot work without timestamps on responses. Therefore every response the server returns must include a Date header giving the current time. The format is as follows:
Date: Fri, 31 Dec 1999 23:59:59 GMT
All responses except those with a '1xx' status code must include the Date header. The time is always given in Greenwich Mean Time (GMT).
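In Python, the standard library can already produce this date format: `email.utils.formatdate` with `usegmt=True` gives the same layout as the example above. The wrapper name `http_date` below is just for illustration.

```python
from email.utils import formatdate
from typing import Optional


def http_date(timestamp: Optional[float] = None) -> str:
    """Format a Unix timestamp (default: now) as an HTTP date in GMT."""
    return formatdate(timeval=timestamp, usegmt=True)


# 946684799 is one second before 2000-01-01 00:00:00 GMT.
print("Date: " + http_date(946684799))
print("Date: " + http_date())
```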
If-Modified-Since and If-Unmodified-Since Headers
To avoid sending resources unnecessarily, and so save bandwidth, HTTP 1.1 defines the 'If-Modified-Since' and 'If-Unmodified-Since' request headers. The former means "send this resource only if it has been modified since this time". Clients are not required to use these headers, but HTTP 1.1 servers are required to handle them.
Unfortunately, earlier versions of HTTP did not use a uniform date format, for example:
If-Modified-Since: Fri, 31 Dec 1999 23:59:59 GMT
If-Modified-Since: Friday, 31-Dec-99 23:59:59 GMT
If-Modified-Since: Fri Dec 31 23:59:59 1999
All of these times are in Greenwich Mean Time.
Although servers must accept all three date formats, HTTP 1.1 clients and servers must generate only the first one. If the date in the header is invalid, ignore the header.
The If-Modified-Since header is used with GET requests. If the requested resource has been modified since the given time, ignore the header and return the resource normally. Otherwise, return a '304 Not Modified' response, which includes the Date header and no message body. For example:
HTTP/1.1 304 Not Modified
Date: Fri, 31 Dec 1999 23:59:59 GMT
[blank line here]
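A server can accept all three date formats by trying each in turn, and then decide between a normal response and '304 Not Modified'. The Python sketch below uses `datetime.strptime`; the function names are invented for the example, times are treated as naive GMT values, and an unparseable date is simply ignored. Note that `strptime` parses day and month names according to the current locale, so this assumes an English or C locale.

```python
from datetime import datetime
from typing import Optional

DATE_FORMATS = (
    "%a, %d %b %Y %H:%M:%S GMT",  # Fri, 31 Dec 1999 23:59:59 GMT
    "%A, %d-%b-%y %H:%M:%S GMT",  # Friday, 31-Dec-99 23:59:59 GMT
    "%a %b %d %H:%M:%S %Y",       # Fri Dec 31 23:59:59 1999
)


def parse_http_date(value: str) -> Optional[datetime]:
    """Try each accepted format; return None if none of them matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt)
        except ValueError:
            continue
    return None


def conditional_get_status(last_modified: datetime,
                           if_modified_since: Optional[str]) -> int:
    """Return 304 if the resource is unchanged since the given time, else 200."""
    since = parse_http_date(if_modified_since) if if_modified_since else None
    if since is not None and last_modified <= since:
        return 304
    return 200  # modified since then, or header missing or invalid


header = "Fri, 31 Dec 1999 23:59:59 GMT"
print(conditional_get_status(datetime(1999, 12, 1), header))
print(conditional_get_status(datetime(2000, 1, 2), header))
```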
The If-Unmodified-Since header is similar to If-Modified-Since, but it can be used with any method. The server processes the request only if the requested resource has not been modified since the date and time given in the header value. If the resource has been modified after that time, the server returns '412 Precondition Failed'. For example:
HTTP/1.1 412 Precondition Failed
[blank line here]
Support for GET and HEAD Methods
An HTTP 1.1 server must support the GET and HEAD methods. If you use CGI scripts, you should also support the POST method.
The other four methods defined by HTTP 1.1 (PUT, DELETE, OPTIONS, and TRACE) are not used often. If a client requests a method that the server does not support, return '501 Not Implemented', as follows:
HTTP/1.1 501 Not Implemented
[blank line here]
Support for HTTP 1.0 Requests
To stay compatible with older browsers, an HTTP 1.1 server must support HTTP 1.0 requests. When a request uses HTTP 1.0:
- Do not require the Host header.
- Do not send a '100 Continue' response.
End
The articles in this series are all translations, namely 'CGI Made Really Easy' and 'HTTP Made Really Easy'. I think something is still missing, though: there should also be a 'Sockets Made Really Easy' article, because all of this is built on socket communication. My plan from here is:
- First, work through a hands-on project to really master these two topics.
- Finally, depending on how that goes, fill in the rest of the series. I think there is a lot more to write!
http://www.cnblogs.com/xueweihan/p/5330189.html