The earlier posts on the various uses of fsockopen touched on a lot of related background, such as chunked transfer encoding, keep-alive, and HTTP header fields. If you only half understand that background, it gets in the way of absorbing PHP socket programming, so this post fills in the gaps with some basic knowledge.
1. What is keep-alive mode?
HTTP works in a "request-response" pattern. In normal mode, that is, non-keep-alive mode, the client and server open a new connection for every request/response and close it as soon as the exchange completes (HTTP is a connectionless protocol). In keep-alive mode (also known as persistent connections or connection reuse), the connection from client to server stays open, so a subsequent request to the same server does not have to establish or re-establish a connection.
In HTTP/1.0 keep-alive is off by default, and you must add "Connection: Keep-Alive" to the HTTP header to enable it. In HTTP/1.1 keep-alive is on by default and is only turned off by sending "Connection: close". Most browsers now use HTTP/1.1, which means they request a keep-alive connection by default, so whether a complete keep-alive connection is actually achieved depends on the server configuration.
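The default rules above can be sketched as a small check. This is only an illustration in Python, not part of any library: the function name `is_persistent` and the plain-dict representation of headers are assumptions made for the example.

```python
def is_persistent(http_version: str, headers: dict) -> bool:
    """Return True if the connection should stay open after this message.

    Hypothetical helper: `headers` is assumed to be a plain dict of
    header name -> value, as a hand-written parser might produce.
    """
    tokens = set()
    for name, value in headers.items():
        # Header names are case-insensitive; Connection holds a
        # comma-separated list of tokens.
        if name.lower() == "connection":
            tokens.update(t.strip().lower() for t in value.split(","))

    if "close" in tokens:
        return False                 # an explicit close always wins
    if http_version == "HTTP/1.1":
        return True                  # HTTP/1.1: persistent by default
    return "keep-alive" in tokens    # HTTP/1.0: opt-in only
```

For instance, `is_persistent("HTTP/1.0", {})` is false, while the same call with `{"Connection": "Keep-Alive"}` is true.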
2. Advantages of enabling Keep-alive
From the analysis above, enabling keep-alive mode is clearly more efficient and performs better, because it avoids the cost of establishing and releasing connections. RFC 2616 summarizes the advantages as follows:
- By opening and closing fewer TCP connections, CPU time is saved in routers and hosts (clients, servers, proxies, gateways, tunnels, or caches), and memory used for TCP protocol control blocks can be saved in hosts.
- HTTP requests and responses can be pipelined on a connection. Pipelining allows a client to make multiple requests without waiting for each response, allowing a single TCP connection to be used much more efficiently, with much lower elapsed time.
- Network congestion is reduced by reducing the number of packets caused by TCP opens, and by allowing TCP sufficient time to determine the congestion state of the network.
- Latency on subsequent requests is reduced since there is no time spent in TCP's connection opening handshake.
- HTTP can evolve more gracefully, since errors can be reported without the penalty of closing the TCP connection. Clients using future versions of HTTP might optimistically try a new feature, but if communicating with an older server, retry with the old semantics after an error is reported.
RFC 2616 (p. 47) also states that a single-user client should not maintain more than 2 connections with any server or proxy, and that a proxy should use up to 2 * N connections to another server or proxy, where N is the number of simultaneously active users. These guidelines are intended to improve HTTP response times and avoid congestion (redundant connections do not improve performance).
3. How do you determine the length of the message content?
In keep-alive mode, how does the client know that it has received all of the response data for a request (in other words, how does it know the server has finished sending)? As we have seen, in keep-alive mode the HTTP server does not close the connection automatically after sending the data, so the client can no longer rely on reading until EOF (-1) to detect the end. Of course, nothing stops you from still doing it that way, but you can imagine how inefficient it would be. Here are two ways to tell.
3.1 Using the message header field Content-Length
As the name implies, Content-Length gives the length of the entity body, and the client (or server) can use this value to decide whether all the data has been received. But what if the message has no Content-Length? And under what circumstances is Content-Length absent? Read on.
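As a minimal sketch of the Content-Length approach, the reader keeps reading until it has exactly the announced number of bytes and then stops, leaving the connection open for the next response. The example is in Python for illustration; `read_fixed_body` is a made-up name, and `stream` is assumed to be any binary file-like object (for example one obtained from a socket).

```python
import io


def read_fixed_body(stream, content_length: int) -> bytes:
    """Read exactly `content_length` bytes of body from `stream`.

    Hypothetical helper. A read may return fewer bytes than requested,
    so loop until the whole body has arrived.
    """
    body = b""
    while len(body) < content_length:
        piece = stream.read(content_length - len(body))
        if not piece:
            # The peer closed the connection before sending everything.
            raise EOFError("connection closed before the full body arrived")
        body += piece
    return body


# Two responses back to back on one kept-alive connection (simulated).
stream = io.BytesIO(b"hello worldHTTP/1.1 200 OK\r\n")
first_body = read_fixed_body(stream, 11)
```

Because the function stops after 11 bytes, the status line of the next response is still waiting unread in the stream.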
3.2 Using the message header field Transfer-Encoding
When a client requests a static page or an image, the server knows exactly how large the content is and tells the client how much data to expect through the Content-Length header field. For a dynamic page or similar content, however, the server cannot know the content size in advance, so it can transfer the data using Transfer-Encoding: chunked. In other words, if the server wants to send data to the client while still generating it, it must use "Transfer-Encoding: chunked" instead of Content-Length.
Chunked encoding splits the data into pieces and sends them one after another. The encoded body is a series of chunks concatenated together, terminated by a chunk whose size is 0. Each chunk has a head and a body: the head gives the number of bytes in the body as a hexadecimal number, optionally followed by chunk extensions (usually omitted), and the body is the actual content of that length. The two parts are separated by a carriage return and line feed (CRLF). The content that follows the final zero-size chunk is called the footer; it holds some additional header information and can usually be ignored.
The format of chunked encoding is as follows:
chunk          = chunk-size [ chunk-ext ] CRLF
                 chunk-data CRLF
hex-no-zero    = <HEX excluding "0">
chunk-size     = hex-no-zero *HEX
chunk-ext      = *( ";" chunk-ext-name [ "=" chunk-ext-val ] )
chunk-ext-name = token
chunk-ext-val  = token | quoted-string
chunk-data     = chunk-size(OCTET)
footer         = *entity-header
A chunked body is made up of four parts:
- zero or more chunks,
- "0" CRLF,
- footer,
- CRLF.
Each chunk in turn consists of: chunk-size, chunk-ext (optional), CRLF, chunk-data, CRLF.
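The structure just described can be decoded with a short loop. The following Python sketch is an illustration under some assumptions: `read_chunked_body` is a made-up name, `stream` is a binary file-like object positioned at the start of the chunked body, chunk extensions are skipped, and footer lines are read and discarded.

```python
import io


def read_chunked_body(stream) -> bytes:
    """Decode a chunked body from `stream` and return the joined data."""
    body = b""
    while True:
        size_line = stream.readline()
        if not size_line.endswith(b"\r\n"):
            raise ValueError("chunk header not terminated by CRLF")
        # Drop any chunk-ext after ';' and parse the size as hexadecimal.
        size_field = size_line.split(b";", 1)[0].strip()
        size = int(size_field.decode("ascii"), 16)
        if size == 0:
            break                      # "0" CRLF marks the last chunk
        data = stream.read(size)
        if len(data) != size:
            raise EOFError("connection closed inside a chunk")
        if stream.read(2) != b"\r\n":
            raise ValueError("chunk data not followed by CRLF")
        body += data
    # Footer: zero or more header lines, then one empty line (CRLF).
    while True:
        line = stream.readline()
        if line in (b"\r\n", b""):
            break
    return body


raw = (b"5\r\nHello\r\n"
       b"7;name=value\r\n, World\r\n"
       b"0\r\nX-Footer: ignored\r\n\r\n")
decoded = read_chunked_body(io.BytesIO(raw))
```

Here the two chunks are 5 and 7 bytes long, so `decoded` is `b"Hello, World"`.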
4. Summary of message length
In fact, the two methods above both come down to one question: how to determine the length of an HTTP message. RFC 2616 summarizes message length as follows: the transfer-length of a message is the length of the message-body as it appears in the message, that is, after any transfer-coding has been applied. When a message includes a message-body, its transfer-length is determined by the following rules (in order of precedence, highest first):
- Any response that must not contain a message body, such as 1xx, 204, and 304 responses and any response to a HEAD request, is always terminated by the first empty line (CRLF) after the header fields.
- If the Transfer-Encoding header field is present and its value is not "identity", the transfer-length is defined by the "chunked" transfer encoding, unless the message is terminated by closing the connection.
- If the Content-Length header field is present, its value represents both the entity-length and the transfer-length. If the two lengths differ (that is, a Transfer-Encoding header field is set), the Content-Length header field must not be sent. If you receive a message with both a Transfer-Encoding field and a Content-Length field, you must ignore Content-Length.
- If the message uses the media type "multipart/byteranges" and the transfer-length is not otherwise specified, this self-delimiting media type defines the transfer-length. The type must not be used unless the sender knows that the recipient can parse it.
- The server closes the connection, and that determines the message length. (Note: closing the connection cannot be used to mark the end of a request message, because the server would then have no way to send a response back to the client.)
For compatibility with HTTP/1.0 applications, an HTTP/1.1 request that contains a message body must include a valid Content-Length header field unless the server is known to be HTTP/1.1 compliant. If a request contains a message body and no Content-Length is given, the server should respond with 400 (Bad Request) if it cannot determine the length of the message, or with 411 (Length Required) if it insists on receiving a valid Content-Length.
All HTTP/1.1 applications that receive messages must accept the "chunked" transfer-coding, which allows this mechanism to be used when the message length cannot be known in advance. A message must not contain both a Content-Length header field and a non-identity transfer-coding. If a message does contain both a non-identity transfer-coding and Content-Length, Content-Length must be ignored.
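The order of precedence for responses can be written down as a small decision function. This is a sketch only: the name `body_framing`, its parameters, and the returned labels are invented for the example, and the headers are assumed to arrive as a plain dict.

```python
def body_framing(status: int, request_method: str, headers: dict):
    """Decide how the body of a response is delimited.

    Returns a (kind, length) pair where kind is one of the hypothetical
    labels "none", "chunked", "length", "multipart" or "close".
    """
    h = {name.lower(): value.strip() for name, value in headers.items()}

    # 1. Responses that never carry a body.
    if (request_method.upper() == "HEAD"
            or 100 <= status < 200 or status in (204, 304)):
        return ("none", None)

    # 2. A non-identity Transfer-Encoding overrides Content-Length.
    encoding = h.get("transfer-encoding", "").lower()
    if encoding and encoding != "identity":
        return ("chunked", None)

    # 3. Content-Length gives the exact number of bytes.
    if "content-length" in h:
        return ("length", int(h["content-length"]))

    # 4. multipart/byteranges delimits itself.
    if h.get("content-type", "").lower().startswith("multipart/byteranges"):
        return ("multipart", None)

    # 5. Otherwise the body runs until the server closes the connection.
    return ("close", None)
```

A response with both `Transfer-Encoding: chunked` and `Content-Length` yields `("chunked", None)`, which matches the rule that Content-Length is ignored in that case.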
5. HTTP Header Field Summary
Finally, here is a summary of the HTTP header fields.
- Accept: Tells the web server which media types the client accepts. */* means any type, type/* means all subtypes of that type, and type/sub-type means one specific type.
- Accept-Charset: The browser declares the character sets it accepts.
- Accept-Encoding: The browser declares the content encodings it accepts. This usually specifies compression: whether compression is supported and which methods (gzip, deflate) are supported.
- Accept-Language: The browser declares the languages it accepts. Language is different from character set: Chinese is a language, and Chinese has several character sets, such as Big5, GB2312, and GBK.
- Accept-Ranges: The web server indicates whether it accepts requests for part of an entity, such as part of a file. bytes means it does; none means it does not.
- Age: When a proxy server answers a request from its own cache, it uses this header to indicate how long ago the entity was generated.
- Authorization: After the client receives a WWW-Authenticate response from the web server, it uses this header to send its authentication credentials to the web server.
- Cache-Control: In requests: no-cache (do not use a cached entity; fetch it from the web server now), max-age (only accept objects whose Age is less than max-age and that have not expired), max-stale (expired objects are acceptable, but only if they expired less than max-stale ago), min-fresh (only accept cached objects whose freshness lifetime is greater than the sum of their current age and min-fresh). In responses: public (cached content may be used to answer any user), private (cached content may only be used to answer the user who originally requested it), no-cache (the object may be cached, but it can only be returned to a client after it has been validated with the web server), max-age (the lifetime of the object in this response before it expires). In both: no-store (caching is not allowed).
- Connection: In requests: close (tells the web server or proxy server to close the connection after responding to this request, without waiting for further requests on it) or Keep-Alive (tells the web server or proxy server to keep the connection open after responding to this request and wait for further requests on it). In responses: close (the connection has been closed) or Keep-Alive (the connection is kept open, waiting for further requests on it).
- Keep-Alive: If the browser asks to keep the connection open, this header indicates how long (in seconds) it would like the web server to keep it open. Example: Keep-Alive: 300
- Content-Encoding: The web server indicates which compression method (gzip, deflate) it used to compress the object in the response. Example: Content-Encoding: gzip
- Content-Language: The web server tells the browser the language of the object in the response.
- Content-Length: The web server tells the browser the length of the object in the response. Example: Content-Length: 26012
- Content-Range: The web server indicates which part of the entire object the response contains. Example: Content-Range: bytes 21010-47021/47022
- Content-Type: The web server tells the browser the type of the object in the response. Example: Content-Type: application/xml
- ETag: A tag value for an object (such as a URL). Take an HTML file as the object: if the file is modified, its ETag changes too, so the ETag plays a role similar to Last-Modified and mainly lets the web server determine whether an object has changed. For example, the browser obtains an ETag when it first requests an HTML file; the next time it requests the file, it sends that ETag back to the web server, which compares it with the file's current ETag and so knows whether the file has changed.
- Expires: The web server indicates when the entity will expire. Once an object has expired, a cache may use it to answer client requests only after revalidating it with the web server. This is an HTTP/1.0 header. Example: Expires: Sat, 10:02:12 GMT
- Host: The client specifies the domain name or IP address, and the port number, of the web server it wants to reach. Example: Host: rss.sina.com.cn
- If-Match: Perform the requested action only if the object's ETag has not changed, which in effect means the object itself has not changed.
- If-None-Match: Perform the requested action only if the object's ETag has changed, which in effect means the object itself has changed.
- If-Modified-Since: If the requested object has been modified after the time given in this header, perform the requested action (such as returning the object); otherwise return status code 304 to tell the browser that the object has not been modified. Example: If-Modified-Since: Thu, APR 09:14:42 GMT
- If-Unmodified-Since: Perform the requested action (such as returning the object) only if the requested object has not been modified after the time given in this header.
- If-Range: The browser tells the web server: if the object I requested has not changed, send me the part I am missing; if it has changed, send me the whole object. The browser lets the web server check for changes by sending either the ETag of the requested object or the last modification time it knows of. Always used together with the Range header.
- Last-Modified: The web server reports what it considers to be the object's last modification time, such as the modification time of a file or the time a dynamic page was last generated. Example: Last-Modified: Tue, May 02:42:43 GMT
- Location: The web server tells the browser that the object it is trying to access has moved and should be fetched from the location given in this header. Example: Location: http://i0.sinaimg.cn/dy/deco/2008/0528/sinahome_0803_ws_005_text_0.gif
- Pragma: Mainly used as Pragma: no-cache, which is equivalent to Cache-Control: no-cache. Example: Pragma: no-cache
- Proxy-Authenticate: The proxy server responds to the browser and asks it to provide proxy authentication information.
- Proxy-Authorization: The browser answers the proxy server's authentication request and provides its own credentials.
- Range: The browser (or a multithreaded downloader such as FlashGet) tells the web server which part of the object it wants. Example: Range: bytes=1173546-
- Referer: The browser tells the web server which page the URL in the current request was obtained from or clicked on. Example: Referer: http://www.sina.com/
- Server: The web server indicates what software it is running and which version. Example: Server: Apache/2.0.61 (Unix)
- User-Agent: The browser identifies itself (which browser it is). Example: User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14
- Transfer-Encoding: The web server indicates how it has encoded the response message body (not the object inside the message body), for example whether it is chunked. Example: Transfer-Encoding: chunked
- Vary: The web server uses this header to tell cache servers under what conditions the object in this response may be used to answer subsequent requests. Suppose the origin web server, on receiving the first request, responds with the headers Content-Encoding: gzip and Vary: Content-Encoding. The cache server then examines the headers of each subsequent request and checks whether its Accept-Encoding is consistent with the Vary value of the earlier response, that is, whether the same content encoding is in use. This prevents the cache server from sending a compressed entity from its cache to a browser that cannot decompress it. Example: Vary: Accept-Encoding
- Via: Lists the proxy servers that a request from the client to the origin server (OCS), or a response in the opposite direction, has passed through, and which protocol (and version) each used to forward it. When a client request reaches the first proxy server, that server adds a Via header to its own request and fills in its own information. When the next proxy receives the request from the first proxy, it copies the Via header from that request into its own request and appends its own information, and so on. When the OCS receives the request from the last proxy server, it can inspect the Via header to learn the route the request took. Example: Via: 1.0 236.d0707195.sina.com.cn:80 (squid/2.6.STABLE13)
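When reading a response by hand over a socket, the fields listed above first have to be separated from the status line and from each other. A minimal Python sketch, assuming the whole head has already been read into `raw`: the function name `parse_head` is made up, header names are lower-cased because they are case-insensitive, and repeated headers are joined with commas.

```python
def parse_head(raw: bytes):
    """Split a raw response head into (version, status, reason, headers)."""
    # Everything before the first empty line is the head.
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")

    # Status line, e.g. "HTTP/1.1 200 OK"; the reason phrase may be absent.
    parts = lines[0].split(" ", 2)
    version, status = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""

    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if not sep:
            continue                   # skip lines that are not "name: value"
        key, value = name.strip().lower(), value.strip()
        # Repeated fields are combined into one comma-separated value.
        headers[key] = f"{headers[key]}, {value}" if key in headers else value
    return version, status, reason, headers


raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Type: application/xml\r\n"
       b"content-length:26012\r\n"
       b"Vary: Accept-Encoding\r\n"
       b"Vary: User-Agent\r\n"
       b"\r\n<xml/>")
version, status, reason, headers = parse_head(raw)
```

With this input, `headers["content-length"]` is the string `"26012"` and the two Vary lines are merged into one value.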
OK, that fills in the gaps.
Some basic knowledge of PHP socket programming needs to know