Every day, a wide variety of media objects travel over HTTP, such as text, movies, and software programs. HTTP ensures that its messages are correctly transported, identified, extracted, and processed. To achieve this, HTTP uses well-defined labels to describe the entity that carries the content. This article covers HTTP entities and encodings in detail.
Entity Introduction
If an HTTP message is imagined as a crate in the Internet freight system, the HTTP entity is the actual cargo inside the message. Consider a simple entity carried in an HTTP response message.
The entity headers indicate that this is a plain-text document (Content-Type: text/plain) that is only 18 bytes long (Content-Length: 18). As always, a blank line (CRLF) separates the header fields from the start of the body.
The HTTP entity headers describe the contents of an HTTP message. HTTP/1.1 defines the following 10 basic entity header fields:
Header field        Description
Content-Type        The type of object carried by the entity
Content-Length      The length or size of the entity body being transferred
Content-Language    The human language that best matches the object being transferred
Content-Encoding    Any transformation (for example, compression) applied to the object data
Content-Location    An alternate location from which the object can be obtained at the time of the request
Content-Range       If this is a partial entity, which part of the whole it is
Content-MD5         A checksum of the contents of the entity body
Last-Modified       The date and time at which the transferred content was created or last modified on the server
Expires             The date and time at which the entity data becomes stale
Allow               The request methods allowed on the resource, such as GET and HEAD
ETag                A unique validator for this specific instance of the document
Cache-Control       Directives for how the document may be cached
Note: the ETag and Cache-Control headers are not formally defined as entity headers, but they are important for many operations involving entities.
"Entity Body"
The entity body is the raw cargo. All other descriptive information is carried in the headers. Because the cargo (the entity body) is just raw data, the entity headers are needed to describe the meaning of that data. For example, the Content-Type entity header tells us how to interpret the data (image, text, and so on), while the Content-Encoding entity header tells us whether the data has been compressed or otherwise re-encoded.
The header fields end with a blank CRLF line, and the raw content of the entity body begins immediately after it. Whatever the content is (text or binary, document or image, compressed or uncompressed, English, French, or Japanese), it follows directly after this CRLF.
Consider two real HTTP messages, one carrying a text entity (Figure A) and the other carrying an image entity (Figure B). A hexadecimal dump shows the exact contents of each message.
In Figure A, the entity body begins at byte 65, immediately after the CRLF that ends the headers. The entity body contains the ASCII characters of the sentence "Hi! I'm a message!"
In Figure B, the entity body begins at byte 67. The entity body contains the binary contents of a GIF image. GIF files begin with a 6-byte version signature, followed by a 16-bit width and a 16-bit height, and all three items can be seen directly in the entity body.
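As a rough illustration of the two points above, the following Python sketch splits a raw message at the blank line and reads the first fields of a GIF body. The sample message, the function names, and the image dimensions are hypothetical; they are not taken from the figures.

```python
import struct

CRLF = b"\r\n"

def split_message(raw: bytes):
    """Split a raw HTTP message into (header_block, entity_body)."""
    head, separator, body = raw.partition(CRLF + CRLF)
    if not separator:
        raise ValueError("no blank line found, so the headers are incomplete")
    return head, body

def gif_header(body: bytes):
    """Read the 6-byte signature, then the 16-bit width and height (little-endian)."""
    signature, width, height = struct.unpack("<6sHH", body[:10])
    return signature, width, height

# Hypothetical response carrying the 18-byte text entity.
text_message = (
    b"HTTP/1.0 200 OK\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 18\r\n"
    b"\r\n"
    b"Hi! I'm a message!"
)
head, body = split_message(text_message)
print(len(body), body)
```

The body is whatever follows the first empty line, so the same split works for text and binary entities alike.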
Entity size
The Content-Length header gives the size of the entity body in the message, in bytes. This size includes any content encodings. For example, if a text file is gzip-compressed, the Content-Length header is the compressed size, not the original size.
Unless chunked encoding is used, the Content-Length header is required for any message that carries an entity body. Content-Length is used to detect message truncation caused by a server crash and to correctly delimit multiple messages that share a persistent connection.
Earlier versions of HTTP used connection close to delimit the end of a message. However, without Content-Length, the client cannot distinguish between a connection that closed normally at the end of the message and one that closed because the server crashed mid-transmission. The client needs Content-Length to detect message truncation.
Message truncation is especially serious for caching proxy servers. If a cache receives a truncated message and does not recognize the truncation, it may store the incomplete content and serve it many times. Caching proxy servers typically do not cache HTTP bodies that lack an explicit Content-Length header, to reduce the risk of caching truncated messages.
An incorrect Content-Length is worse than a missing one. Because some early clients and servers had well-known bugs in computing Content-Length, some clients, servers, and proxies contain special algorithms to detect and correct interactions with defective servers. HTTP/1.1 specifies that a user agent should notify the user when it receives and detects an invalid length.
Content-Length is essential for persistent connections. If a response arrives over a persistent connection, another HTTP response may follow it immediately. The Content-Length header lets the client know where one message ends and the next begins. Because the connection is persistent, the client cannot rely on connection close to find the end of the message. Without a Content-Length header, an HTTP application does not know where one entity body ends and the next message begins.
There is one case where a persistent connection can be used without a Content-Length header: when chunked encoding is used. With chunked encoding, the data is sent as a series of chunks, each with its own size declaration. Even if the server does not know the size of the whole entity when it generates the headers (usually because the entity is generated dynamically), it can still use chunked encoding to transfer a number of chunks of known size.
HTTP lets you encode the contents of an entity body, for example to make it more secure or to compress it to save space. If the body is content-encoded, the Content-Length header gives the length in bytes of the encoded body, not the length of the original, unencoded body.
Some HTTP applications get this wrong and send the size of the data before encoding, which causes serious errors, especially on persistent connections. Unfortunately, no header in the HTTP/1.1 specification conveys the length of the original, unencoded body, which makes it hard for clients to verify the integrity of the decoding process.
"Determining the Length"
The rules below describe how to correctly determine the length and end of a body in several different situations. The rules should be applied in order; the first one that matches applies.
1. If a particular HTTP message type is not allowed to have a body, ignore the Content-Length header when computing the body length. In this case, the Content-Length header is informational and does not reflect the actual body length.
The most important example is the HEAD response. The HEAD method asks the server to send the headers that an equivalent GET request would produce, but without the body. Because the response to the GET would carry a Content-Length header, so does the HEAD response, but unlike the GET response, the HEAD response has no body. 1XX, 204, and 304 responses can also carry a Content-Length header, but none of them has an entity body. Messages that may not have an entity body must end at the first empty line after the headers, regardless of which header fields are present.
2. If the message contains a Transfer-Encoding header (other than the default HTTP "identity" encoding), the entity ends with a special pattern called a "zero-byte chunk," unless the message has already been terminated by the connection closing.
3. If the message contains a Content-Length header (and the message type allows an entity body), and there is no non-identity Transfer-Encoding header field, then the Content-Length value is the length of the body. If a received message has both a Content-Length header field and a non-identity Transfer-Encoding header field, the Content-Length must be ignored, because the transfer encoding changes how the entity body is represented and transferred (and so may change the number of bytes transferred).
4. If the message uses the multipart/byteranges media type and the entity length is not otherwise specified by a Content-Length header, each part of the multipart message specifies its own size. This multipart type is the only entity body type that is self-delimiting, so it must not be sent unless the sender knows that the receiver can parse it.
5. If none of the above rules matches, the entity ends when the connection closes. In practice, only the server can use connection close to signal the end of a message. The client cannot close the connection to signal the end of a request message, because that would leave the server no way to send back a response.
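The five rules above can be sketched as a single function for response messages. This is only an illustration: the function name, the return values, and the convention that header names arrive as lower-case dictionary keys are assumptions of the sketch.

```python
def body_framing(status, headers, request_method="GET"):
    """Return (mode, length) for a response, applying the five rules in order.

    `headers` is a dict keyed by lower-case header names (an assumption here).
    """
    # Rule 1: these messages never carry a body; Content-Length is informational.
    if request_method == "HEAD" or 100 <= status < 200 or status in (204, 304):
        return ("none", 0)
    # Rule 2: a non-identity transfer encoding ends with the zero-byte chunk,
    # and any Content-Length header is ignored.
    transfer_encoding = headers.get("transfer-encoding", "identity").strip().lower()
    if transfer_encoding != "identity":
        return ("chunked", None)
    # Rule 3: otherwise Content-Length, if present, is the body length.
    if "content-length" in headers:
        return ("length", int(headers["content-length"]))
    # Rule 4: multipart/byteranges is self-delimiting.
    if headers.get("content-type", "").lower().startswith("multipart/byteranges"):
        return ("multipart", None)
    # Rule 5: read until the server closes the connection.
    return ("close", None)

print(body_framing(200, {"content-length": "18"}))
print(body_framing(200, {"content-length": "18"}, request_method="HEAD"))
```

Because the checks run in order, a message with both Transfer-Encoding: chunked and Content-Length is treated as chunked, as rule 3 requires.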
For compatibility with HTTP/1.0 applications, any HTTP/1.1 request that has an entity body must also include a valid Content-Length header field (unless the server is known to be HTTP/1.1-compliant).
The HTTP/1.1 specification advises that, for a request with a body but no Content-Length header, the server should respond with 400 Bad Request if it cannot determine the length of the message, or with 411 Length Required if it wants to insist on receiving a valid Content-Length.
Entity Digests
Although HTTP usually runs over a reliable transport protocol such as TCP/IP, parts of a message can still be modified in transit for a variety of reasons, such as non-compliant transcoding proxies or buggy intermediary proxies. To detect unintended modification of the entity body data, the sender can generate a checksum of the data when the initial entity is generated, and the receiver can check that checksum to catch any unintended modification.
The server uses the Content-MD5 header to send the result of running the MD5 algorithm on the entity body. Only the origin server that generated the response may compute and send the Content-MD5 header. Intermediate proxies and caches should not modify or add this header; doing so would defeat its purpose of verifying end-to-end integrity. The Content-MD5 value is computed after any required content encoding has been applied, but before any transfer encoding. To verify the integrity of a message, the client must first decode the transfer encoding and then compute the MD5 of the resulting entity body, with the content encoding still in place.
For example, if a document is compressed with gzip and then sent using chunked encoding, the MD5 is computed over the entire gzip-compressed body.
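A minimal sketch of that ordering in Python follows. It assumes the header value is the base64 form of the 16-byte MD5 digest; the document contents and function names are made up for illustration.

```python
import base64
import gzip
import hashlib

def content_md5(entity_body: bytes) -> str:
    """Base64 of the MD5 digest of the (already content-encoded) entity body."""
    return base64.b64encode(hashlib.md5(entity_body).digest()).decode("ascii")

def verify(entity_body: bytes, header_value: str) -> bool:
    """Receiver side: call this after removing any transfer encoding."""
    return content_md5(entity_body) == header_value

document = b"<html><body>" + b"Hello! " * 50 + b"</body></html>"  # hypothetical page
compressed = gzip.compress(document)        # content encoding is applied first
header_value = content_md5(compressed)      # the digest covers the gzip bytes

print("Content-MD5:", header_value)
print(verify(compressed, header_value))
```

The receiver checks the digest against the gzip bytes and only then decompresses them.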
Besides checking message integrity, the MD5 can also be used as a hash-table key for locating documents quickly and avoiding duplicate storage of content. Despite these possible uses, the Content-MD5 header is not commonly sent.
As extensions to HTTP, other digest algorithms have been proposed in IETF drafts. These extensions propose a new Want-Digest header that lets the client specify the type of digest it expects in the response, using quality values to propose several digest algorithms and indicate an order of preference.
Media type
The Content-Type header field describes the MIME type of the entity body. A MIME type is a standardized name that describes the underlying type of media carried as cargo (HTML file, Microsoft Word document, MPEG video, and so on). Client applications use the MIME type to properly decipher and process the content.
Content-Type values are standardized MIME types, registered with the Internet Assigned Numbers Authority (IANA). A MIME type consists of a primary media type (such as text, image, or audio), followed by a slash, followed by a subtype that further specifies the media type.
[note] The full list of registered MIME media types is maintained by IANA.
Some of the MIME types commonly used in Content-Type headers are listed in the following table:
Media type                       Description
text/html                        Entity body is an HTML document
text/plain                       Entity body is a plain-text document
image/gif                        Entity body is a GIF image
image/jpeg                       Entity body is a JPEG image
audio/x-wav                      Entity body contains WAV sound data
model/vrml                       Entity body is a three-dimensional VRML model
application/vnd.ms-powerpoint    Entity body is a Microsoft PowerPoint presentation
multipart/byteranges             Entity body has multiple parts, each containing a different range of bytes from the full document
message/http                     Entity body contains a complete HTTP message (see TRACE)
Note that the Content-Type header describes the media type of the original entity body. If the entity has been content-encoded, the Content-Type header still describes the entity body as it was before encoding.
The Content-Type header also supports optional parameters that further describe the content type. The charset parameter is the primary example; it specifies how to convert the bits of the entity into characters in a text file:
Content-Type: text/html; charset=iso-8859-4
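One way a receiver might pull the media type and charset out of such a header is sketched below, using Python's standard email.message module as the parser. The helper name and the sample body bytes are assumptions for illustration.

```python
from email.message import Message

def parse_content_type(value: str):
    """Return (media_type, charset) from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = value
    # Both values come back lower-cased; charset is None if the parameter is absent.
    return msg.get_content_type(), msg.get_content_charset()

media_type, charset = parse_content_type("text/html; charset=iso-8859-4")
body = b"<p>Hi</p>"                 # hypothetical entity body bytes
text = body.decode(charset)         # convert the bits into characters
print(media_type, charset, text)
```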
A MIME multipart e-mail message contains multiple messages that are joined together and sent as a single, complex message. Each component is self-contained, with its own set of headers describing its content, and the components are concatenated and separated by a delimiter string.
HTTP also supports multipart bodies. However, they are typically sent in only one of two situations: in fill-in form submissions and in range responses carrying pieces of a document.
"Multi-part form submission"
When an HTTP fill-in form is submitted, variable-length text fields and uploaded objects are sent as separate parts of a multipart body, allowing forms to be filled out with values of different types and lengths. For example, you might fill out a form that asks for your name and a description with your nickname and a small photo, while your friend might enter her full name and a long complaint about car repair problems in the description field.
HTTP sends such requests with a Content-Type: multipart/form-data header, or Content-Type: multipart/mixed, and a multipart body, for example:
Content-Type: multipart/form-data; boundary=[abcdefghijklmnopqrstuvwxyz]
The boundary parameter specifies the delimiter string that separates the different parts of the body.
The following example illustrates multipart/form-data encoding. Suppose we have this form:
<form action="http://server.com/cgi/handle" enctype="multipart/form-data" method="POST">
<p>What is your name? <input type="text" name="submit-name"><br>
What files are you sending? <input type="file" name="files"></p>
<input type="submit" value="Send"> <input type="reset">
</form>
If the user types "Sally" in the text-entry field and selects the text file essayfile.txt, the user agent might send back a multipart body with one part for each form field.
If the user also selects a second (image) file, imagefile.gif, the user agent might instead send the files field as a nested multipart/mixed part that contains both files.
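The single-file case can be sketched in Python as follows. The field names come from the form above; the boundary string AaB03x, the file contents, and the function name are assumptions made for illustration.

```python
def encode_form_data(parts, boundary: str) -> bytes:
    """Build a multipart/form-data body.

    `parts` is a list of (field_name, filename_or_None, content_type_or_None, data).
    """
    delimiter = b"--" + boundary.encode("ascii")
    lines = []
    for name, filename, content_type, data in parts:
        lines.append(delimiter)
        disposition = f'Content-Disposition: form-data; name="{name}"'
        if filename is not None:
            disposition += f'; filename="{filename}"'
        lines.append(disposition.encode("utf-8"))
        if content_type is not None:
            lines.append(f"Content-Type: {content_type}".encode("ascii"))
        lines.append(b"")            # blank line ends the headers of this part
        lines.append(data)
    lines.append(delimiter + b"--")  # closing delimiter
    lines.append(b"")
    return b"\r\n".join(lines)

body = encode_form_data(
    [
        ("submit-name", None, None, b"Sally"),
        ("files", "essayfile.txt", "text/plain", b"...contents of essayfile.txt..."),
    ],
    boundary="AaB03x",
)
print(body.decode("ascii"))
```

The request would carry Content-Type: multipart/form-data; boundary=AaB03x so that the server knows which delimiter to look for.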
"Multi-part range response"
The HTTP response to a range request can also be multipart. Such responses carry a Content-Type: multipart/byteranges header and a multipart body in which each part holds one of the requested ranges of the document.
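To make the structure concrete, here is a small Python sketch that assembles such a body. Each part carries its own Content-Type and a Content-Range header giving the inclusive byte positions and the total length. The boundary string, the sample document, and the function name are hypothetical.

```python
def byteranges_body(document: bytes, ranges, boundary: str, content_type: str) -> bytes:
    """Build a multipart/byteranges body; `ranges` holds inclusive (first, last) pairs."""
    out = []
    for first, last in ranges:
        out.append(f"--{boundary}\r\n".encode("ascii"))
        out.append(f"Content-Type: {content_type}\r\n".encode("ascii"))
        out.append(
            f"Content-Range: bytes {first}-{last}/{len(document)}\r\n\r\n".encode("ascii")
        )
        out.append(document[first:last + 1] + b"\r\n")
    out.append(f"--{boundary}--\r\n".encode("ascii"))  # closing delimiter
    return b"".join(out)

document = b"0123456789"  # hypothetical 10-byte document
body = byteranges_body(document, [(0, 2), (7, 9)], "SEPARATOR", "text/plain")
print(body.decode("ascii"))
```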
Content encoding
HTTP applications sometimes want to encode content before sending it. For example, a server might compress a large HTML document before sending it to a client over a slow connection, to reduce the time it takes to transfer the entity. A server can also scramble or encrypt the content to prevent unauthorized third parties from viewing the document.
These encodings are applied to the content at the sender. Once the content is encoded, the encoded data is placed in the entity body and sent to the receiver as usual.
"Content Encoding Process"
The content-encoding process is as follows:
1. A web server generates an original response message, with original Content-Type and Content-Length headers.
2. A content-encoding server (which may be the origin server or a downstream proxy) creates the encoded message. The encoded message has the same Content-Type but, if the body is compressed, for example, a different Content-Length. The content-encoding server also adds a Content-Encoding header to the encoded message so that the receiving application can decode it.
3. The receiving program gets the encoded message, decodes it, and obtains the original.
As an example of content encoding, consider an HTML page that is processed by a gzip content-encoding function to produce a smaller, compressed body. The compressed body is sent across the network, flagged with the gzip encoding. The receiving client decompresses the entity using the gzip decoder.
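The encode and decode steps can be sketched in Python with the standard gzip module. The header dictionaries, function names, and sample page are assumptions of this sketch, not a prescribed implementation.

```python
import gzip

def content_encode(headers: dict, body: bytes):
    """Sender side: compress the body and update the entity headers."""
    encoded = gzip.compress(body)
    new_headers = dict(headers)                        # Content-Type is kept unchanged
    new_headers["Content-Encoding"] = "gzip"
    new_headers["Content-Length"] = str(len(encoded))  # length of the encoded body
    return new_headers, encoded

def content_decode(headers: dict, body: bytes) -> bytes:
    """Receiver side: undo the encoding named by Content-Encoding."""
    if headers.get("Content-Encoding", "identity") == "gzip":
        return gzip.decompress(body)
    return body

original = b"<html><body>" + b"hello " * 200 + b"</body></html>"  # hypothetical page
headers = {"Content-Type": "text/html", "Content-Length": str(len(original))}

encoded_headers, encoded_body = content_encode(headers, original)
print(encoded_headers)
print(content_decode(encoded_headers, encoded_body) == original)
```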
The response fragment below is another example of an encoded response (a compressed image):
HTTP/1.1 200 OK
Date: Fri, 22:35:15 GMT
Server: Apache/1.2.4
Content-Length: 6096
Content-Type: image/gif
Content-Encoding: gzip
[...]
Note that the Content-Type header can and should still be present in the message. It describes the original format of the entity, which may be needed to display the entity once it has been decoded. Remember that the Content-Length header now gives the length of the encoded body.
"Content Encoding Types"
HTTP defines a few standard content-encoding types and allows additional encodings to be added as extension encodings. Encodings are standardized through the Internet Assigned Numbers Authority (IANA), which assigns a unique token to each content-encoding algorithm. The Content-Encoding header uses these standardized token values to describe the algorithm used in the encoding.
The following table lists some common content-encoding tokens:
Content-Encoding value    Description
gzip                      Indicates that the GNU zip encoding was applied to the entity
compress                  Indicates that the Unix file compression program was run on the entity
deflate                   Indicates that the entity was compressed into the zlib format
identity                  Indicates that no encoding was performed on the entity; this is assumed when no Content-Encoding header is present
The gzip, compress, and deflate encodings are lossless compression algorithms that reduce the size of transmitted messages without losing information. Of these, gzip is typically the most effective and the most widely used.
"Accept-Encoding Header"
Of course, we do not want servers to encode content in ways that the client cannot decode. To prevent servers from using encodings the client does not support, the client passes a list of the content encodings it supports in the Accept-Encoding request header. If the HTTP request does not contain an Accept-Encoding header, the server can assume that the client will accept any encoding (equivalent to sending Accept-Encoding: *).
[Figure: the Accept-Encoding header in an HTTP transaction]
The Accept-Encoding field contains a comma-separated list of supported encodings. Here are some examples:
Accept-Encoding: compress, gzip
Accept-Encoding: *
Accept-Encoding: compress;q=0.5, gzip;q=1.0
Accept-Encoding: gzip;q=1.0, identity; q=0.5, *;q=0
Clients can attach a Q (quality) value parameter to each encoding to indicate its preference. Q values range from 0.0 to 1.0, with 0.0 indicating that the client does not want the associated encoding and 1.0 indicating the preferred encoding. The token "*" means "anything else". Deciding which content encoding to apply is part of a more general process of deciding which content to send back to the client in a response.
The identity encoding token can be present only in the Accept-Encoding header; clients use it to specify its preference relative to other content-encoding algorithms.
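A server-side selection based on these Q values might look like the following Python sketch. The function names, the list of encodings the server supports, and the tie-breaking choices are assumptions; in particular, treating an unlisted identity as an always-available fallback is a simplification made here.

```python
def parse_accept_encoding(value: str) -> dict:
    """Map each encoding token to its Q value (default 1.0)."""
    prefs = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        token, *params = [piece.strip() for piece in item.split(";")]
        q = 1.0
        for param in params:
            name, _, val = param.partition("=")
            if name.strip().lower() == "q":
                q = float(val)
        prefs[token.lower()] = q
    return prefs

def choose_encoding(value: str, supported=("gzip", "deflate", "compress")):
    """Pick the supported encoding with the highest Q, else identity, else None."""
    prefs = parse_accept_encoding(value)
    wildcard = prefs.get("*")
    best, best_q = None, 0.0
    for encoding in supported:  # earlier entries win ties
        q = prefs.get(encoding, wildcard if wildcard is not None else 0.0)
        if q > best_q:
            best, best_q = encoding, q
    identity_q = prefs.get("identity", wildcard)
    if best is not None and (identity_q is None or best_q >= identity_q):
        return best
    if identity_q is None or identity_q > 0:
        return "identity"
    return None  # nothing acceptable

print(choose_encoding("compress;q=0.5, gzip;q=1.0"))
print(choose_encoding("gzip;q=1.0, identity; q=0.5, *;q=0"))
```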
Transfer encoding
Content encodings are reversible transformations applied to the body of a message. Content encodings are tightly associated with the details of the particular content format. For example, you might compress a text file with gzip, but not a JPEG file, because JPEGs do not compress well with gzip.
Transfer encodings are also reversible transformations performed on the entity body, but they are applied for architectural reasons and are independent of the format of the content. A transfer encoding is applied to a message to change the way the message data is transferred across the network.
"Reliable Transmission"
In other protocols, transfer encodings have long been used to make sure that messages are "reliably transmitted" across a network. In HTTP, the concerns about reliable transmission are different, because the underlying transport infrastructure is standardized and more forgiving. In HTTP, transferring a message body can cause problems in only a few cases, two of which are described below.
1. Unknown size
Some gateway applications and content encoders cannot determine the final size of a message body without generating the content first. Often, these servers would like to start sending the data before the size is known. Because HTTP requires the Content-Length header to precede the data, some servers apply a transfer encoding to send the data with a special terminating footer that marks the end of the data.
2. Security
You might use a transfer encoding to scramble the contents of a message before sending it across a shared transport network. However, because transport-layer security schemes such as SSL are so widespread, transfer encodings are rarely relied on for security.
"Transfer-Encoding Header"
HTTP defines only the following two headers to describe and control transfer encoding:
Header               Description
Transfer-Encoding    Tells the receiver what encoding was applied to the message so that it could be transmitted reliably
TE                   Used in the request header to tell the server which extension transfer encodings it may use
In the following example, the request uses the TE header to tell the server that it accepts the chunked encoding (which it must, if it is an HTTP/1.1 application) and is willing to accept trailers at the end of chunk-encoded messages:
GET /new_products.html HTTP/1.1
Host: www.joes-hardware.com
User-Agent: Mozilla/4.61 [en] (WinNT; I)
TE: trailers, chunked
The response includes a Transfer-Encoding header to tell the receiver that the message has been transfer-encoded with the chunked encoding:
HTTP/1.1 200 OK
Transfer-Encoding: chunked
Server: Apache/3.0
After this initial header, the structure of the message changes.
Transfer-encoding values are case-insensitive. HTTP/1.1 uses transfer-encoding values in the TE header field and in the Transfer-Encoding header field. The latest HTTP specification defines only one transfer encoding, the chunked encoding.
Like the Accept-Encoding header, the TE header can use Q values to describe preferred forms of transfer encoding. However, the HTTP/1.1 specification forbids associating a Q value of 0.0 with the chunked encoding.
Future extensions to HTTP may drive the need for additional transfer encodings. If that happens, the chunked transfer encoding should always be applied on top of the extension transfer encodings, so that the data can "tunnel" through HTTP/1.1 applications that understand chunked encoding but not the other transfer encodings.
"Chunked Encoding"
Chunked encoding breaks a message into chunks of known size. The chunks are sent one after another, so the size of the full message does not need to be known before it is sent.
Note that chunked encoding is a form of transfer encoding and is therefore an attribute of the message, not of the body.
1. Chunking and persistent connections
When the connection between the client and the server is not persistent, the client does not need to know the size of the body it is reading; it simply reads until the server closes the connection.
With persistent connections, the size of the body must be known and sent in the Content-Length header before the body is written. When content is created dynamically at the server, it may not be possible to know the length of the body before sending it.
Chunked encoding solves this problem by letting the server send the body in chunks, specifying only the size of each chunk. As the body is generated dynamically, the server can buffer a portion of it, send its size and the chunk, and repeat until the full body has been sent. The server signals the end of the body with a chunk of size 0, so it can keep the connection open and ready for the next response.
Chunked encoding is fairly simple. A chunk-encoded message begins with the HTTP response header block, followed by a stream of chunks. Each chunk contains a length value and the data for that chunk. The length value is in hexadecimal and is separated from the chunk data by a CRLF. The size of the chunk data is measured in bytes and excludes both the CRLF between the length value and the data and the CRLF at the end of the chunk. The last chunk is special: it has a length of 0, which signifies "end of body."
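The chunk layout described above can be sketched in a few lines of Python. The function names are assumptions, and the decoder handles only a complete, trailer-free body held in memory; it is meant as an illustration rather than a full parser.

```python
def encode_chunked(pieces) -> bytes:
    """Encode an iterable of byte strings as a chunked body with no trailer."""
    out = bytearray()
    for piece in pieces:
        if not piece:
            continue  # a zero-length chunk would signal the end of the body
        out += f"{len(piece):x}\r\n".encode("ascii") + piece + b"\r\n"
    out += b"0\r\n\r\n"  # last chunk (size 0), then the blank line that ends the message
    return bytes(out)

def decode_chunked(data: bytes) -> bytes:
    """Decode a complete chunked body (chunk extensions are ignored)."""
    body = bytearray()
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        size = int(data[pos:line_end].split(b";")[0].decode("ascii"), 16)
        pos = line_end + 2
        if size == 0:
            return bytes(body)
        body += data[pos:pos + size]
        pos += size + 2  # skip the chunk data and the CRLF that follows it

encoded = encode_chunked([b"Hello, ", b"world"])
print(encoded)
print(decode_chunked(encoded))
```

The sender never needs the total length: each piece is framed with its own size as soon as it is available.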
A client may also send chunked data to a server. Because the client does not know beforehand whether the server accepts chunked encoding (servers do not send TE headers in responses to clients), it must be prepared for the server to reject the chunked request with a 411 Length Required response.
2. Trailers in chunked messages
A trailer can be added to a chunked message if the client's TE header indicates that it accepts trailers. The server that created the original response can also add a trailer if the trailer contents are optional metadata that the client does not need to understand or use; the client may ignore and discard the contents of the trailer.
Trailers can contain additional header fields whose values might not be known at the start of the message (for example, because the contents of the body must be generated first). The Content-MD5 header is an example of a header that can be sent in the trailer, because it is difficult to calculate a document's MD5 before the document has been generated. The headers of the message contain a Trailer header that lists the headers that will follow the chunked message. The headers listed in the Trailer header come immediately after the last chunk.
Any HTTP header can be sent as a trailer, except for the Transfer-Encoding, Trailer, and Content-Length headers.
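As a sketch of why a trailer is convenient, the Python function below streams the chunks while updating an MD5 incrementally and emits Content-MD5 only after the last chunk. The function name is hypothetical, and the sketch assumes the response headers already announced Trailer: Content-MD5 and that the value is the base64 form of the digest.

```python
import base64
import hashlib

def chunked_with_md5_trailer(pieces) -> bytes:
    """Chunk-encode `pieces` and append a Content-MD5 trailer after the last chunk."""
    md5 = hashlib.md5()
    out = bytearray()
    for piece in pieces:
        if piece:
            md5.update(piece)  # the digest is only complete once the body is done
            out += f"{len(piece):x}\r\n".encode("ascii") + piece + b"\r\n"
    digest = base64.b64encode(md5.digest()).decode("ascii")
    out += b"0\r\n"                                     # last chunk
    out += f"Content-MD5: {digest}\r\n".encode("ascii") # trailer header
    out += b"\r\n"                                      # blank line ends the message
    return bytes(out)

message_body = chunked_with_md5_trailer([b"first part, ", b"second part"])
print(message_body.decode("ascii"))
```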
Content encoding and transfer encoding can be used simultaneously. For example, a sender can compress an HTML file using a content encoding and then send the data chunked using a transfer encoding. The receiver reverses the process to reconstruct the body.
"Rules for transport Encoding"
When a transfer encoding is applied to a message body, a few rules must be followed: the set of transfer encodings must include "chunked" (the only exception is a message that is terminated by closing the connection); when the chunked transfer encoding is used, it must be the last transfer encoding applied to the message body; and the chunked transfer encoding must not be applied to a message body more than once. These rules allow the recipient to determine the transfer length of the message.
Transfer encodings are a relatively new feature of HTTP, introduced in version 1.1. Servers that implement transfer encodings must take special care not to send transfer-encoded messages to non-HTTP/1.1 applications. Likewise, if a server receives a transfer-encoded message that it cannot understand, it should respond with the 501 Not Implemented status code. However, all HTTP/1.1 applications must at least support chunked encoding.
Instance manipulation
Web objects are not static. The same URL can, over time, point to different versions of an object. Take the CNN home page as an example: going to http://www.cnn.com several times in a day is likely to result in a slightly different page being returned each time.
Think of the CNN home page as an object and its different versions as different instances of the object. A client may request the same resource (URL) multiple times but get different instances of the resource, because it changes over time. For example, a client might get the same instance at times (a) and (b) but a different instance at time (c).
The HTTP protocol specifies operations for a class of requests and responses, called instance manipulations, that operate on instances of an object. The two main instance-manipulation methods are range requests and delta encoding. Both require clients to be able to identify the particular copy of the resource they have (if any) and to request new instances conditionally.
"Freshness"
Now consider again a client that does not initially have a copy of the resource, so it sends a request to the server to get one. The server responds with version 1 of the resource. The client can now cache this copy, but for how long?
Once the document has "expired" at the client (that is, once the client can no longer consider its copy valid), the client must request a fresh copy from the server. If the document has not changed at the server, however, the client does not need to receive it again; it can just continue to use its cached copy.
This special request, called a conditional request, requires that the client tell the server which version it currently has, using a validator, and ask for a copy to be sent only if its current copy is no longer valid.
The server is expected to tell the client how long it may cache the content and consider it fresh. The server can provide this information using one of two headers: Expires and Cache-Control.
The Expires header specifies the exact date and time when the document "expires," after which it can no longer be considered fresh. The syntax of the Expires header is:
Expires: Sun Mar 23:59:59 GMT 2016
Clients and servers must have synchronized clocks to use the Expires header correctly. This is not always easy, because they may not be running a clock synchronization protocol such as the Network Time Protocol (NTP). A mechanism that defines expiration in relative time is more useful. The Cache-Control header can specify the maximum age of a document in seconds, measured from the time the document left the server. Age does not depend on clock synchronization and is therefore likely to give more accurate results.
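The difference between the two mechanisms can be sketched in Python. The function below prefers a relative max-age directive and falls back to the absolute Expires date; the function name, the lower-case header keys, and the sample dates are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

def is_fresh(headers: dict, received_at: datetime, now: datetime) -> bool:
    """Decide whether a cached response is still fresh.

    `headers` uses lower-case names; both datetimes are timezone-aware.
    """
    # Relative lifetime: compares only times measured on the cache's own clock.
    for directive in headers.get("cache-control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return now - received_at < timedelta(seconds=int(value))
    # Absolute lifetime: compares the local clock with a date set by the server.
    expires = headers.get("expires")
    if expires:
        return now < parsedate_to_datetime(expires)
    return False  # no freshness information: treat the copy as stale

received = datetime(2016, 3, 20, 12, 0, 0, tzinfo=timezone.utc)  # hypothetical
print(is_fresh({"cache-control": "max-age=3600"}, received, received + timedelta(minutes=30)))
print(is_fresh({"expires": "Sun, 20 Mar 2016 23:59:59 GMT"}, received, received + timedelta(days=1)))
```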
In fact, the Cache-Control header is very powerful. Both servers and clients can use it to describe freshness, and many directives can accompany it beyond a simple age or expiration time.
"Validators"
When a cached copy is requested and it is no longer fresh, the cache needs to make sure it has a fresh copy. The cache can fetch the current copy from the origin server, but in many cases the document on the server is still the same as the stale copy in the cache. If the cache fetches the document again even though it is unchanged, it wastes network bandwidth, places unnecessary load on both the cache and the origin server, and slows everything down.
To fix this, HTTP provides a way for clients to request a copy only if the resource has changed, using special requests called conditional requests. A conditional request is a normal HTTP request message, but it is performed only if a particular condition is true. For example, a cache might send the following conditional GET message to a server, asking it to send the file /announce.html only if the file has been modified since June 29, 2016 (the date the cached document was last changed by the author):
GET /announce.html HTTP/1.0
If-Modified-Since: Wed, 29 Jun 2016 14:30:00 GMT
Conditional requests are implemented with conditional headers, which begin with "If-". In the example above, the conditional header is If-Modified-Since. A conditional header lets the method execute only if the condition is true. If the condition is not true, the server does not carry out the method and sends back a status code instead (304 Not Modified in the case of the conditional GET above)
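How a server might evaluate If-Modified-Since can be sketched as follows. This is a minimal illustration under stated assumptions: the function name handle_conditional_get and the way the request and resource are passed in are made up for the example, and only this one conditional header is handled.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def handle_conditional_get(request_headers, last_modified, body):
    """Return (status, body) for a GET that may carry If-Modified-Since.

    request_headers -- dict of request headers (hypothetical input shape)
    last_modified   -- timezone-aware datetime of the resource's last change
    body            -- current content of the resource, as bytes
    """
    condition = request_headers.get("If-Modified-Since")
    if condition is not None:
        since = parsedate_to_datetime(condition)
        # HTTP dates carry whole seconds only, so drop microseconds
        # before comparing.
        if last_modified.replace(microsecond=0) <= since:
            # Condition is false: nothing changed, so send no body.
            return 304, b""
    # No condition, or the resource changed after the given date.
    return 200, body
```

With the June 29, 2016 date from the example, a resource last modified at exactly that time yields 304 and an empty body, while a resource modified on any later date yields 200 and the full content.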
Each conditional request works on a particular verification code (called a validator in the HTTP specification). A verification code is a particular attribute of a document instance that is tested to decide whether the condition is true. Conceptually, you can think of a verification code as the file's serial number, its version number, or the date and time of its last change.
The conditional header If-Modified-Since tests the date and time at which the document instance was last modified, so we say that the last-modified date is the verification code. The conditional header If-None-Match tests the ETag value of the document, which is a special keyword or version-identifying tag associated with the entity. Last-Modified and ETag are the two main verification codes used by HTTP. The 4 HTTP headers for conditional requests are listed in the following table. Each conditional header is followed by the type of verification code it uses
HTTP divides verification codes into two categories: weak verification codes (weak validators) and strong verification codes (strong validators). A weak verification code does not necessarily identify one instance of a resource uniquely, whereas a strong verification code must. An example of a weak verification code is the size of the object in bytes. The content of the resource may change while its size stays the same, so a hypothetical byte-count verification code is only weakly correlated with change. A cryptographic checksum of the resource content (for example, MD5) is a strong verification code: it changes whenever the document changes
The last-modified time is treated as a weak verification code because, although it records when the resource was last modified, its resolution is at most 1 second. A resource can change several times within 1 second, and a server can process thousands of requests per second, so the last-modified date does not always reflect every change. The ETag header is treated as a strong verification code because the server can place a different value in the ETag header every time the resource content changes. Version numbers and digest checksums are good candidates for the ETag value, but the header is not limited to them: it is flexible enough to carry arbitrary text values (tags), which can be used to design a variety of client and server validation policies
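The contrast between a weak and a strong verification code can be seen with two versions of a document that happen to have the same length. The sketch below is illustrative only; the function names size_validator and digest_etag are assumptions, and MD5 is used simply because it is the checksum mentioned above.

```python
import hashlib


def size_validator(content: bytes) -> str:
    """Weak: the byte count can stay the same when the content changes."""
    return str(len(content))


def digest_etag(content: bytes) -> str:
    """Strong: a digest of the content, quoted the way ETag values are."""
    return '"' + hashlib.md5(content).hexdigest() + '"'


old_version = b"price: 10"
new_version = b"price: 20"

# Same length, so the size-based check fails to notice the edit...
same_size = size_validator(old_version) == size_validator(new_version)
# ...while the digest differs between the two versions.
same_digest = digest_etag(old_version) == digest_etag(new_version)
```

Here same_size is True even though the content changed, while same_digest is False, which is the behavior the text attributes to weak and strong verification codes respectively.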
Sometimes, clients and servers may want a looser form of entity tag validation. For example, a server might want to make cosmetic changes to a large, widely cached document without triggering a mass transfer when caches revalidate. In this case, the server can advertise a "weak" entity tag by prefixing the tag with "W/". A weak entity tag changes only when the associated entity changes in a semantically significant way. A strong entity tag must change whenever the associated entity changes in any way.
The following example shows how a client can revalidate with the server using a weak entity tag. The server returns a body only if the content of the document has changed significantly since version 4.0:
GET /announce.html HTTP/1.1
If-None-Match: W/"v4.0"
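The practical effect of the W/ prefix shows up when two entity tags are compared. The sketch below follows the strong and weak comparison rules of the HTTP specification (RFC 7232); the function names are assumptions made for this example.

```python
def parse_etag(value: str):
    """Split an entity tag into (is_weak, opaque_tag)."""
    value = value.strip()
    if value.startswith("W/"):
        return True, value[2:]
    return False, value


def strong_match(a: str, b: str) -> bool:
    """Strong comparison: neither tag is weak and the tags are identical."""
    weak_a, tag_a = parse_etag(a)
    weak_b, tag_b = parse_etag(b)
    return not weak_a and not weak_b and tag_a == tag_b


def weak_match(a: str, b: str) -> bool:
    """Weak comparison: the opaque tags are identical, weak or not."""
    return parse_etag(a)[1] == parse_etag(b)[1]
```

Under these rules W/"v4.0" and "v4.0" match weakly but not strongly, so a server that has only made cosmetic changes can keep answering revalidations for W/"v4.0" without resending the document.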
When a client accesses the same resource more than once, it first needs to determine whether its current copy is still fresh. If the copy is no longer fresh, the client must obtain the latest version from the server. To avoid receiving an identical copy when the resource has not changed, the client can send the server a conditional request that carries the verification code identifying its current copy. The server sends a copy of the resource only if it differs from the client's copy
"Range Request"
We now have a clear picture of how a client can ask the server to send a resource only if the client's copy is no longer valid. HTTP goes one step further: it lets the client request just a part of the document, known as a range
Imagine you are three-fourths of the way through downloading the latest popular software over a slow modem connection when a network failure suddenly interrupts the connection. You waited a long time for the download to finish, and now you have to start all over again, hoping it does not happen again.
With a range request, an HTTP client can resume downloading an entity by requesting the range (or part) of the entity it failed to receive. The precondition, of course, is that the object has not changed on the server between the client's original request and its range request. For example:
GET /bigfile.html HTTP/1.1
Host: www.joes-hardware.com
Range: bytes=4000-
User-Agent: Mozilla/4.61 [en] (WinNT; I)
In this case, the client is requesting the rest of the document after the first 4,000 bytes (no end byte needs to be given, because the requester may not know the size of the document). This form of range request suits a client that failed after receiving the first 4,000 bytes. The Range header can also request multiple ranges, which may be given in any order and may overlap.
For example, suppose a client connects to multiple servers at the same time and downloads different parts of the same document from each of them to speed up the overall download. When a client requests several ranges in a single request, the response comes back as a single entity with a multipart body and a Content-Type: multipart/byteranges header
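The single-range case (bytes=4000-) can be sketched on the server side as follows. This is a simplified illustration, not a full implementation: the function names are assumptions, multiple ranges are deliberately rejected, and the 206 Partial Content and 416 Range Not Satisfiable status codes and the Content-Range header come from the HTTP specification, not from the text above.

```python
def resolve_range(header: str, total: int):
    """Resolve one 'bytes=first-last' range against a body of `total` bytes.

    Returns (first, last) as inclusive offsets, or None if the range
    cannot be satisfied.
    """
    unit, _, spec = header.partition("=")
    if unit.strip() != "bytes" or "," in spec:
        raise ValueError("this sketch handles a single bytes range only")
    first_text, _, last_text = spec.strip().partition("-")
    if first_text == "":
        # Suffix form: 'bytes=-500' means the final 500 bytes.
        length = int(last_text)
        first, last = max(total - length, 0), total - 1
    else:
        first = int(first_text)
        # An open-ended range such as 'bytes=4000-' runs to the end.
        last = min(int(last_text), total - 1) if last_text else total - 1
    if first > last or first >= total:
        return None
    return first, last


def partial_response(body: bytes, range_header: str):
    """Return (status, headers, payload) for a single-range request."""
    resolved = resolve_range(range_header, len(body))
    if resolved is None:
        return 416, {"Content-Range": f"bytes */{len(body)}"}, b""
    first, last = resolved
    headers = {"Content-Range": f"bytes {first}-{last}/{len(body)}"}
    return 206, headers, body[first:last + 1]
```

For a 10,000-byte document, a request with Range: bytes=4000- would produce a 206 response whose Content-Range is bytes 4000-9999/10000 and whose body holds the remaining 6,000 bytes.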
Not all servers accept range requests, but many do. A server can tell clients that it accepts range requests by including the Accept-Ranges header in its responses. The value of this header is the unit in which ranges are measured, usually bytes. For example:
HTTP/1.1 200 OK
Date: Fri, 22:35:15 GMT
Server: Apache/1.2.4
Accept-Ranges: bytes
The Range header is widely used by popular peer-to-peer (P2P) file-sharing clients, which download different parts of a multimedia file from different peers at the same time
Note that range requests are a class of instance manipulation, because they are exchanges between a client and a server about a particular instance of an object. That is, a client's range request makes sense only if the client and the server hold the same version of the document
"Differential Encoding"
We have described different versions of a web page as different instances of the page. If a client has an expired copy of a page, it requests the latest instance of the page. If the server has a newer instance, it sends it to the client, and it sends the full new instance even if only a small part of the page has changed
When the change is small, the client would get the latest page faster if the server sent only the parts that changed instead of the complete new page. Differential encoding (called delta encoding in the specification) is an extension to HTTP that optimizes transfer performance by exchanging only the changed parts of an object instead of the complete object. Differential encoding is also a class of instance manipulation, because it relies on the client and server exchanging information about a particular instance of an object. RFC 3229 describes differential encoding
The mechanism of differential encoding covers the whole process of requesting, generating, receiving, and assembling a document. The client must tell the server which version of the page it has, that it is willing to accept a difference (delta) from the latest version of the page, and which algorithms it knows for applying that difference to its existing version. The server must check whether it still has the client's version of the page and decide how to compute the difference between that version and the latest one (several algorithms exist for computing the difference between two objects). The server must then compute the difference, send it to the client, tell the client that it is sending a difference, and specify the new identifier (ETag) of the latest version of the page, because that is the version the client will hold after applying the difference to its old version
The client sends the unique identifier of the page version it holds in an If-None-Match header; this is the ETag that the server sent in its previous response to the client. The client is telling the server: "If the latest version of the page you have does not have this ETag, send me the latest version of the page." With only the If-None-Match header, the server would send the client the full latest version of the page (assuming the latest version differs from the version the client holds)
However, a client can tell the server that it is willing to accept a difference of the page by also sending the A-IM header. A-IM is short for Accept-Instance-Manipulation. Figuratively, the client is saying: "By the way, I can accept some forms of instance manipulation, so if you apply one of them, you do not have to send me the full document." In the A-IM header, the client lists the algorithms it knows for applying a difference to the old version to produce the latest version. The server sends back the following: a special response code, 226 IM Used, which tells the client that it is receiving an instance manipulation of the requested object rather than the complete object itself; an IM (short for Instance-Manipulation) header, which names the algorithm used to compute the difference; the new ETag header; and a Delta-Base header, which gives the ETag of the base document used to compute the difference (ideally the same as the ETag in the If-None-Match header of the client's request)
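The server's side of this exchange can be sketched roughly as follows. This is an assumption-laden illustration rather than a conforming RFC 3229 implementation: the function name respond, the versions dictionary, and the delta_algorithms mapping (algorithm name to a function that computes a difference) are all hypothetical, and quality values in A-IM are ignored.

```python
def respond(request_headers, versions, current_etag, delta_algorithms):
    """Return (status, headers, body) for a request that may accept a delta.

    versions         -- dict mapping ETag -> stored bytes of that instance
    current_etag     -- ETag of the latest instance
    delta_algorithms -- dict mapping an instance-manipulation name to a
                        function (old_bytes, new_bytes) -> delta_bytes
    """
    client_etag = request_headers.get("If-None-Match")
    current = versions[current_etag]

    # The client already holds the latest instance: nothing to send.
    if client_etag == current_etag:
        return 304, {"ETag": current_etag}, b""

    # A delta is possible only if the client's base instance is still
    # stored and the client accepts an algorithm the server supports.
    accepted = [name.strip() for name in request_headers.get("A-IM", "").split(",")]
    base = versions.get(client_etag)
    if base is not None:
        for name in accepted:
            if name in delta_algorithms:
                delta = delta_algorithms[name](base, current)
                headers = {
                    "IM": name,                 # algorithm actually used
                    "ETag": current_etag,       # instance the client ends up with
                    "Delta-Base": client_etag,  # instance the delta applies to
                }
                return 226, headers, delta

    # Otherwise fall back to sending the complete latest instance.
    return 200, {"ETag": current_etag}, current
```

The fallback branch reflects the behavior described above for a request that carries only If-None-Match: without a usable A-IM header or a stored base instance, the server simply sends the full page.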
The following table summarizes the headers used by differential encoding
The client uses the A-IM header to list the types of instance manipulation it accepts, and the server uses the IM header to state which instance manipulation it applied. But which instance manipulation types are acceptable, and what do they do? Some of the instance manipulation types registered with IANA are listed in the following table
The difference generator on the server computes the difference between the base document and the latest instance of the document, using an algorithm that the client specified in the A-IM header. The difference applier on the client takes the difference, applies it to the base document, and produces the latest instance of the document. For example, if the difference was produced by the UNIX diff -e command, the client can apply it with the UNIX ed text editor, because diff -e <file1> <file2> generates the series of ed commands that converts <file1> into <file2>. ed is a very simple editor that supports only a few commands. For example, 5c says to delete line 5 of the base document, and chisels.<cr>. says to add "chisels." in its place. It is that simple. Larger changes generate more complex instructions. The diff -e algorithm compares files line by line, which works well for text files but is unsuitable for binary files. The vcdiff algorithm is more powerful: it also works for non-text files and generally produces smaller differences than diff -e
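To make the ed commands concrete, here is a small sketch of a difference applier for the subset of commands that diff -e typically emits (a for append, c for change, d for delete). It is an illustration only: the function name apply_ed_script and the sample document are made up, and escaping of text lines that consist of a single period is not handled.

```python
import re


def apply_ed_script(base_lines, script_lines):
    """Apply a list of ed commands to a list of lines and return the result.

    diff -e emits commands from the end of the file toward the start, so
    applying them in order keeps the earlier line numbers valid.
    """
    lines = list(base_lines)
    commands = iter(script_lines)
    for command in commands:
        match = re.fullmatch(r"(\d+)(?:,(\d+))?([acd])", command)
        if match is None:
            raise ValueError(f"unsupported ed command: {command!r}")
        first = int(match.group(1))
        last = int(match.group(2) or first)
        action = match.group(3)

        new_text = []
        if action in "ac":
            # Input mode: collect text until a line holding only "."
            for text in commands:
                if text == ".":
                    break
                new_text.append(text)

        if action == "a":
            lines[first:first] = new_text       # insert after line `first`
        elif action == "c":
            lines[first - 1:last] = new_text    # replace lines first..last
        else:
            del lines[first - 1:last]           # delete lines first..last
    return lines


base = ["hammers.", "nails.", "saws.", "screws.", "wrenches."]
patched = apply_ed_script(base, ["5c", "chisels.", "."])
```

After the call, patched holds the first four lines unchanged followed by "chisels.", which is the effect of the 5c command described above.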
The specification for differential encoding defines the format of the A-IM and IM headers in detail. Here it is enough to know that these headers can list multiple instance manipulations (each with an optional quality value). A document can pass through several instance manipulations before it is returned to the client, to maximize compression. For example, a difference generated by the vcdiff algorithm can in turn be compressed with the gzip algorithm. The server's response would then contain the header IM: vcdiff, gzip. The client would first gunzip the content and then apply the resulting difference to its base page to generate the final document
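The order of operations on the client can be sketched as follows: the manipulations named in the IM header are undone from last to first. This is only an outline under stated assumptions. Python's standard library has no vcdiff decoder, so the caller must supply the difference appliers through the hypothetical delta_appliers mapping; the function name is likewise made up.

```python
import gzip


def decode_instance_manipulations(im_header, body, base, delta_appliers):
    """Undo the manipulations listed in an IM header, last one first.

    im_header      -- value of the IM header, e.g. "vcdiff, gzip"
    body           -- bytes received in the 226 response
    base           -- the client's base instance, as bytes
    delta_appliers -- dict mapping a manipulation name to a function
                      (base_bytes, delta_bytes) -> new_instance_bytes
    """
    names = [name.strip() for name in im_header.split(",") if name.strip()]
    # The server applied the manipulations left to right, so the client
    # reverses them right to left.
    for name in reversed(names):
        if name == "gzip":
            body = gzip.decompress(body)
        elif name in delta_appliers:
            body = delta_appliers[name](base, body)
        else:
            raise ValueError(f"unsupported instance manipulation: {name}")
    return body
```

For IM: vcdiff, gzip this means the body is decompressed first and the supplied vcdiff applier then combines the result with the base page, matching the gunzip-then-apply order described above.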
Differential encoding can reduce transfer time, but it can be cumbersome to implement. Imagine a page that changes frequently and is accessed by many different people. A server that supports differential encoding must keep every version of the page as it changes over time, so that it can compute the difference between the latest version and whatever version a requesting client holds
If a document changes frequently while many clients are requesting it, those clients end up with different instances of the document. When they next make a request to the server, each asks for the difference between its own instance and the latest one. To send them only the changed parts, the server must keep every version that any client may hold
In exchange for lower latency when serving documents, the server must spend more disk space to keep the old instances of the document. The extra disk space that differential encoding requires may quickly offset the benefit of transferring less data