HTTP entities and encodings

Source: Internet
Author: User

1. Content-Length: the size of the entity

The Content-Length header gives the size, in bytes, of the entity body in the message. This size reflects any content encoding that has been applied: if a text file is gzip-compressed, Content-Length is the size of the compressed body, not the original size.

The Content-Length header is required for any message that has an entity body, unless chunked encoding is used. It is used to detect message truncation caused by a server crash and to correctly delimit multiple messages that share a persistent connection.

1.1 Detecting truncation

Clients use Content-Length to detect message truncation.

To reduce the risk of caching a truncated message, caching proxy servers typically do not cache HTTP bodies that lack an explicit Content-Length header.
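
A minimal Python sketch of the truncation check a client might perform, assuming the whole body has already been read into memory. The helper name `check_length` and its return labels are hypothetical, chosen only for illustration.

```python
def check_length(headers: dict, body: bytes) -> str:
    """Compare a received body with its declared Content-Length.

    Returns "ok", "truncated", "too-long", or "unknown" (no header present).
    Header names are compared case-insensitively, as HTTP requires.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    declared = lowered.get("content-length")
    if declared is None:
        return "unknown"  # nothing to check against; a cache would likely not store it
    expected = int(declared.strip())
    if len(body) < expected:
        return "truncated"  # e.g. the server crashed mid-response
    if len(body) > expected:
        return "too-long"
    return "ok"


print(check_length({"Content-Length": "11"}, b"hello world"))  # ok
print(check_length({"Content-Length": "11"}, b"hello"))        # truncated
print(check_length({}, b"hello"))                              # unknown
```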

1.2 Content-Length and persistent connections

When a response travels over a persistent connection, another HTTP response may follow immediately after it. The Content-Length header lets the client know where one message ends and the next begins. Because the connection is persistent, the client cannot rely on the connection closing to mark the end of the message.
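
To make the framing concrete, here is a hedged sketch of how a client could split two back-to-back responses read from one persistent connection, using only Content-Length. `split_responses` is a hypothetical helper; it assumes every response carries Content-Length and that no chunked encoding is in use.

```python
def split_responses(stream: bytes):
    """Split back-to-back HTTP responses read from one persistent connection.

    Sketch: every response must carry Content-Length (no chunked encoding).
    Returns a list of (status_line, body) pairs.
    """
    messages = []
    pos = 0
    while pos < len(stream):
        header_end = stream.index(b"\r\n\r\n", pos)   # empty line ends the headers
        lines = stream[pos:header_end].decode("iso-8859-1").split("\r\n")
        length = None
        for line in lines[1:]:
            name, _, value = line.partition(":")
            if name.strip().lower() == "content-length":
                length = int(value.strip())
        if length is None:
            raise ValueError("no Content-Length: cannot tell where this message ends")
        body_start = header_end + 4
        if body_start + length > len(stream):
            raise ValueError("message truncated")
        messages.append((lines[0], stream[body_start:body_start + length]))
        pos = body_start + length                    # the next response starts here
    return messages


stream = (
    b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello"
    b"HTTP/1.1 404 Not Found\r\nContent-Length: 9\r\n\r\nnot found"
)
for status, body in split_responses(stream):
    print(status, body)
```

Without the Content-Length value, the loop would have no way to know that `hello` ends before the second status line begins.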

There is one case in which a persistent connection can be used without a Content-Length header: when chunked encoding is used. With chunked encoding, the data is sent as a series of chunks, each of which carries its own size.

1.3 Content Encoding

If the body is content-encoded, the Content-Length header gives the length in bytes of the encoded body, not the length of the original, unencoded body.

2. Entity digests

The server uses the Content-MD5 header to send the result of running the MD5 algorithm on the entity body. Only the origin server that generated the response may compute and send the Content-MD5 header. Intermediate proxies and caches should not modify or add this header, because doing so would defeat its purpose of verifying end-to-end integrity. Content-MD5 is computed after all required content encodings have been applied, but before any transfer encoding. To verify the integrity of a message, the client must first undo any transfer encoding and then compute the MD5 of the resulting entity body, which is still content-encoded.

For example, if a document is compressed with gzip and then sent with chunked encoding, the MD5 digest is computed over the entire gzip-compressed body.
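
The ordering above can be sketched in Python. A Content-MD5 value is the base64 encoding of the 128-bit MD5 digest; the sample document is made up, and splitting the bytes into 64-byte pieces is only a stand-in for chunked encoding, not real chunk framing.

```python
import base64
import gzip
import hashlib


def content_md5(entity_body: bytes) -> str:
    """Base64 of the MD5 digest: the value format of a Content-MD5 header."""
    return base64.b64encode(hashlib.md5(entity_body).digest()).decode("ascii")


original = b"<html><body>Joe's Hardware</body></html>\n" * 20

# Content encoding happens first; the digest covers these gzip bytes.
entity_body = gzip.compress(original)
header_value = content_md5(entity_body)

# Stand-in for a transfer encoding: the same bytes split into pieces for the wire.
pieces = [entity_body[i:i + 64] for i in range(0, len(entity_body), 64)]

# The receiver undoes the transfer encoding first, then hashes the
# still-gzipped entity body and compares it with the header.
reassembled = b"".join(pieces)
print(content_md5(reassembled) == header_value)  # True
print(content_md5(original) == header_value)     # False: digest is of the encoded body
```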

3. Media type and Character set

The Content-Type header field describes the MIME type of the entity body. MIME types are standardized names that describe the underlying type of media the body carries.

A MIME type consists of a primary media type (such as text, image, or audio), followed by a slash and a subtype that further specifies the media type.

The following are some of the MIME types commonly used in Content-type headers.

    • text/html: the entity body is an HTML document
    • text/plain: the entity body is a plain-text document
    • image/gif: the entity body is an image in GIF format
    • image/jpeg: the entity body is an image in JPEG format
    • audio/x-wav: the entity body contains audio in WAV format
    • model/vrml: the entity body is a three-dimensional VRML model
    • application/vnd.ms-powerpoint: the entity body is a Microsoft PowerPoint presentation
    • multipart/byteranges: the entity body has several parts, each containing a different byte range of the full document
    • message/http: the entity body contains a complete HTTP message

The Content-Type header describes the media type of the original entity body. For example, if the entity is content-encoded, the Content-Type header still describes the type of the entity body before encoding.

3.1 Character encoding of text

The Content-Type header also supports optional parameters that further describe the content. For example, the charset parameter describes how to convert the bits in the entity into the characters of a text file:

Content-Type: text/html; charset=iso-8859-4
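
A small Python sketch of splitting a Content-Type value into type, subtype, and parameters, and then using charset to turn body bytes into characters. `parse_content_type` is a hypothetical helper and deliberately naive: it does not handle quoted parameter values that themselves contain `;`.

```python
def parse_content_type(value: str):
    """Split a Content-Type value into (type, subtype, params).

    Sketch: does not handle quoted parameter values containing ';'.
    """
    media_type, *raw_params = value.split(";")
    main, _, sub = media_type.strip().partition("/")
    params = {}
    for raw in raw_params:
        name, _, val = raw.strip().partition("=")
        params[name.strip().lower()] = val.strip().strip('"')
    # Type, subtype, and parameter names are case-insensitive.
    return main.lower(), sub.lower(), params


main, sub, params = parse_content_type("text/html; charset=iso-8859-4")
print(main, sub, params)  # text html {'charset': 'iso-8859-4'}

# The charset decides which character a given byte stands for.
body = b"\xa1"
print(body.decode(params["charset"]))  # Ą (ISO-8859-4)
print(body.decode("iso-8859-1"))       # ¡ (same byte read as ISO-8859-1)
```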

3.2 Multipart media types

MIME multipart e-mail messages contain multiple messages that are bundled together and sent as a single, complex message. Each part is self-contained, with its own set of headers describing its content, and the parts are joined by a delimiter string.

HTTP also supports multipart bodies. However, they are typically used in only two situations: to submit a filled-in form, or as a range response that carries several pieces of a document.

3.3 Multipart form submissions

When an HTTP form is submitted, variable-length text fields and uploaded objects are sent as separate parts of a multipart body, so the form can carry values of different types and lengths.

HTTP sends such requests with a Content-Type: multipart/form-data or Content-Type: multipart/mixed header and a multipart body, as in the following example:

Content-Type: multipart/form-data; boundary=[abcdefghijklmnopqrstuvwxyz]

The boundary parameter gives the string used to separate the different parts of the body.

The following example shows multipart/form-data encoding. Suppose a form has a text entry field named submit-name and a file selection field named files.

If the user enters Sally in the text field and selects the text file essayfile.txt, the user agent might send back the following data:

Content-Type: multipart/form-data; boundary=AaB03x

--AaB03x
Content-Disposition: form-data; name="submit-name"

Sally
--AaB03x
Content-Disposition: form-data; name="files"; filename="essayfile.txt"
Content-Type: text/plain

...contents of essayfile.txt...
--AaB03x--
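
A Python sketch of how a user agent could assemble a body like the one above. `build_form_data` is a hypothetical helper; it assumes the boundary string never occurs inside any part and does no escaping of field names or filenames.

```python
def build_form_data(fields, files, boundary):
    """Build a multipart/form-data body (sketch; no escaping of names).

    fields: {name: text value}; files: {name: (filename, content_type, bytes)}.
    The boundary must not appear anywhere inside the part contents.
    """
    delimiter = b"--" + boundary.encode("ascii")
    out = []
    for name, value in fields.items():
        out.append(delimiter)
        out.append(f'Content-Disposition: form-data; name="{name}"'.encode())
        out.append(b"")                      # empty line: part headers end here
        out.append(value.encode("utf-8"))
    for name, (filename, ctype, data) in files.items():
        out.append(delimiter)
        out.append(f'Content-Disposition: form-data; name="{name}"; filename="{filename}"'.encode())
        out.append(f"Content-Type: {ctype}".encode())
        out.append(b"")
        out.append(data)
    out.append(delimiter + b"--")            # closing delimiter ends the body
    body = b"\r\n".join(out) + b"\r\n"
    headers = {
        "Content-Type": f"multipart/form-data; boundary={boundary}",
        "Content-Length": str(len(body)),
    }
    return headers, body


headers, body = build_form_data(
    {"submit-name": "Sally"},
    {"files": ("essayfile.txt", "text/plain", b"...contents of essayfile.txt...")},
    boundary="AaB03x",
)
print(headers["Content-Type"])
print(body.decode("utf-8"))
```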

3.4 Multipart range responses

The HTTP response to a range request can also be multipart. Such a response has a Content-Type: multipart/byteranges header and a multipart body whose parts carry different ranges of the document.
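
A hedged Python sketch of what such a body can look like. It assumes the server answers with the standard 206 Partial Content status and labels each part with a Content-Range header of the form `bytes first-last/total`; the document, ranges, and boundary string here are made up.

```python
def byteranges_body(document: bytes, ranges, content_type: str, boundary: str) -> bytes:
    """Build a multipart/byteranges body.

    ranges: list of inclusive (first, last) byte offsets, as in Content-Range.
    """
    total = len(document)
    out = b""
    for first, last in ranges:
        head = (
            f"--{boundary}\r\n"
            f"Content-Type: {content_type}\r\n"
            f"Content-Range: bytes {first}-{last}/{total}\r\n"
            "\r\n"
        )
        out += head.encode("ascii") + document[first:last + 1] + b"\r\n"
    out += f"--{boundary}--\r\n".encode("ascii")   # closing delimiter
    return out


document = b"0123456789abcdefghij"                  # a 20-byte stand-in document
body = byteranges_body(document, [(0, 3), (10, 12)], "text/plain", "THIS_STRING_SEPARATES")
print("HTTP/1.1 206 Partial Content")
print("Content-Type: multipart/byteranges; boundary=THIS_STRING_SEPARATES")
print()
print(body.decode("ascii"))
```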

4. Content encoding

HTTP applications sometimes need to encode content before sending it.

4.1 The content-encoding process
    1. The Web server generates the original response message, with the original Content-Type and Content-Length headers.
    2. A content-encoding server creates the encoded message. The encoded message has the same Content-Type, but its Content-Length may differ (for example, if the body was compressed). The content-encoding server adds a Content-Encoding header to the encoded message so that the receiving application can decode it.
    3. The receiving program gets the encoded message, decodes it, and recovers the original message.
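
The three steps can be traced in a short Python sketch, using gzip as the content encoding. The page contents are made up, and plain dictionaries stand in for real header collections.

```python
import gzip

# Step 1: the Web server produces the original message.
original_body = b"<html>" + b"<p>Joe's Hardware</p>" * 100 + b"</html>"
original_headers = {
    "Content-Type": "text/html",
    "Content-Length": str(len(original_body)),
}

# Step 2: a content-encoding server compresses the body. Content-Type is
# unchanged, Content-Length now describes the *encoded* body, and
# Content-Encoding tells the receiver how to undo the encoding.
encoded_body = gzip.compress(original_body)
encoded_headers = dict(original_headers)
encoded_headers["Content-Length"] = str(len(encoded_body))
encoded_headers["Content-Encoding"] = "gzip"

# Step 3: the receiver checks Content-Encoding and decodes accordingly.
if encoded_headers.get("Content-Encoding") == "gzip":
    decoded_body = gzip.decompress(encoded_body)
else:
    decoded_body = encoded_body

print(original_headers)
print(encoded_headers)
print(decoded_body == original_body)  # True
```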

4.2 Content-encoding types

HTTP defines a few standard content-encoding types and allows additional encodings to be added as extension encodings. Encodings are standardized through the Internet Assigned Numbers Authority (IANA), which assigns a unique token to each content-encoding algorithm. The Content-Encoding header uses these standardized tokens to name the algorithm used for encoding.

    • gzip: the entity was encoded with GNU zip
    • compress: the entity was compressed with the Unix file compression program
    • deflate: the entity was compressed in the zlib format
    • identity: the entity is not encoded. This is assumed when there is no Content-Encoding header

The gzip, compress, and deflate encodings are lossless compression algorithms: they reduce the size of transmitted messages without losing information.
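
A sketch of how a receiver might map each token to a decoder using Python's standard library. `decode_content` is a hypothetical helper that handles a single encoding only. Python's standard library has no decoder for Unix compress (LZW), so that token raises here.

```python
import gzip
import zlib


def decode_content(body: bytes, content_encoding=None) -> bytes:
    """Undo a single Content-Encoding (sketch; stacked encodings not handled)."""
    token = (content_encoding or "identity").strip().lower()
    if token == "identity":
        return body                    # no header or explicit identity: not encoded
    if token == "gzip":
        return gzip.decompress(body)
    if token == "deflate":
        return zlib.decompress(body)   # zlib-format stream, as listed above
    if token == "compress":
        raise NotImplementedError("Unix compress (LZW) is not in Python's standard library")
    raise ValueError(f"unknown content encoding: {token}")


text = b"lossless " * 50
for token, encoded in [("gzip", gzip.compress(text)), ("deflate", zlib.compress(text)), (None, text)]:
    # Lossless: every decoder returns exactly the original bytes.
    assert decode_content(encoded, token) == text
print("all encodings round-trip without loss")
```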

4.3 The Accept-Encoding header

The Accept-Encoding header contains a comma-separated list of supported encodings, for example:

    • Accept-Encoding: compress, gzip
    • Accept-Encoding:
    • Accept-Encoding: *
    • Accept-Encoding: compress;q=0.5, gzip;q=1.0
    • Accept-Encoding: gzip;q=1.0, identity; q=1.0, *;q=0

A client can attach a q (quality) value to each encoding to indicate its preference. q values range from 0.0 to 1.0: 0.0 means the client does not want that encoding, and 1.0 marks the most preferred encoding. The token "*" means "any other encoding".

The identity encoding token can appear only in the Accept-Encoding header, where clients use it to state how much they prefer unencoded content relative to the other content-encoding algorithms.
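
A hedged Python sketch of server-side negotiation against headers like the examples above. `parse_accept_encoding` and `choose_encoding` are hypothetical helpers. The sketch assumes that a missing q value means 1.0, that identity stays acceptable unless it is refused explicitly or through `*;q=0`, and that a missing header means "do not encode" (a conservative choice made for this sketch).

```python
def parse_accept_encoding(value: str) -> dict:
    """Map each coding in an Accept-Encoding value to its q value."""
    prefs = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        coding, *params = [p.strip() for p in item.split(";")]
        q = 1.0                                  # no q parameter means q=1.0
        for param in params:
            name, _, val = param.partition("=")
            if name.strip().lower() == "q":
                q = float(val)
        prefs[coding.lower()] = q
    return prefs


def choose_encoding(accept_encoding, available):
    """Pick the coding the client prefers most among those the server offers.

    available: codings the server can produce, in the server's preferred order.
    Returns None when nothing, not even identity, is acceptable.
    """
    if accept_encoding is None:
        return "identity"                        # this sketch's choice: don't encode
    prefs = parse_accept_encoding(accept_encoding)

    def q_of(coding):
        if coding in prefs:
            return prefs[coding]
        if "*" in prefs:
            return prefs["*"]                    # "*" covers codings not listed
        return 1.0 if coding == "identity" else 0.0

    best, best_q = None, 0.0
    for coding in list(available) + ["identity"]:  # ties go to the earlier coding
        q = q_of(coding)
        if q > best_q:
            best, best_q = coding, q
    return best


for header in ["compress, gzip", "", "*", "compress;q=0.5, gzip;q=1.0",
               "gzip;q=1.0, identity; q=1.0, *;q=0"]:
    print(repr(header), "->", choose_encoding(header, ["gzip", "deflate"]))
```

Note how the empty header leaves only identity acceptable, while `*` lets the server use any coding it has.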

5. Transfer encoding and chunked encoding

Transfer encoding changes the way message data is transmitted over the network.

5.1 The Transfer-Encoding and TE headers

HTTP defines only two headers for describing and controlling transfer encoding:

    • Transfer-Encoding: tells the receiver which encoding has been applied to the message so that it could be transmitted reliably.
    • TE: used in request headers to tell the server which transfer-encoding extensions may be used.

In the following example, the request uses the TE header to tell the server that it accepts chunked encoding and is willing to accept trailers at the end of chunked messages:

GET /new_products.html HTTP/1.1
Host: www.joes-hardware.com
User-Agent: Mozilla/4.61 [en] (WinNT; I)
TE: trailers, chunked
...

The response includes a Transfer-Encoding header telling the receiver that the message has been encoded with chunked encoding:

HTTP/1.1 200 OK
Transfer-Encoding: chunked
Server: Apache/3.0
...

5.2 Chunked encoding

Chunked encoding breaks a message into chunks of known size. The chunks are sent one after another, so the size of the entire message does not need to be known before sending begins.

Note: chunked encoding is a transfer encoding and therefore a property of the message, not of the body. Multipart encoding is a property of the body and is completely independent of chunked encoding.

1. Chunking and persistent connections

On a persistent connection, the server must know the size of the body and send it in the Content-Length header before it writes the body. If the server generates content dynamically, it may not know the length of the body before sending it.

Chunked encoding solves this problem: the server sends the body in chunks and only needs to state the size of each chunk.

A chunked message begins with the initial HTTP response header block, followed by a stream of chunks. Each chunk contains a length value and the data for that chunk. The length value is written in hexadecimal and separated from the data by a CRLF. The chunk size counts data bytes only: it excludes both the CRLF between the length value and the data and the CRLF at the end of the chunk. The last chunk has a length of 0, which means "end of body".
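
The chunk layout described above can be sketched in Python as an encoder and a matching decoder. `chunk_body` and `dechunk_body` are hypothetical helper names; the decoder expects exactly the layout described here and does not parse trailers.

```python
def chunk_body(pieces) -> bytes:
    """Encode an iterable of byte strings with chunked transfer coding."""
    out = b""
    for piece in pieces:
        if not piece:
            continue  # a zero-length chunk would signal "end of body" too early
        out += f"{len(piece):x}\r\n".encode("ascii") + piece + b"\r\n"
    return out + b"0\r\n\r\n"  # last chunk (size 0), then the final empty line


def dechunk_body(data: bytes) -> bytes:
    """Decode a chunked body (sketch: trailers are not parsed)."""
    body = b""
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        size = int(data[pos:line_end], 16)       # chunk size is hexadecimal
        pos = line_end + 2                       # skip the CRLF after the size
        if size == 0:
            return body                          # "end of body"
        body += data[pos:pos + size]
        if data[pos + size:pos + size + 2] != b"\r\n":
            raise ValueError("chunk data not followed by CRLF")
        pos += size + 2                          # skip data and its trailing CRLF


wire = chunk_body([b"Hello, ", b"chunked ", b"world!"])
print(wire)                # b'7\r\nHello, \r\n8\r\nchunked \r\n6\r\nworld!\r\n0\r\n\r\n'
print(dechunk_body(wire))  # b'Hello, chunked world!'
```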

2. Trailers in chunked messages

A trailer can be added to the end of a chunked message if the client's TE header indicates that it accepts trailers. The server that produced the original response can also add a trailer at the end of a chunked message, as long as the trailer contains optional metadata that the client does not need to understand or use (the client may ignore and discard the trailer's contents).

Trailers can carry additional header fields whose values might not be known when the message starts (for example, because the body must be generated first). Content-MD5 is a header that can usefully be sent in a trailer, because it is hard to compute a document's MD5 before the document has been generated.

Any HTTP header except Transfer-Encoding, Trailer, and Content-Length can be sent as a trailer.
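
A Python sketch of the Content-MD5 case: the server streams a body that is generated piece by piece, updates a running MD5 as it goes, and emits the digest as a trailer after the zero-length chunk. `chunked_with_md5_trailer` and `report_rows` are hypothetical. The sketch assumes the client's TE header included "trailers"; a real response would typically also announce the field in a Trailer header.

```python
import base64
import hashlib


def chunked_with_md5_trailer(generate):
    """Stream a generated body as chunks and send Content-MD5 as a trailer.

    `generate` yields body pieces; the digest is only known once it is exhausted.
    """
    digest = hashlib.md5()
    out = b""
    for piece in generate():
        if not piece:
            continue
        digest.update(piece)  # running MD5 over the body produced so far
        out += f"{len(piece):x}\r\n".encode("ascii") + piece + b"\r\n"
    md5_b64 = base64.b64encode(digest.digest()).decode("ascii")
    # Last chunk, then the trailer field, then the empty line ending the message.
    out += b"0\r\n" + f"Content-MD5: {md5_b64}\r\n".encode("ascii") + b"\r\n"
    return out


def report_rows():
    """Hypothetical dynamic content whose total size isn't known up front."""
    for n in range(3):
        yield f"row {n}\n".encode("ascii")


print(chunked_with_md5_trailer(report_rows).decode("ascii"))
```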

6. Range requests

HTTP allows clients to request only a portion, or range, of a document. A server can tell clients that it accepts range requests by including an Accept-Ranges header in its response. The value of this header is the unit in which ranges are measured, usually bytes:

HTTP/1.1 200 OK
Date: Fri, 05 Nov 1999 22:35:15 GMT
Server: Apache/1.2.4
Accept-Ranges: bytes
...
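
The text does not show the request side. In HTTP/1.1 a client asks for bytes with a Range header such as `Range: bytes=0-499`, and a satisfied request is answered with 206 Partial Content plus a Content-Range header. The following is a hedged sketch of how a server might resolve a single byte range against a document of known length. `resolve_byte_range` is a hypothetical helper, and requests for multiple ranges are rejected rather than handled.

```python
def resolve_byte_range(range_header: str, total: int):
    """Turn 'bytes=first-last' into inclusive (first, last) offsets, or None.

    Handles 'bytes=500-999', open-ended 'bytes=9500-', and suffix 'bytes=-500'.
    Sketch: a single range only; None means the range cannot be satisfied.
    """
    unit, _, spec = range_header.partition("=")
    if unit.strip().lower() != "bytes" or "," in spec:
        return None
    first_s, _, last_s = spec.strip().partition("-")
    if first_s == "":                        # suffix range: the last N bytes
        n = int(last_s)
        if n == 0 or total == 0:
            return None
        return max(total - n, 0), total - 1
    first = int(first_s)
    last = int(last_s) if last_s else total - 1
    last = min(last, total - 1)              # clamp to the end of the document
    if first > last:
        return None
    return first, last


document = bytes(range(256)) * 40            # a 10240-byte stand-in document
first, last = resolve_byte_range("bytes=9500-", len(document))
part = document[first:last + 1]
print("HTTP/1.1 206 Partial Content")
print(f"Content-Range: bytes {first}-{last}/{len(document)}")
print(f"Content-Length: {len(part)}")
```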

7. Delta encoding

Delta encoding is an extension to the HTTP protocol that optimizes transfer performance by exchanging only the changed portions of an object instead of the complete object.

In the If-None-Match header, the client sends the unique identifier of the page version it holds. The server supplied this identifier earlier, in the ETag header of its response.

Headers used for delta encoding:

    • ETag: a unique identifier for each instance of a document. Sent by the server in a response; the client can use it in the If-Match and If-None-Match headers of subsequent requests
    • If-None-Match: a request header sent by the client, asking the server for the document if and only if the client's version differs from the server's
    • A-IM: a client request header listing the types of instance manipulation the client accepts
    • IM: a server response header naming the instance manipulations applied to the response. This header is sent when the response code is 226 IM Used
    • Delta-Base: a server response header giving the ETag of the base document used to compute the delta (it should match the ETag in the If-None-Match header of the client request)
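
A Python sketch of how a server supporting delta encoding might choose among a full response, a delta, or "not modified", based on the headers listed above. `delta_decision` is a hypothetical helper. It assumes a single ETag in If-None-Match, exact header-name keys, and no q values in A-IM. The 304 Not Modified branch is the standard conditional-GET answer when the client's copy is already current.

```python
def delta_decision(request_headers: dict, current_etag: str, stored_bases: set):
    """Decide how a delta-encoding server might answer (hypothetical helper).

    stored_bases: ETags of older versions the server can still diff against.
    Returns (status_code, response_headers).
    """
    base = request_headers.get("If-None-Match")
    accepted = {
        token.split(";")[0].strip().lower()
        for token in request_headers.get("A-IM", "").split(",")
        if token.strip()
    }
    if base == current_etag:
        return 304, {"ETag": current_etag}      # client's copy is already current
    if base in stored_bases and "vcdiff" in accepted:
        return 226, {                           # 226 IM Used: the body is a delta
            "ETag": current_etag,
            "IM": "vcdiff",
            "Delta-Base": base,                 # matches the client's If-None-Match
        }
    return 200, {"ETag": current_etag}          # fall back to the full document


request = {"If-None-Match": '"v1"', "A-IM": "vcdiff, gzip"}
print(delta_decision(request, '"v2"', {'"v1"'}))
```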

7.1 Instance manipulation, diff generator, and diff application

The client can use the A-IM header to state the types of instance manipulation it accepts. The server states in the IM header which instance manipulation it applied.

Instance manipulation types registered with IANA:

    • vcdiff: delta computed with the vcdiff algorithm
    • diffe: delta computed with the Unix diff -e command
    • gdiff: delta computed with the gdiff algorithm
    • gzip: compressed with the gzip algorithm
    • deflate: compressed with the deflate algorithm
    • range: used in a server response to indicate that the response is partial content resulting from a range selection
    • identity: used in the A-IM header of a client request to state that the client is willing to accept the identity instance manipulation
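
To illustrate the division of labor between a diff generator on the server and a diff application on the client, here is a Python sketch built on the standard library's `difflib.SequenceMatcher`. The delta format (a list of "copy lines from the base" and "add these literal lines" operations) is invented for this example. It is not vcdiff, gdiff, or `diff -e` output, and `make_delta` and `apply_delta` are hypothetical names.

```python
import difflib


def make_delta(base_lines, target_lines):
    """Diff generator: describe target as copies from base plus literal lines.

    Illustrative format only (not vcdiff, gdiff, or diff -e).
    """
    ops = []
    matcher = difflib.SequenceMatcher(None, base_lines, target_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))               # reuse base_lines[i1:i2]
        elif tag in ("replace", "insert"):
            ops.append(("add", target_lines[j1:j2]))   # lines the client lacks
        # "delete": those base lines are simply not copied
    return ops


def apply_delta(base_lines, ops):
    """Diff application: rebuild the new version from the base and the delta."""
    out = []
    for op in ops:
        if op[0] == "copy":
            out.extend(base_lines[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


old = "<h1>Joe's Hardware</h1>\n<p>Hammers: $10</p>\n<p>Nails: $1</p>\n".splitlines(keepends=True)
new = ("<h1>Joe's Hardware</h1>\n<p>Hammers: $12</p>\n<p>Nails: $1</p>\n"
       "<p>Saws: $20</p>\n").splitlines(keepends=True)

delta = make_delta(old, new)      # server side: computed against the Delta-Base version
print(delta)
print(apply_delta(old, delta) == new)  # True: client side rebuilds the current version
```

Unchanged lines travel as small copy instructions instead of text, which is the saving delta encoding aims for.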
