Message encoding in HTTP

Source: Internet
Author: User

Reprint: http://www.51testing.com/?uid-390472-action-viewspace-itemid-233986

 

In the past, I had a rough idea of urlencoded, application/x-www-form-urlencoded, and similar things. I had even written programs that worked, but I never understood them clearly. Today I read the e-book HTTP: The Definitive Guide and finally understood what is going on.

First, let's review HTTP messages. There are two types of HTTP messages: request messages and response messages. The two share the same format and the same encoding rules, so the request message is used as the example here.

A request message consists of three parts:

(1) start line

(2) headers

(3) body

The first thing to be clear about is that (1) and (2) must consist of ASCII characters; that is, every byte that appears in (1) and (2) must have its highest bit set to 0. The content of (3) can be in any encoding: a character encoding, an image encoding, or any other binary encoding. The Content-Type header in (2) describes what the body actually contains.

The general format is as follows: (1) the start line must end with CRLF; CR and LF are themselves ASCII characters. (2) Each header must also end with CRLF. Note that even if there are no headers at all, a CRLF is still required to mark the end of the header section.
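The layout described above can be sketched in a few lines of Python. This is only an illustration: the function name build_request, the host example.com, and the header values are assumptions made for the example and do not come from the original article.

```python
CRLF = b"\r\n"

def build_request(start_line, headers, body=b""):
    """Assemble a request message: start line, headers, blank line, body."""
    # The start line and headers must be pure ASCII; encode("ascii")
    # raises UnicodeEncodeError if a non-ASCII character slips in.
    head = start_line.encode("ascii") + CRLF
    for name, value in headers:
        head += f"{name}: {value}".encode("ascii") + CRLF
    # A bare CRLF marks the end of the headers, even when there are none.
    return head + CRLF + body

message = build_request(
    "POST /submit HTTP/1.1",
    [
        ("Host", "example.com"),
        ("Content-Type", "text/plain; charset=UTF-8"),
        ("Content-Length", "6"),
    ],
    "好人".encode("utf-8"),  # the body may hold arbitrary bytes
)
no_headers = build_request("GET / HTTP/1.0", [])
```

Only the body is allowed to carry non-ASCII bytes here; the same text placed in the start line or in a header value would be rejected by the ASCII encoding step.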

Specifically:

1. The start line

The format is: method request-URL version CRLF. Here method is the method name, such as GET or POST, followed by a space, then the request URL, then another space, then the version, and finally CRLF.

Pay attention to how the URL is encoded. As mentioned above, the start line must consist of ASCII characters, and the URL inside it is subject to even stricter rules. The URL format is http://hostname:port/p1/p2/resource, where :// is a fixed separator, / separates path segments, : introduces the port number, resource is the resource name, and p1 and p2 are path names. The strict requirement is that hostname, p1, p2, and resource must be limited to a subset of ASCII. See the following table:

 

Unreserved

[A-Za-z0-9] | "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"

Reserved

";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | ","

Escape

"%" <HEX> <HEX>

 

The characters in the reserved row cannot appear literally in hostname, p1, p2, or resource. If you need one of these characters, use the urlencode method to escape the disallowed character into allowed ones. For example, if the original resource name is ~voice, it can be encoded as %7Evoice, where 7E is the hexadecimal ASCII code of the ~ character. (Strictly speaking, ~ is in the unreserved row, so escaping it is permitted but not required.) In principle this method was meant only for ASCII characters, but people have since extended it to multi-byte encodings such as GB2312 and UTF-8. For example, "好人" in GB2312 is encoded as %BA%C3%C8%CB, and "好人" in UTF-8 is encoded as %E5%A5%BD%E4%BA%BA. Although this was not part of the original standard, it has become a de facto standard.
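To make the escaping rule concrete, here is a minimal Python sketch. The helper percent_encode is a hypothetical function written for this illustration. It keeps the characters of the unreserved row and escapes every other byte as %XX. The last line compares it with the standard library's urllib.parse.quote.

```python
from urllib.parse import quote

# Unreserved characters, copied from the table above.
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.!~*'()"
)

def percent_encode(text, encoding="utf-8", keep=UNRESERVED):
    """Escape every byte whose character is not in `keep` as %XX."""
    out = []
    for byte in text.encode(encoding):
        ch = chr(byte)
        out.append(ch if ch in keep else f"%{byte:02X}")
    return "".join(out)

reserved_example = percent_encode("a/b?c")                    # "a%2Fb%3Fc"
tilde_example = percent_encode("~voice", keep=UNRESERVED - {"~"})  # "%7Evoice"
gb2312_example = percent_encode("好人", "gb2312")              # "%BA%C3%C8%CB"
utf8_example = percent_encode("好人", "utf-8")                 # "%E5%A5%BD%E4%BA%BA"
stdlib_example = quote("好人", encoding="gb2312")              # same as gb2312_example
```

The same two characters produce different escape sequences depending on the character encoding chosen first, so sender and receiver have to agree on that encoding separately.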

 

2. Headers

Each header has the format name: value CRLF. Here name is the name of the header field, followed by a colon, then an optional space, then the value of the field, and finally CRLF.
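As a small illustration of this format, the following Python sketch splits one raw header line into its name and value. The function name parse_header_line is hypothetical, and the sketch ignores details that real parsers must handle, such as limits on line length.

```python
def parse_header_line(line):
    """Split one raw header line (bytes ending in CRLF) into (name, value)."""
    if not line.endswith(b"\r\n"):
        raise ValueError("header line must end with CRLF")
    text = line[:-2].decode("ascii")  # headers are ASCII only
    name, sep, value = text.partition(":")
    if not sep:
        raise ValueError("header line has no colon")
    # The space after the colon is optional, so strip it if present.
    return name, value.strip()

with_space = parse_header_line(b"Content-Type: text/html; charset=UTF-8\r\n")
without_space = parse_header_line(b"Host:example.com\r\n")
```

Splitting on the first colon only matters because header values may themselves contain colons, for example a host with a port number.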

3. The body

What the body contains, which encoding its characters use, and which format an image is in are all specified in the headers. Content-Type states what is inside the body and how it is encoded. For example, Content-Type: text/html; charset=UTF-8 indicates that the body is an HTML document encoded in UTF-8.

Note that Content-Type: application/x-www-form-urlencoded is a common message type for POST. It indicates that the body carries form data and that the encoding is urlencoded. The body in this format must be ASCII, and apart from the formatting characters, every other character must be limited to the unreserved subset of ASCII (anything else is escaped). For example, the body has the form name1=value1&name2=value2&name3=&name4=value4, where name1, name2, name3, and name4 are variable names, value1, value2, and value4 are their values (name3 has an empty value), and = and & are the formatting characters. The encoded forms of name1, name2, name3, name4, value1, value2, and value4 must stay within the unreserved subset of ASCII.
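A form body of this kind can be produced and read back with Python's standard library, as in the sketch below. The field names and values are made up for illustration. Note that urllib.parse.urlencode follows the form convention of writing a space as +, which goes slightly beyond the plain %XX escaping described above.

```python
from urllib.parse import urlencode, parse_qsl

# Names and values before encoding; name3 deliberately has an empty value.
fields = [
    ("name1", "value 1"),
    ("name2", "a&b=c"),
    ("name3", ""),
    ("name4", "好人"),
]

# Each name and value is escaped on its own; "=" and "&" are added
# afterwards as formatting characters and are never escaped themselves.
body = urlencode(fields, encoding="utf-8")

# Decoding reverses the process; keep_blank_values keeps name3.
decoded = parse_qsl(body, keep_blank_values=True, encoding="utf-8")
```

Because the & and = inside the value of name2 are escaped, the receiver can still split the body unambiguously on the unescaped formatting characters.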

Programming tips:

Apply urlencode only to the parts of the URL that need it; do not encode the entire URL. Take GET http://www.baidu.com/s?wd=~test CRLF as an example.

(1) First, determine which part is the URL, because nothing outside the URL may be URL-encoded. Here the URL is clearly http://www.baidu.com/s?wd=~test. GET, the space, and CRLF do not belong to the URL and must not be encoded.

(2) Next, determine which parts of the URL need urlencode. These are only www.baidu.com, s, wd, and ~test. Although www.baidu.com, s, and wd look the same before and after encoding, they are still among the parts that get encoded. As for http://, :, /, ?, and =, they are formatting characters with special meanings and must not be passed through urlencode.

So the start line can be generated as follows:

String startline = "GET" + " " + "http://" + urlencode("www.baidu.com") + "/" + urlencode("s") + "?" + urlencode("wd") + "=" + urlencode("~test") + "\r\n";

Never write it like this:

String startline = urlencode("GET http://www.baidu.com/s?wd=~test\r\n");

And do not write it like this either:

String startline = "GET" + " " + urlencode("http://www.baidu.com/s?wd=~test") + "\r\n";
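The difference between the correct and incorrect forms can be seen in a runnable Python sketch. The function build_start_line is hypothetical, and the HTTP/1.1 version token and the query value a&b/c are assumptions added for illustration. The value is chosen because it contains reserved characters. Recent Python versions leave ~ unescaped since it is an unreserved character.

```python
from urllib.parse import quote

def build_start_line(method, host, path_segments, query, version="HTTP/1.1"):
    """Escape each URL component separately, then join with raw delimiters."""
    path = "/".join(quote(seg, safe="") for seg in path_segments)
    pairs = "&".join(
        f"{quote(key, safe='')}={quote(value, safe='')}" for key, value in query
    )
    url = f"http://{quote(host, safe='')}/{path}"
    if pairs:
        url += "?" + pairs
    # The method, the spaces, the version, and CRLF are never escaped.
    return f"{method} {url} {version}\r\n"

good = build_start_line("GET", "www.baidu.com", ["s"], [("wd", "a&b/c")])

# Wrong: escaping the whole line also destroys the spaces, the scheme
# separator, the path and query delimiters, and CRLF.
bad = quote("GET http://www.baidu.com/s?wd=a&b/c HTTP/1.1\r\n", safe="")
```

In the correct form only the reserved characters inside the value are escaped, while the delimiters that give the URL its structure are left intact.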
