URL encoding is a tricky task. RFC 3986 is the standard for URIs. Section 2 defines how characters are represented in URIs, and Section 3 divides a URI into scheme, hier-part, query, and fragment components. According to this RFC, a URI is composed of a limited set of characters: digits, letters, and a few graphic symbols. Appendix A collects the complete ABNF grammar.
Take the URL http://www.qingbo.org/?p=230#comments, which contains all four components mentioned above. This URL needs no percent-encoding, because no component contains reserved characters as data: apart from the delimiters that separate the parts, every character is a letter, a digit, or an unreserved ASCII graphic character (see RFC 3986, Section 2.3).
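RFC 3986 Appendix B gives a regular expression for breaking a URI reference into its components. The minimal Python sketch below applies it to the example URL; the function name split_uri is only illustrative. The regex reports the authority and path separately; together ("//" authority path) they make up the hier-part.

```python
import re

# Regular expression from RFC 3986, Appendix B, for parsing a URI reference.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

def split_uri(uri: str):
    """Return (scheme, authority, path, query, fragment).

    Absent optional parts are None; the path is always a (possibly empty) string.
    """
    m = URI_RE.match(uri)  # always matches, because every group is optional
    return m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)

print(split_uri("http://www.qingbo.org/?p=230#comments"))
# ('http', 'www.qingbo.org', '/', 'p=230', 'comments')
```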
Suppose we open this URL in Firefox and bookmark it with the del.icio.us add-on button. The add-on opens a new window and sends a GET request to the server, passing the page's URL and title as query parameters; the server then fills these two values into the value attributes of the corresponding input fields.
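To illustrate the server side, here is a hypothetical sketch of how a server might decode those two query parameters and place them into input value attributes. It is not del.icio.us's actual implementation: render_inputs and the HTML it emits are made up for illustration, built on Python's urllib.parse.parse_qs (which percent-decodes as UTF-8 by default) and html.escape.

```python
from html import escape
from urllib.parse import parse_qs

def render_inputs(query: str) -> str:
    """Hypothetical helper: turn the url/title query parameters into <input> tags."""
    # parse_qs percent-decodes each value (UTF-8) and turns '+' into a space.
    params = parse_qs(query, keep_blank_values=True)
    fields = []
    for name in ("url", "title"):
        value = params.get(name, [""])[0]
        # Escape &, <, >, " and ' so the value is safe inside value="...".
        fields.append('<input name="%s" value="%s">' % (name, escape(value, quote=True)))
    return "\n".join(fields)

query = ("url=http%3A%2F%2Fwww.qingbo.org%2F%3Fp%3D230%23comments"
         "&title=%C2%BB%20Blog%20Archive%20%C2%BB&noui&jump=close&v=4")
print(render_inputs(query))
```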
Without encoding, the URL in this GET request would be "http://del.icio.us/flimsy?url=http://www.qingbo.org/?p=230#comments&title=» Blog Archive » <Chinese post title>&noui&jump=close&v=4". This is where the trouble starts. Everything after the # would be interpreted as a fragment (an anchor on the page), yet #comments is only part of the url parameter. In addition, the title contains Chinese characters, which a standards-compliant URI may not contain. So encoding is required: percent-encode each component, and each parameter value within the query, separately.

Do not encode the whole URL at once: if the question mark after flimsy were encoded, the del.icio.us server would not know that a query follows it. The correctly encoded link is too long to show here; copy the link address to see it (the browser seems to decode it again for display, so click it to see the encoded result in the address bar).
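Both the problem and the fix can be reproduced with Python 3's urllib.parse. The title string below is a hypothetical stand-in for the post's real Chinese title. urlsplit shows where a parser cuts the naive URL, and quote(..., safe="") percent-encodes each parameter value as UTF-8, while the ?, = and & that structure the del.icio.us query itself are left alone.

```python
from urllib.parse import quote, urlsplit

page_url = "http://www.qingbo.org/?p=230#comments"
# Hypothetical stand-in for the real post title, which is in Chinese.
title = "» Blog Archive » 中文标题"

# Naive concatenation: the first '#' (coming from page_url) starts the fragment.
naive = "http://del.icio.us/flimsy?url=" + page_url + "&title=" + title + "&noui&jump=close&v=4"
naive_parts = urlsplit(naive)
print(naive_parts.query)     # url=http://www.qingbo.org/?p=230
print(naive_parts.fragment)  # comments&title=» Blog Archive » 中文标题&noui&jump=close&v=4

# Correct: percent-encode each parameter value separately (UTF-8; safe=""
# also escapes '/', ':', '?', '#', '&' and '='), but keep the '?', '='
# and '&' that delimit the del.icio.us query untouched.
encoded = ("http://del.icio.us/flimsy?url=" + quote(page_url, safe="")
           + "&title=" + quote(title, safe="")
           + "&noui&jump=close&v=4")
encoded_parts = urlsplit(encoded)
print(encoded_parts.fragment == "")  # True: nothing leaks into a fragment
print(encoded)
```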
If no ready-made function is available, the simplest approach is to percent-encode the UTF-8 byte sequence directly: bytes that are unreserved characters are left as they are, and every other byte is written as "%" followed by two hexadecimal digits (%HH). Instead of "%20", a space may also be encoded as "+" to save space (the application/x-www-form-urlencoded convention; a literal "+" must then be written as %2B).
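The byte-by-byte rule can be sketched as follows. This is a hand-written illustration, not a library function; percent_encode and its space_as_plus flag are hypothetical names. For comparison, Python's urllib.parse.quote(s, safe="") should produce the same %-escapes for this unreserved set, and quote_plus(s, safe="") the "+" variant.

```python
# RFC 3986 Section 2.3 unreserved characters: ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)

def percent_encode(text: str, space_as_plus: bool = False) -> str:
    """Hypothetical helper: percent-encode the UTF-8 bytes of text."""
    out = []
    for byte in text.encode("utf-8"):  # iterating bytes yields ints 0..255
        if byte in UNRESERVED:
            out.append(chr(byte))       # unreserved: copy unchanged
        elif byte == 0x20 and space_as_plus:
            out.append("+")             # form-encoding shortcut for a space
        else:
            out.append("%{:02X}".format(byte))  # everything else, '+' included: %HH
    return "".join(out)

print(percent_encode("a b"))                      # a%20b
print(percent_encode("a b", space_as_plus=True))  # a+b
print(percent_encode("中"))                       # %E4%B8%AD
```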
If I have time, I will write another article about converting Chinese text to UTF-8 byte sequences on Windows, which may help readers who need to encode Chinese URLs. See the article "GBK (GB2312) to UTF-8 encoding conversion".
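Independent of the Windows-specific approach the article has in mind, the GBK-to-UTF-8 conversion amounts to decoding the bytes with the GBK codec and re-encoding the resulting text as UTF-8. A minimal Python sketch follows; gbk_to_utf8 is a hypothetical name and the sample word "中文" ("Chinese") is chosen purely for illustration.

```python
def gbk_to_utf8(data: bytes) -> bytes:
    """Hypothetical helper: convert GBK (a superset of GB2312) bytes to UTF-8 bytes."""
    # Decode GBK into a Unicode string, then encode that string as UTF-8.
    return data.decode("gbk").encode("utf-8")

gbk_bytes = "中文".encode("gbk")  # sample input as it would appear in a GBK file
utf8_bytes = gbk_to_utf8(gbk_bytes)

print(gbk_bytes)   # b'\xd6\xd0\xce\xc4'
print(utf8_bytes)  # b'\xe4\xb8\xad\xe6\x96\x87'
# Percent-encode the UTF-8 bytes for use in a URL.
print("".join("%{:02X}".format(b) for b in utf8_bytes))  # %E4%B8%AD%E6%96%87
```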
This article is from the CSDN blog; please credit the source when reposting: http://blog.csdn.net/fanwenbo/archive/2008/04/14/2291878.aspx