Parameter Encoding Specification

Source: Internet
Author: User

 

I. Summary

We often need to pass Chinese data between pages, but text encoding frequently causes confusion: it is not always clear whether a problem comes from the browser's encoding or the server's. This article analyzes how data transmitted over the Internet is encoded and proposes a complete, easy-to-use solution.

II. Principles

Avoid passing Chinese characters directly in GET or POST parameters. Chinese parameters must be encoded before they are passed, and the server must use the same encoding to decode them.

III. Common misconceptions

1. Many programmers think that Chinese characters can be passed in URLs.

A URL never actually carries Chinese characters. If we enter "http://localhost/?a=中文" in the browser's address bar, the URL appears to contain Chinese characters. In fact, when you press Enter, the browser automatically encodes the Chinese characters before passing them to the server.
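As a rough illustration of what actually travels over the wire, the sketch below percent-encodes a sample parameter value. The parameter name a and the value 中文 are sample data only, and UTF-8 is an assumption here: encodeURIComponent always produces UTF-8, whereas a browser's automatic encoding may differ.

```javascript
// Sample value only; encodeURIComponent always emits percent-encoded UTF-8 bytes.
const value = "中文";
const encoded = encodeURIComponent(value);

// The server receives this ASCII-only form, never the raw Chinese characters.
const url = "http://localhost/?a=" + encoded;
console.log(url); // http://localhost/?a=%E4%B8%AD%E6%96%87
```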

2. When garbled characters appear, the first thing to check is the server program's encoding.

Many people think that a URL can carry Chinese characters and do not know that the browser encodes them automatically, so they simply assume the problem lies on the server side. In fact, even when we work out which encoding the server would need, we should not lightly change the server's default encoding.

3. Encode parameters before passing them, and decode the parameters obtained from the Request object.

Many programmers believe that because UrlEncode and similar methods are used to encode parameters when sending, UrlDecode must be used to decode them when receiving. This is a common error. Note that Request.QueryString and Request.Form have already been decoded once automatically, using the default encoding configured on the server.
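A small sketch of why the extra decode is harmful, written in JavaScript rather than C# for brevity. The sample value is hypothetical, and the first decodeURIComponent call merely stands in for the decoding that the server framework already performs.

```javascript
// Hypothetical parameter value that itself contains a percent sequence.
const original = "a%2Bb";

// What the client sends after encoding once.
const sent = encodeURIComponent(original); // "a%252Bb"

// Stands in for the framework's automatic decoding: the value is already correct.
const autoDecoded = decodeURIComponent(sent); // "a%2Bb"

// The redundant manual decode silently changes the data.
const decodedAgain = decodeURIComponent(autoDecoded); // "a+b"

console.log(autoDecoded === original);  // true
console.log(decodedAgain === original); // false
```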

IV. Reasons

When Chinese characters are passed, the encoding and decoding that happen automatically depend on the settings of both the browser and the server.

In tests sending Chinese parameters with GET from Firefox 3 and IE6, Firefox encoded the Chinese parameters as UTF-8 by default, while IE6 still encoded them as gb2312 even with "Always send URLs as UTF-8" enabled in its advanced settings.
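To see why the two browsers' output is not interchangeable, the sketch below compares the percent-encoded bytes of "中文" in each encoding. The two strings are written out by hand as sample data, and tryDecodeAsUtf8 is a hypothetical helper, not part of any library.

```javascript
// "中文" percent-encoded by hand in each encoding (sample data).
const utf8Query = "%E4%B8%AD%E6%96%87"; // UTF-8 bytes: E4 B8 AD E6 96 87
const gb2312Query = "%D6%D0%CE%C4";     // GB2312 bytes: D6 D0 CE C4

// Hypothetical helper: returns the decoded text, or null when the bytes
// are not valid UTF-8 (decodeURIComponent throws URIError in that case).
function tryDecodeAsUtf8(text) {
  try {
    return decodeURIComponent(text);
  } catch (err) {
    if (err instanceof URIError) return null;
    throw err;
  }
}

console.log(tryDecodeAsUtf8(utf8Query));   // 中文
console.log(tryDecodeAsUtf8(gb2312Query)); // null
```

A decoder that expects UTF-8 cannot recover text sent as GB2312, which is the root of the garbled characters described here.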

We can freely control the decoding format on the server side, although this is usually done by changing the server configuration. For example, in an ASP.NET application you can set the server-side request and response encodings in web.config:

<globalization culture="zh-CN" uiCulture="zh-CN" requestEncoding="UTF-8" responseEncoding="gb2312" />

However, we cannot control browser behavior. Users may use different browsers.

V. Solutions

1. Unify the default encoding

(1) Set the server-side encoding to UTF-8.

(2) Encode every parameter that is passed. On the server (C#) use the Server.UrlEncode method, and on the client (JavaScript) use the encodeURIComponent method.

Note:

The client-side JavaScript function encodeURIComponent can only produce UTF-8, so the server's request and response encodings must both be set to UTF-8.

The drawback is that if some partners must pass parameters in another encoding, the server may receive garbled characters. Even so, this solution is simple and suitable for most scenarios.
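On the client, step (2) might look like the following minimal sketch. The helper name buildQueryString and the sample parameters are assumptions made for illustration.

```javascript
// Hypothetical helper: encodes every name and value with encodeURIComponent
// (always UTF-8) and joins the pairs into a query string.
function buildQueryString(params) {
  return Object.keys(params)
    .map((key) => encodeURIComponent(key) + "=" + encodeURIComponent(String(params[key])))
    .join("&");
}

// Sample parameters: one Chinese value and one value containing a separator.
const query = buildQueryString({ name: "中文", q: "a&b" });
console.log(query); // name=%E4%B8%AD%E6%96%87&q=a%26b
```

Encoding each value separately keeps characters such as & and = inside a value from being mistaken for separators.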

2. Specify the encoding through an encoding parameter

To solve the problem that a single unified encoding can cause, we use an "encoding" parameter to state the encoding explicitly. The encoding parameter must be passed in every request, whether GET or POST.

(1) On a JavaScript client, still use the encodeURIComponent method for encoding, and set the value of the encoding parameter to "UTF-8".

(2) For parameters that arrive at the server in another encoding, such as gb2312, we cannot use the default Request.Form or Request.QueryString, because the server-side encoding may be set to UTF-8 and Request.Form and Request.QueryString are decoded automatically with the encoding configured on the server. Instead, use the following method to process the request and obtain the parameters:

using System;
using System.Collections.Specialized;
using System.IO;
using System.Text;
using System.Web;

/// <summary>
/// Returns the collection of request parameters decoded with the specified encoding. ziqiu.zhang 2009.1.19
/// </summary>
/// <param name="request">The HttpRequest object of the current request</param>
/// <param name="encode">Name of the encoding</param>
/// <returns>A NameValueCollection whose keys are parameter names and whose values are parameter values</returns>
public static NameValueCollection GetRequestParameters(HttpRequest request, string encode)
{
    NameValueCollection result = null;
    Encoding destEncode = null;

    // Obtain the Encoding object for the specified encoding name
    if (!String.IsNullOrEmpty(encode))
    {
        try
        {
            destEncode = Encoding.GetEncoding(encode);
        }
        catch
        {
            // If the specified encoding cannot be obtained, fall back to null
            destEncode = null;
        }
    }

    // Obtain the parameters according to the HTTP method.
    // If there is no Encoding object, the server's default encoding is used.
    if (request.HttpMethod == "POST")
    {
        if (null != destEncode)
        {
            // Read the raw request body and decode it with the specified encoding
            Stream resStream = request.InputStream;
            byte[] fileContent = new byte[resStream.Length];
            resStream.Read(fileContent, 0, fileContent.Length);
            string postQuery = destEncode.GetString(fileContent);
            result = HttpUtility.ParseQueryString(postQuery, destEncode);
        }
        else
        {
            result = request.Form;
        }
    }
    else
    {
        if (null != destEncode)
        {
            result = System.Web.HttpUtility.ParseQueryString(request.Url.Query, destEncode);
        }
        else
        {
            result = request.QueryString;
        }
    }

    return result;
}

 

With the method above, we can obtain parameters in a specified encoding for both GET and POST requests. If you think writing such a method is unnecessary trouble, see the third misconception in section III.
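For the client side of this scheme, a minimal sketch might look like the following. The function name buildUrlWithEncoding, the base URL, and the sample parameter are assumptions; only the "encoding" parameter name and the "UTF-8" value come from the text.

```javascript
// Hypothetical helper: always sends encoding=UTF-8 alongside the real parameters,
// because encodeURIComponent can only produce UTF-8.
function buildUrlWithEncoding(baseUrl, params) {
  const pairs = ["encoding=UTF-8"];
  for (const key of Object.keys(params)) {
    pairs.push(encodeURIComponent(key) + "=" + encodeURIComponent(String(params[key])));
  }
  return baseUrl + "?" + pairs.join("&");
}

const requestUrl = buildUrlWithEncoding("http://localhost/", { a: "中文" });
console.log(requestUrl); // http://localhost/?encoding=UTF-8&a=%E4%B8%AD%E6%96%87
```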

This method returns a NameValueCollection object. To determine whether a parameter exists, you cannot call a method that checks for the key. Instead, get the value by key and then check whether the value is null (which is somewhat different from a List):

// Obtain the parameter. Assume that paramList is a NameValueCollection object
string p1 = paramList["p1"];
// Determine whether this parameter exists. If not, p1 is null
if (!String.IsNullOrEmpty(p1)) {...}

 

In addition, if the encoding parameter is not passed, or the string that is passed cannot be converted to a strongly typed Encoding object, the server's default encoding is used (that is, the parameters are taken directly from the QueryString and Form properties of the Request object).
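The same decode-with-fallback idea can be sketched outside ASP.NET. The JavaScript below is only an illustration under stated assumptions: it relies on the standard TextDecoder being available, uses UTF-8 as a stand-in for the "server default" encoding, and the function names are hypothetical rather than a port of the C# method.

```javascript
// Turn a percent-encoded ASCII string into raw bytes ("+" is treated as a space,
// as in form-encoded data). Assumes the input is already ASCII-only.
function percentDecodeToBytes(text) {
  const bytes = [];
  for (let i = 0; i < text.length; i++) {
    const hex = text.slice(i + 1, i + 3);
    if (text[i] === "%" && /^[0-9a-fA-F]{2}$/.test(hex)) {
      bytes.push(parseInt(hex, 16));
      i += 2;
    } else if (text[i] === "+") {
      bytes.push(0x20);
    } else {
      bytes.push(text.charCodeAt(i) & 0xff);
    }
  }
  return Uint8Array.from(bytes);
}

// Parse a query string with the named encoding; fall back to UTF-8
// (the assumed default) when the name is missing or unsupported.
function getRequestParameters(query, encodingName) {
  let decoder;
  try {
    decoder = new TextDecoder(encodingName || "utf-8");
  } catch (err) {
    decoder = new TextDecoder("utf-8");
  }
  const result = {};
  for (const pair of query.replace(/^\?/, "").split("&")) {
    if (pair === "") continue;
    const eq = pair.indexOf("=");
    const rawName = eq === -1 ? pair : pair.slice(0, eq);
    const rawValue = eq === -1 ? "" : pair.slice(eq + 1);
    result[decoder.decode(percentDecodeToBytes(rawName))] =
      decoder.decode(percentDecodeToBytes(rawValue));
  }
  return result;
}

const params = getRequestParameters("?encoding=UTF-8&a=%E4%B8%AD%E6%96%87", "UTF-8");
console.log(params.a); // 中文
```

Which encoding names other than UTF-8 are accepted depends on the runtime, so an unsupported name simply takes the fallback path.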

VI. JavaScript encoding methods

We call the sender of a request the client. We often need to use JavaScript to encode Chinese parameters on the client. The following JavaScript functions are related to encoding:

 

Function name: escape()

Function description: The escape() function encodes a string so that it can be read on all computers.

Explanation: This function does not encode ASCII letters and digits, nor the following ASCII punctuation marks: * @ - _ + . / All other characters are replaced by escape sequences.

[Deprecated] Use encodeURI() or encodeURIComponent() instead.

 

Function name: unescape()

Function description: The unescape() function decodes strings encoded by escape().

Explanation: This function finds character sequences of the form %xx and %uxxxx (where x is a hexadecimal digit) and replaces them with the Unicode characters \u00xx and \uxxxx.

[Deprecated] Use decodeURI() or decodeURIComponent() instead.

 

Function name: encodeURI()

Function description: The encodeURI() function encodes a string as a URI.

Explanation: This function does not encode ASCII letters and digits, nor the following ASCII punctuation marks: - _ . ! ~ * ' ( )

Its purpose is to encode a complete URI, so encodeURI() also does not escape the following ASCII punctuation marks, which have special meaning in a URI: ; / ? : @ & = + $ , #

[Note] If a URI component contains characters that must not be passed through unescaped, use encodeURIComponent() to encode each component separately.

 

Function name: decodeURI()

Function description: The decodeURI() function decodes a URI encoded by encodeURI().

 

Function name: encodeURIComponent()

Function description: The encodeURIComponent() function encodes a string as a URI component.

Explanation: This function does not encode ASCII letters and digits, nor the following ASCII punctuation marks: - _ . ! ~ * ' ( )

All other characters (including ; / ? : @ & = + $ , # and the other punctuation marks used to separate URI components) are replaced by one or more hexadecimal escape sequences.

[Tip] This function encodes the characters that have special meaning in a URI.

 

Function name: decodeURIComponent()

Function description: The decodeURIComponent() function decodes a URI component encoded by encodeURIComponent().
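To make the difference between escape() and encodeURIComponent() concrete, here is a brief comparison on a sample string (the string itself is an arbitrary choice). escape() is a legacy function and is called here only for comparison.

```javascript
// Arbitrary sample: two Chinese characters, a plus sign, and an ASCII letter.
const sample = "中文+a";

// Legacy escape(): non-ASCII characters become %uXXXX (not valid URL
// percent-encoding), and "+" is left untouched.
const withEscape = escape(sample);

// encodeURIComponent(): UTF-8 bytes as %XX sequences, and "+" is encoded.
const withComponent = encodeURIComponent(sample);

console.log(withEscape);    // %u4E2D%u6587+a
console.log(withComponent); // %E4%B8%AD%E6%96%87%2Ba
```

Because many servers read an unencoded "+" in a query string as a space, the escape() output can also change the value, not just its representation.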

 

 

 

escape and unescape are no longer recommended as of the ECMAScript v3 standard; use encodeURI and encodeURIComponent instead. For a URI (a URL is also a kind of URI), if we want to send a complete URL as the request but the URL contains Chinese characters, we should use encodeURI. If we only want to encode a parameter, we should use encodeURIComponent.

The following example illustrates the difference between the two methods:

document.write(encodeURIComponent("http://www.w3school.com.cn") + "<br />")

document.write(encodeURI("http://www.w3school.com.cn") + "<br />")

Result:

http%3A%2F%2Fwww.w3school.com.cn

http://www.w3school.com.cn
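The same contrast can be shown with Chinese parameters. In the sketch below the URL and the parameter values are sample data chosen for illustration.

```javascript
// Encoding a complete URL: separators such as ":", "/", "?", "&" and "=" are kept.
const fullUrl = "http://localhost/?a=中文&b=1";
const encodedUrl = encodeURI(fullUrl);
console.log(encodedUrl); // http://localhost/?a=%E4%B8%AD%E6%96%87&b=1

// Encoding a single parameter value: "&" and "=" inside the value are escaped,
// so they cannot be mistaken for separators.
const paramValue = "中文&b=1";
const encodedParamUrl = "http://localhost/?a=" + encodeURIComponent(paramValue);
console.log(encodedParamUrl); // http://localhost/?a=%E4%B8%AD%E6%96%87%26b%3D1
```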

VII. Automatic browser encoding

GET request

For GET requests, different browsers use different encodings when automatically encoding Chinese parameters.

For example, Firefox/3.0.5 uses UTF-8 and IE6 uses gb2312.

POST request

For POST requests, the name/value pairs of the form are sent to the server in the request body. In this case the browser encodes them according to the page's Content-Type (for example "text/html; charset=GBK") before sending them to the server. Add the following to the head of the HTML code:

<meta http-equiv="Content-Type" content="text/html; charset=gb2312" />

Firefox/3.0.5 encodes the Chinese parameters of the POST according to the encoding set in charset.

In IE6 this setting has no effect.

These experiments show that the default encoding used by a client browser cannot be relied on. Therefore, we need to encode parameters manually when passing Chinese characters.
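For POST data, manual encoding could be sketched as below, using the standard URLSearchParams class so that the body is always UTF-8 regardless of the page's charset. The helper name buildFormPost and the sample fields are assumptions, and the explicit charset in the header is an illustrative choice rather than something the original solution prescribes.

```javascript
// Hypothetical helper: builds a form-encoded POST body that is always UTF-8.
function buildFormPost(params) {
  return {
    headers: { "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8" },
    // URLSearchParams percent-encodes values as UTF-8 and writes spaces as "+".
    body: new URLSearchParams(params).toString(),
  };
}

const post = buildFormPost({ name: "中文", note: "a b" });
console.log(post.body); // name=%E4%B8%AD%E6%96%87&note=a+b
```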

VIII. Summary

The purpose of this article is to remind web programmers to pay attention to the automatic encoding performed by browsers. Applying the solution described here in a project avoids the garbled-character problems caused by passing Chinese parameters. I reorganized the layout of this article after reading "cnblogs blog typographical skills" on yjinglee's blog.
