Introduction: When designing RESTful services, you often need to put Chinese text into a URL as a query condition. In that case the Chinese characters generally have to be encoded, and the encoding has to be configured correctly; otherwise the server receives garbled text. This article explains where the garbling comes from and how to resolve it.
1. The problem arises
In RESTful service design, a query is typically exposed through a URL such as GET /basic/service?keyword=history, where the keyword value is typed in Chinese. In actual development, however, the keyword read in the background (the server side) comes out garbled and cannot be used.
2. How are garbled characters generated?
When parameters are passed in a URL, the result depends on the browser environment: the URL and its key=value pairs are encoded according to the browser's own rules for the address bar and are then passed to the background, which decodes them. If we do nothing ourselves and a JavaScript request URL contains Chinese (for example, text typed into an input box), the Chinese parameters are encoded by whatever mechanism the browser applies, and that is where the garbling starts.
3. Initial encoding
First, encode in JavaScript with the encodeURI() method. encodeURI() converts the word "测试" ("test") to "%E6%B5%8B%E8%AF%95". But the problem still exists. The reason is that "%" is an escape character in a URL, not an ordinary character: on the way to the background, the "%XX" sequences are unescaped once before our code sees them, and that automatic step does not necessarily use UTF-8. What the background reads therefore no longer matches the string that encodeURI() produced.
4. Encoding twice
Apply encodeURI twice:
encodeURI(encodeURI("/order?name=" + name));
The URL no longer carries the single-pass string "%E6%B5%8B%E8%AF%95". It carries the two-pass string "%25E6%25B5%258B%25E8%25AF%2595": the second pass re-encodes every "%", which would otherwise be read as an escape character, as the ordinary character sequence "%25".
At this point the front-end JavaScript has finished encoding the Chinese URL, and the parameter is passed to the background in the URL. The automatic unescaping only turns "%25" back into "%", so the action receives the intact, ungarbled string "%E6%B5%8B%E8%AF%95", which is the encoded form of the word "测试" that we entered.
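The two encoding passes described above can be sketched as follows. This is only an illustration: the variable names are made up for the example, and the input is the word "测试" ("test") used in this article.

```javascript
// Hypothetical user input: the Chinese word for "test".
const name = "测试";

// First pass: each UTF-8 byte of the Chinese text becomes a %XX escape.
const once = encodeURI("/order?name=" + name);

// Second pass: only the "%" signs change; each one becomes "%25".
const twice = encodeURI(once);

console.log(once);  // /order?name=%E6%B5%8B%E8%AF%95
console.log(twice); // /order?name=%25E6%25B5%258B%25E8%25AF%2595
```

Note that encodeURI leaves "/", "?" and "=" untouched, which is why the whole URL can be passed to it in one call.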
5. How does the background correctly parse Chinese character information?
After two rounds of encodeURI(), the value read directly in the background is still not the original text. One more decoding step is needed:
URLDecoder.decode("Chinese string", "UTF-8")
The URLDecoder.decode(String s, String enc) method takes two parameters: the first is the string to decode, and the second is the character encoding to use when decoding.
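To see why one explicit decode is enough after double encoding, the round trip can be imitated in JavaScript. This is a rough sketch under assumptions: decodeURIComponent stands in both for the unescaping that happens automatically before the application code runs and for the explicit URLDecoder.decode(value, "UTF-8") call, and all variable names are invented for the example.

```javascript
// What the front end sent after two encodeURI() passes.
const sent = "%25E6%25B5%258B%25E8%25AF%2595";

// Stand-in (assumption) for the automatic unescaping on the server:
// "%25" turns back into "%", nothing else changes.
const receivedByAction = decodeURIComponent(sent);

// Stand-in (assumption) for URLDecoder.decode(value, "UTF-8"):
// the %XX escapes are read as UTF-8 bytes.
const keyword = decodeURIComponent(receivedByAction);

console.log(receivedByAction); // %E6%B5%8B%E8%AF%95
console.log(keyword);          // 测试
```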
6. encodeURI, encodeURIComponent, escape
6.1 escape() function
The escape() function encodes a string so that it can be read on all computers.
Return value: a copy of the string in which certain characters are replaced with hexadecimal escape sequences.
Description: escape() does not encode ASCII letters and digits, nor the ASCII characters @ * _ + - . /. All other characters, including spaces, other punctuation, and non-ASCII characters, are replaced by escape sequences: %xx for character codes up to 0xFF (xx is the two-digit hexadecimal code, so a space becomes %20) and %uxxxx for larger codes, which includes Chinese characters.
6.2 encodeURI() method
Converts a URI string into escaped form using the UTF-8 encoding. Characters that are not encoded by this method: ASCII letters and digits, and - _ . ! ~ * ' ( ) ; , / ? : @ & = + $ #
6.3 encodeURIComponent() method
Converts a URI component into escaped form using the UTF-8 encoding. Compared with encodeURI(), this method encodes more characters, such as ; , / ? : @ & = + $ and #. So if a string contains several parts of a URI, it cannot be encoded as a whole with this method, because the URL no longer works once the / characters are encoded.
Characters that are not encoded by this method: ASCII letters and digits, and - _ . ! ~ * ' ( )
Therefore, for a Chinese string, if you do not want the string converted to UTF-8 (for example, when the originating page and the target page use the same charset), escape() is enough, keeping in mind that it emits the nonstandard %uxxxx form and the receiving side must know how to decode it. If your page is GB2312 or another encoding and the page that receives the parameter is UTF-8, use encodeURI or encodeURIComponent.
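The difference between the three functions is easiest to see on one input. The sample string below is made up for illustration; it mixes Chinese text, a space, and the URI delimiters "/", "?" and "=".

```javascript
// Hypothetical sample mixing Chinese text, a space and URI delimiters.
const sample = "测试 a/b?x=1";

// escape(): Chinese characters become %uXXXX (UTF-16 code units, not UTF-8).
const escaped = escape(sample);

// encodeURI(): UTF-8 escapes, but "/", "?" and "=" are left alone.
const asUri = encodeURI(sample);

// encodeURIComponent(): UTF-8 escapes, and the delimiters are escaped too.
const asComponent = encodeURIComponent(sample);

console.log(escaped);     // %u6D4B%u8BD5%20a/b%3Fx%3D1
console.log(asUri);       // %E6%B5%8B%E8%AF%95%20a/b?x=1
console.log(asComponent); // %E6%B5%8B%E8%AF%95%20a%2Fb%3Fx%3D1
```

A common pattern is to use encodeURIComponent for each individual parameter value and to leave the delimiters of the surrounding URL as they are.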
7. Another scheme for handling garbled Chinese in URLs
On the requesting side, encode the Chinese text once with encodeURI, for example:
var url = "/ajax?name=" + encodeURI(name);
Server-side code:
name = new String(name.getBytes("ISO-8859-1"), "UTF-8");
Note: name is the string obtained from the request, and ISO-8859-1 is the project's default character encoding. If the default is a Chinese encoding such as GBK or GB2312, this conversion is not applied.
Analysis: Verified by running the program, this approach works. It suggests that the default encoding applied to the request is ISO-8859-1: encodeURI escapes the Chinese text as UTF-8 bytes, those bytes are then decoded as ISO-8859-1, and re-reading the same bytes as UTF-8 restores the original characters. ASCII characters are unaffected because their codes coincide in ISO-8859-1 and UTF-8. What functions such as encodeURI mainly take care of is escaping non-ASCII text and special characters such as %.
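The byte-level repair can be imitated in JavaScript. This is a sketch under assumptions: it relies on the Node.js Buffer global (so it will not run in a browser as written), "latin1" is Node's name for ISO-8859-1, and the variable names are invented for the example. It mirrors the idea of the Java line above rather than reproducing it.

```javascript
// The text the user typed (the word "test" in Chinese).
const original = "测试";

// Simulate the mistake: the UTF-8 bytes are read as ISO-8859-1 ("latin1"),
// so each of the 6 bytes turns into one wrong character.
const garbled = Buffer.from(original, "utf8").toString("latin1");

// The repair, analogous to new String(name.getBytes("ISO-8859-1"), "UTF-8"):
// take the ISO-8859-1 bytes back and read them as UTF-8 instead.
const repaired = Buffer.from(garbled, "latin1").toString("utf8");

console.log(garbled.length);       // 6
console.log(garbled === original); // false
console.log(repaired);             // 测试
```

The repair works only because ISO-8859-1 maps every byte value to exactly one character, so no information is lost in the wrong decoding step.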
encodeURI and decodeURI methods in JavaScript
I. Basic concepts
encodeURI and decodeURI are used as a pair. Chinese characters in the browser's address bar can cause unexpected errors, so encodeURI converts non-English characters into ASCII escape sequences and decodeURI restores them. encodeURI does not encode characters such as ":", "/", ";" and "?"; encodeURIComponent does encode them.
On the Java side, decodeURI() roughly corresponds to java.net.URLDecoder.decode(uriString, "UTF-8");
and encodeURI() roughly corresponds to java.net.URLEncoder.encode(uriString, "UTF-8"); The match is not exact: URLEncoder writes a space as "+" and also escapes reserved characters such as ":", "/" and "?", so its output is closer to that of encodeURIComponent.
II. Example
<script type="text/javascript">
// URL with a Chinese name and a space in the query string
var uriStr = "http://www.baidu.com?name=张三&num=001 zs";
// Encode: the Chinese characters and the space become %XX escapes
var uriEc = encodeURI(uriStr);
document.write("Encoded: " + uriEc + "<br>");
// Decode: restore the original string
var uriDc = decodeURI(uriEc);
document.write("Decoded: " + uriDc);
</script>
Encoded: http://www.baidu.com?name=%E5%BC%A0%E4%B8%89&num=001%20zs
Decoded: http://www.baidu.com?name=张三&num=001 zs
Java JDK description: java.net.URLEncoder.encode(uriString, "UTF-8");
A utility class for HTML form encoding. The class contains static methods for converting a String to the application/x-www-form-urlencoded MIME format. For more information about HTML form encoding, see the HTML specification.
When encoding a String, the following rules apply:
- The alphanumeric characters "a" through "z", "A" through "Z" and "0" through "9" remain unchanged.
- The special characters ".", "-", "*" and "_" remain unchanged.
- The space character " " is converted to a plus sign "+".
- All other characters are unsafe, so they are first converted to one or more bytes using some encoding scheme. Each byte is then represented by the 3-character string "%xy", where xy is the two-digit hexadecimal representation of the byte. The recommended encoding scheme is UTF-8. However, for compatibility reasons, if an encoding is not specified, the default encoding of the platform is used.
For example, using the UTF-8 encoding scheme, the string "The string ü@foo-bar" is converted to "The+string+%C3%BC%40foo-bar", because in UTF-8 the character ü is encoded as the two bytes C3 (hex) and BC (hex), and the character @ is encoded as the single byte 40 (hex).
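The same rules can be observed from JavaScript, since URLSearchParams also serializes values in application/x-www-form-urlencoded format. This is a sketch by analogy, not the Java API: the parameter name "q" is made up, and the comparison with encodeURIComponent is added only to show the difference in how a space is written.

```javascript
// \u00FC is the character ü from the example string in the text.
const text = "The string \u00FC@foo-bar";

// Form encoding: space becomes "+", ü becomes its two UTF-8 bytes, "@" becomes %40.
const params = new URLSearchParams();
params.append("q", text); // "q" is a hypothetical parameter name
const formEncoded = params.toString();

// URI component encoding: same escapes, except that a space becomes %20.
const componentEncoded = encodeURIComponent(text);

console.log(formEncoded);      // q=The+string+%C3%BC%40foo-bar
console.log(componentEncoded); // The%20string%20%C3%BC%40foo-bar
```

The "+" versus "%20" difference is the main reason a value encoded on one side with one scheme should be decoded on the other side with the matching scheme.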
encode
public static String encode(String s, String enc) throws UnsupportedEncodingException
Translates a string into application/x-www-form-urlencoded format using the specified encoding scheme. The method uses the supplied encoding scheme to obtain the bytes for unsafe characters.
Note: The World Wide Web Consortium Recommendation states that UTF-8 should be used. Not doing so may introduce incompatibilities.
Parameters:
- s - the String to be translated.
- enc - the name of a supported character encoding.
Returns:
- The translated String.
Throws:
- UnsupportedEncodingException - if the named encoding is not supported.
JDK description: java.net.URLDecoder.decode(uriString, "UTF-8");
A utility class for HTML form decoding. This class contains static methods for decoding a String from the application/x-www-form-urlencoded MIME format.
The conversion process is the reverse of the one used by the URLEncoder class. It is assumed that all characters in the encoded string are one of the following: "a" through "z", "A" through "Z", "0" through "9", and "-", "_", "." and "*". The "%" character is allowed, but it is interpreted as the start of a special escape sequence.
The following rules are applied in the conversion:
- The alphanumeric characters "a" through "z", "A" through "Z" and "0" through "9" remain unchanged.
- The special characters ".", "-", "*" and "_" remain unchanged.
- The plus sign "+" is converted to the space character " ".
- A sequence of the form "%xy" is treated as a byte, where xy is the two-digit hexadecimal representation of the 8 bits. Then, all substrings that contain one or more of these byte sequences consecutively are replaced by the character or characters whose encoding would produce those consecutive bytes. The encoding scheme used to decode these characters may be specified; if it is not, the default encoding of the platform is used.
There are two possible ways for the decoder to handle illegal strings: it can either leave the illegal characters alone or throw an IllegalArgumentException. Which approach the decoder takes is left to the implementation.
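Both the decoding rules and the two ways of treating illegal input can be observed in JavaScript. This is an analogy under assumptions, not a statement about how any particular Java implementation behaves: URLSearchParams happens to leave a malformed escape alone, while decodeURIComponent throws a URIError. The parameter name "q" and the malformed input "%zz" are made up for the example.

```javascript
// Decoding: "+" becomes a space, and the %XX runs are read as UTF-8 bytes.
const value = new URLSearchParams("q=The+string+%C3%BC%40foo-bar").get("q");

// Approach 1: leave an illegal escape sequence alone.
const lenient = new URLSearchParams("q=%zz").get("q");

// Approach 2: reject the illegal escape sequence with an exception.
let strictFailed = false;
try {
  decodeURIComponent("%zz");
} catch (e) {
  strictFailed = e instanceof URIError;
}

console.log(value);        // The string ü@foo-bar
console.log(lenient);      // %zz
console.log(strictFailed); // true
```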
decode
public static String decode(String s, String enc) throws UnsupportedEncodingException
Decodes an application/x-www-form-urlencoded string using the specified encoding scheme. The supplied encoding is used to determine which characters are represented by any consecutive sequences of the form "%xy".
Note: The World Wide Web Consortium Recommendation states that UTF-8 should be used. Not doing so may introduce incompatibilities.
Parameters:
- s - the String to decode.
- enc - the name of a supported character encoding.
Returns:
- The newly decoded String.
Throws:
- UnsupportedEncodingException - if the character encoding needs to be consulted, but the named character encoding is not supported.