Use URLDecoder and URLEncoder to Process Chinese

Source: Internet
Author: User
Tags: alphanumeric characters

1. URLEncoder
A utility class for HTML form encoding. The class contains a static method that converts a String to the application/x-www-form-urlencoded MIME format. For more information about HTML form encoding, see the HTML specification.  

When encoding a String, the following rules apply:


The alphanumeric characters "a" to "z", "A" to "Z" and "0" to "9" remain unchanged.  
The special characters ".", "-", "*" and "_" remain unchanged.  
The space character " " is converted to a plus sign "+".  
All other characters are unsafe and are first converted into one or more bytes using some encoding scheme. Each byte is then represented by the 3-character string "%xy", where xy is the two-digit hexadecimal representation of the byte. The recommended encoding scheme is UTF-8. However, for compatibility reasons, if an encoding is not specified, the platform's default encoding is used.  
For example, using UTF-8 as the encoding scheme, the string "The string ü@foo-bar" is converted to "The+string+%C3%BC%40foo-bar", because in UTF-8 the character ü is encoded as two bytes, C3 (hex) and BC (hex), and the character @ is encoded as one byte, 40 (hex).  
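The encoding rules above can be tried out with a short sketch. The class name EncodeRulesDemo and the helper encodeUtf8 are hypothetical names chosen for this illustration; only URLEncoder.encode comes from the JDK. Non-ASCII characters are written as Unicode escapes so the source compiles regardless of file encoding.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class EncodeRulesDemo {
    // Hypothetical helper: encodes with UTF-8 and hides the checked exception.
    static String encodeUtf8(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Letters, digits and . - * _ pass through unchanged.
        System.out.println(encodeUtf8("abc-XYZ_0.9*"));
        // Space becomes "+", U+00FC becomes two escaped bytes, "@" becomes one.
        // prints The+string+%C3%BC%40foo-bar
        System.out.println(encodeUtf8("The string \u00fc@foo-bar"));
    }
}
```

Passing the charset name explicitly, as done here, avoids depending on the platform default encoding.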

2. URLDecoder
This class contains a static method that decodes a String from the application/x-www-form-urlencoded MIME format.  

The conversion process is exactly the reverse of the process used by the URLEncoder class. It is assumed that all characters in the encoded string are one of the following: "a" to "z", "A" to "Z", "0" to "9", and "-", "_", ".", and "*". The "%" character is allowed, but it is interpreted as the start of a special escape sequence.  
The following rules are used in the conversion:


The alphanumeric characters "a" to "z", "A" to "Z" and "0" to "9" remain unchanged.  
The special characters ".", "-", "*" and "_" remain unchanged.  
The plus sign "+" is converted to the space character " ".  
A sequence of the form "%xy" is treated as representing a byte, where xy is the two-digit hexadecimal representation of the 8 bits. Then, all substrings that contain one or more of these byte sequences consecutively are replaced by the character(s) whose encoding would result in those consecutive bytes. The encoding scheme used to decode these characters may be specified; if it is not specified, the platform's default encoding is used.  
There are two possible ways the decoder can handle illegal strings: it can leave the illegal characters alone, or it can throw an IllegalArgumentException. Which approach the decoder takes is left to the implementation.
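The decoding rules above can be sketched in the same way. The class name DecodeRulesDemo and the helper decodeUtf8OrNull are hypothetical; the helper assumes the caller wants null back when the implementation rejects a malformed escape. Since handling of illegal strings is left to the implementation, no particular result is claimed for the malformed input.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class DecodeRulesDemo {
    // Hypothetical helper: decodes with UTF-8, returning null if the
    // decoder rejects the input with an IllegalArgumentException.
    static String decodeUtf8OrNull(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (IllegalArgumentException e) {
            // Taken only by implementations that throw on illegal strings.
            return null;
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "+" becomes a space; %C3%BC becomes U+00FC; %40 becomes "@".
        System.out.println(decodeUtf8OrNull("The+string+%C3%BC%40foo-bar"));
        // A literal plus sign must arrive escaped as %2B.
        System.out.println(decodeUtf8OrNull("a+b%2Bc")); // prints a b+c
        // "%zz" is not a valid escape; the outcome depends on the implementation.
        System.out.println(decodeUtf8OrNull("100%zz"));
    }
}
```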

Simple example:

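    import java.io.UnsupportedEncodingException;
    import java.net.URLDecoder;
    import java.net.URLEncoder;

    try {
        // Encode the Chinese string as UTF-8 bytes written as %XY escapes
        String encodeStr = URLEncoder.encode("中国", "utf-8");
        System.out.println("After processing:" + encodeStr);
        // Decode the escapes back into the original string
        String decodeStr = URLDecoder.decode(encodeStr, "utf-8");
        System.out.println("decoding: " + decodeStr);
    } catch (UnsupportedEncodingException e) {
        // Thrown only if the named charset is not supported
        e.printStackTrace();
    }

Operation Result:

    After processing:%E4%B8%AD%E5%9B%BD
    decoding: 中国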
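The round trip in the example works because the same charset is used on both sides. As a minimal sketch of what happens otherwise, the following decodes UTF-8 escapes with ISO-8859-1. The class name CharsetMismatchDemo and the helper roundTrip are hypothetical names for this illustration, and the two Chinese characters are written as Unicode escapes (U+4E2D, U+56FD).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class CharsetMismatchDemo {
    // Hypothetical helper: encodes with one charset, decodes with another.
    static String roundTrip(String s, String encodeCharset, String decodeCharset) {
        try {
            String encoded = URLEncoder.encode(s, encodeCharset);
            return URLDecoder.decode(encoded, decodeCharset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset", e);
        }
    }

    public static void main(String[] args) {
        String china = "\u4e2d\u56fd";
        // Same charset on both sides: the original string comes back.
        System.out.println(china.equals(roundTrip(china, "UTF-8", "UTF-8")));      // prints true
        // Mismatched charsets: each of the six bytes becomes its own character.
        System.out.println(china.equals(roundTrip(china, "UTF-8", "ISO-8859-1"))); // prints false
    }
}
```

When Chinese text is garbled after decoding, checking that both sides name the same charset is usually a reasonable first step.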
