Encode and decode Chinese characters using URLDecoder and URLEncoder

Source: Internet
Author: User

Summary:

URLDecoder and URLEncoder convert between ordinary strings and application/x-www-form-urlencoded MIME strings. This article uses URLDecoder to fix garbled Chinese characters in GET requests as a way of illustrating how URLDecoder and URLEncoder are used, and then gives the encoding rules for application/x-www-form-urlencoded MIME strings.

I. Overview of URLDecoder/URLEncoder usage scenarios

  URLDecoder and URLEncoder convert between ordinary strings and application/x-www-form-urlencoded MIME strings. Before introducing the application/x-www-form-urlencoded MIME string, let's consider the following scenario.

            

We know that when a client makes a request, the browser generates the corresponding request message from the request URL and sends it to the server. In this process, if the URL entered in the browser's address bar contains Chinese characters, the browser first encodes the Chinese characters and then sends them to the server. In fact, the browser converts them to an application/x-www-form-urlencoded MIME string.

            

More specifically, when the URL contains characters that are not Western European characters, the browser converts them into an application/x-www-form-urlencoded MIME string. During development, we may need to convert between ordinary strings and such strings ourselves, which is done with the URLDecoder and URLEncoder classes in java.net, where:

    • The URLDecoder class has a static method decode(String s, String enc) that converts an application/x-www-form-urlencoded MIME string to an ordinary string;

    • The URLEncoder class has a static method encode(String s, String enc) that converts an ordinary string to an application/x-www-form-urlencoded MIME string.

      The following program demonstrates the conversion between an ordinary string and an application/x-www-form-urlencoded MIME string.

    import java.net.URLDecoder;
    import java.net.URLEncoder;

    public class URLDecoderTest {
        public static void main(String[] args) throws Exception {
            // Convert an application/x-www-form-urlencoded string to an ordinary string.
            // The string was copied directly from the browser window; Chrome encodes with
            // the UTF-8 character set by default, so it should be decoded with UTF-8 too.
            System.out.println("Decoding with the UTF-8 character set:");
            String keyWord = URLDecoder.decode("%E5%A4%A9%E6%B4%A5%E5%A4%A7%E5%AD%A6+rico", "UTF-8");
            System.out.println(keyWord);
            System.out.println("\nDecoding with the GBK character set:");
            System.out.println(URLDecoder.decode("%E5%A4%A9%E6%B4%A5%E5%A4%A7%E5%AD%A6+rico", "GBK"));

            // Convert an ordinary string to an application/x-www-form-urlencoded string.
            System.out.println("\nEncoding with the UTF-8 character set:");
            String urlStr = URLEncoder.encode("天津大学", "UTF-8");
            System.out.println(urlStr);
            System.out.println("\nEncoding with the GBK character set:");
            String urlStr2 = URLEncoder.encode("天津大学", "GBK");
            System.out.println(urlStr2);
        }
    }
    /* Output:
    Decoding with the UTF-8 character set:
    天津大学 rico

    Decoding with the GBK character set:
    澶╂触澶у rico

    Encoding with the UTF-8 character set:
    %E5%A4%A9%E6%B4%A5%E5%A4%A7%E5%AD%A6

    Encoding with the GBK character set:
    %CC%EC%BD%F2%B4%F3%D1%A7
    */

In particular, a string that contains only letters, digits and a few safe punctuation characters looks the same in both forms and needs no conversion, whereas a string containing Chinese characters must be converted: each character is first encoded to bytes in the chosen character set, and each byte is then written as "%xy", where xy is two hexadecimal digits. With GBK each Chinese character becomes 2 bytes, so it takes the form "%xx%xx"; with UTF-8 each of the Chinese characters in the example becomes 3 bytes, "%xx%xx%xx". Because the number of bytes per Chinese character depends on the character set, you need to specify a character set when converting with URLEncoder and URLDecoder. A string must also be encoded and decoded with the same character set; otherwise the result is garbled, as the GBK decoding in the program above shows.
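The dependence on the character set can be checked directly. The following is a minimal sketch, not part of the original program: the class name EncodedLengthDemo and the helper escapedBytes are hypothetical names introduced here, and it assumes the JDK in use ships the GBK charset. It counts the "%xy" groups that URLEncoder produces for one Chinese character under each character set.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class EncodedLengthDemo {
    // Hypothetical helper: returns how many "%xy" groups (escaped bytes)
    // URLEncoder emits for the given text in the given character set.
    static int escapedBytes(String text, String charset) {
        String encoded;
        try {
            encoded = URLEncoder.encode(text, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported character set: " + charset, e);
        }
        int count = 0;
        for (int i = 0; i < encoded.length(); i++) {
            if (encoded.charAt(i) == '%') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // U+5929 is the first character of the example string used above.
        String ch = "\u5929";
        System.out.println("UTF-8: " + escapedBytes(ch, "UTF-8") + " bytes"); // 3
        System.out.println("GBK:   " + escapedBytes(ch, "GBK") + " bytes");   // 2
    }
}
```

The same character therefore occupies three "%xy" groups in UTF-8 and two in GBK, which is why decoding with the wrong character set regroups the bytes into different characters.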

II. Solving the garbled-Chinese problem in GET requests

One application scenario for URLDecoder is fixing garbled Chinese characters in GET request parameters, as shown in the following code:

    <%@ page import="java.net.URLDecoder" %>
    <%@ page language="java" import="java.util.*" pageEncoding="UTF-8" %>
    <html>
    <head>
        <title>Test</title>
    </head>
    <body>
        <%
            // Raw query string, still percent-encoded.
            String param1 = request.getQueryString();
            // Decode the whole query string with the character set the browser used.
            String param2 = URLDecoder.decode(param1, "UTF-8");
            // Print the value after the "=".
            out.print(param2.split("=")[1] + "<br>");
        %>
    </body>
    </html>

It is important to note that when we use this method to decode GET request parameters, we must first decode the return value of the request.getQueryString() method (for example, "name=Capricorn Blowing Snow" once decoded) and then extract the parameter values we need. If we instead take out the parameter value first (for example with request.getParameter, whose result the container has typically already decoded, possibly with a different character set) and only then decode it, we get garbled text.
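For query strings with more than one parameter, the same idea can be sketched outside a JSP. This is an illustrative sketch only: the class QueryStringDemo and the helper parse are hypothetical names, and the sample query string is made up. Like the JSP above, it starts from the raw, still-encoded value that getQueryString() would return. Unlike the JSP, it splits on "&" and "=" before decoding each piece; that is a design choice made here so that an encoded "=" or "&" inside a value is not mistaken for a separator.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

class QueryStringDemo {
    // Hypothetical helper: parses a raw (still percent-encoded) query string
    // into decoded name/value pairs, keeping the order of appearance.
    static Map<String, String> parse(String rawQuery, String charset) {
        Map<String, String> params = new LinkedHashMap<>();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return params;
        }
        try {
            for (String pair : rawQuery.split("&")) {
                if (pair.isEmpty()) {
                    continue; // skip empty segments such as "a=1&&b=2"
                }
                int eq = pair.indexOf('=');
                String name = eq < 0 ? pair : pair.substring(0, eq);
                String value = eq < 0 ? "" : pair.substring(eq + 1);
                params.put(URLDecoder.decode(name, charset),
                           URLDecoder.decode(value, charset));
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported character set: " + charset, e);
        }
        return params;
    }

    public static void main(String[] args) {
        // Made-up query string; the "name" value is the UTF-8 example used earlier.
        String raw = "name=%E5%A4%A9%E6%B4%A5%E5%A4%A7%E5%AD%A6+rico&note=a%3Db";
        Map<String, String> params = parse(raw, "UTF-8");
        System.out.println(params.get("name"));
        System.out.println(params.get("note")); // a=b
    }
}
```

In both approaches the character set passed to URLDecoder has to match the one the browser used to encode the URL.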

            

In addition, for POST request parameters that contain Chinese characters, we only need to set the request's character encoding with the following code before reading the request parameters:

    request.setCharacterEncoding("utf-8");

III. URLEncoder & URLDecoder

When encoding a String, URLEncoder applies the following rules:

    • The alphanumeric characters "a" to "z", "A" to "Z" and "0" to "9" remain unchanged;
    • The special characters ".", "-", "*" and "_" remain unchanged;
    • The space character " " is converted to a plus sign "+".

      All other characters are unsafe. They are first converted to one or more bytes using some encoding scheme, and each byte is then represented by the 3-character string "%xy", where xy is the two-digit hexadecimal representation of the byte. The recommended encoding scheme is UTF-8. For example, using UTF-8, the string "The string ü@foo-bar" is converted to "The+string+%C3%BC%40foo-bar", because in UTF-8 the character ü is encoded as two bytes, C3 (hex) and BC (hex), and the character @ is encoded as one byte, 40 (hex).

      URLDecoder performs exactly the reverse of the conversion performed by URLEncoder.
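The rules above can be exercised with a short program. This is a minimal sketch; the class name EncodingRulesDemo and its two helpers are hypothetical names introduced here, and ü is written as the Unicode escape \u00FC so the example does not depend on the source file's encoding.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class EncodingRulesDemo {
    // Hypothetical helper: encodes with UTF-8, the recommended scheme.
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Hypothetical helper: the reverse conversion, also with UTF-8.
    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // Letters, digits and ".", "-", "*", "_" pass through unchanged.
        System.out.println(encode("aZ09.-*_"));              // aZ09.-*_

        // Space becomes "+", other characters become "%xy" per byte.
        String original = "The string \u00FC@foo-bar";
        String encoded = encode(original);
        System.out.println(encoded);                         // The+string+%C3%BC%40foo-bar

        // Decoding with the same character set restores the original.
        System.out.println(decode(encoded).equals(original)); // true
    }
}
```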

For more on garbled Chinese in JSP, including garbled pages, parameters, forms and source files, see my other two blog posts: "JSP Chinese garbled Problem ultimate solution (above)" and "JSP Chinese garbled problem the ultimate solution".

