Parse and access a series of JSP encoding and decoding processes-Details


From: http://japi.javaeye.com/blog/288779


Garbled text is a headache.

Garbled pages, garbled servlets, garbled databases...

It is an annoying word.

The solutions have been around for a long time, so I won't repeat much of that here.

Let's take a look at two URLs (in the original post the query terms, translated here as "Open Source" and "Paging", were Chinese words):

http://www.google.cn/search?client=aff-CS-worldbrowser&forid=1&ie=UTF-8&oe=UTF-8&hl=ZH-CN&q=Open Source

http://www.javaeye.com/search?type=All&query=Paging

Click each one and see what happens. Then select the URL in the address bar, press Enter, and see what happens.

--------------------------------------------------------

Let's walk through the series of encoding and decoding steps from the URL to the servlet to the page.

First, we know that Java represents all characters internally as Unicode.

Second, we assume that the page uses UTF-8 encoding. You could also use GB2312 or another encoding, but UTF-8 is strongly recommended.

1. Enter the URL

At this point, the servlet engine does some work.

The URL arrives percent-encoded as bytes. The servlet engine decodes those bytes into Unicode, by default using the ISO-8859-1 character set, and then wraps the result in the ServletRequest object.

When we use a form, both the POST and GET methods encode the form content using the character encoding of the page. The effect is much like that of the URLEncoder.encode() method.
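As a rough illustration of that comparison, the sketch below uses URLEncoder.encode() from the standard library to show how the same character turns into different bytes depending on the page encoding. The class name FormEncodingDemo and the helper encode are made up for this example, and the GB2312 case assumes the JDK in use ships that charset.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical demo class; not part of any servlet API.
final class FormEncodingDemo {

    // Percent-encodes text the way a form submission would,
    // using the given page encoding.
    static String encode(String text, String charsetName) {
        try {
            return URLEncoder.encode(text, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String zhong = "\u4e2d"; // the Chinese character "zhong" (middle)

        // A UTF-8 page sends three bytes for this character.
        System.out.println(encode(zhong, "UTF-8"));  // %E4%B8%AD

        // A GB2312 page sends two different bytes for the same character.
        System.out.println(encode(zhong, "GB2312")); // %D6%D0
    }
}
```

The server only ever sees the percent-encoded bytes, so it has to be told, or has to assume, which of these encodings the page used.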

2. Get the parameters in the URL

This is something we do all the time. It is just one method call: request.getParameter("paramName")

There is some decoding work behind this method.

This is sometimes where garbled characters come from.

The getParameter method decodes the URL. The Servlet specification does not explicitly specify the character set used for this decoding; it is left to each servlet engine vendor. In Tomcat, ISO-8859-1 was the default for URL decoding when this was written (Tomcat 8 and later default to UTF-8).
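The effect of that default can be imitated outside a container. The sketch below is only an approximation of what getParameter does internally: it uses URLDecoder.decode() from the standard library to decode the same percent-encoded bytes once as ISO-8859-1 and once as UTF-8. The class name QueryDecodingDemo and the helper decode are assumptions made for the example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class; it only imitates the decoding step of getParameter.
final class QueryDecodingDemo {

    // Decodes a percent-encoded value using the given character set.
    static String decode(String percentEncoded, String charsetName) {
        try {
            return URLDecoder.decode(percentEncoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // Two Chinese characters (U+4E2D U+6587) as a UTF-8 page would send them.
        String onTheWire = "%E4%B8%AD%E6%96%87";

        // Decoding as ISO-8859-1 turns each of the six bytes into its own character.
        String wrong = decode(onTheWire, "ISO-8859-1");
        System.out.println(wrong.length()); // 6

        // Decoding as UTF-8 gives back the two original characters.
        String right = decode(onTheWire, "UTF-8");
        System.out.println(right.equals("\u4e2d\u6587")); // true
    }
}
```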

For the POST method: you can call request.setCharacterEncoding() to specify a different character set for decoding. It must be called before any parameter is read.

For the GET method:

You can use the old-fashioned approach: new String(param.getBytes("ISO-8859-1"), "UTF-8")

The purpose of this statement is as follows:

It converts the incorrect Unicode string produced by getParameter back into the correct one.

Why does this work? It is a little hard to explain.

When a form is submitted from the page, the form content is encoded with the page encoding, UTF-8, while getParameter decodes it as ISO-8859-1 by default. If you do nothing about it, the result is garbled.

The conversion between ISO-8859-1 and Unicode is lossless: every byte value maps to exactly one character and back.

So we use getBytes("ISO-8859-1") to restore the original byte array, and then decode that array as UTF-8 to get the correct result.

This process is as follows:

     (encoded with UTF-8)              getParameter decodes with ISO-8859-1
URL ----------------------> byte array -------------------------------------> Unicode (wrong)

                  getBytes("ISO-8859-1")              new String(bytes, "UTF-8")
Unicode (wrong) ------------------------> byte array ----------------------------> Unicode (correct)
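The two arrows of the repair step can be tried in isolation. The following is a minimal sketch that needs no servlet container: misdecode stands in for what getParameter does with its ISO-8859-1 default, and recover is the getBytes / new String repair described above. Both method names and the class name Latin1RoundTrip are made up for the example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for the ISO-8859-1 round trip.
final class Latin1RoundTrip {

    // Simulates the faulty step: UTF-8 bytes decoded as ISO-8859-1.
    static String misdecode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // The repair: go back to the raw bytes, then decode them as UTF-8.
    static String recover(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587"; // two Chinese characters

        String garbled = misdecode(original);
        System.out.println(garbled.equals(original));          // false
        System.out.println(recover(garbled).equals(original)); // true
    }
}
```

The repair only works because ISO-8859-1 maps every byte to a character and back without loss. If the wrong decoding step had used a charset that discards or replaces bytes, the original byte array could not be recovered this way.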

You can also modify server.xml so that Tomcat uses UTF-8 for URL decoding:

    <Connector port="8080" protocol="HTTP/1.1"
               connectionTimeout="20000"
               redirectPort="8443"
               URIEncoding="UTF-8"/>



Remember the experiment above, where you selected the URL and pressed Enter? In that case the Chinese characters in the URL are encoded with the local character set, which here is GB2312.

JavaEye's URL, for example, then arrives as GB2312 while the back end processes it as UTF-8, so the result is garbled. Google's URL has an ie parameter, which is presumably the browser encoding. If it is changed to gb2312, there is no garbled text. In this respect Google does better than Baidu: the page encoding also follows this parameter, whereas Baidu only handles GB2312. From this point of view, Baidu has not yet gone international. You can try again to verify this.

When the URL is entered directly, the process is the same.

The browser first encodes the URL, here as GB2312, and getParameter then decodes the bytes as ISO-8859-1 into Unicode, so the GB2312 bytes are misread.

3. Display the characters on the page

Inside the servlet everything is Unicode. On output, the text is encoded into a byte array using the encoding set with response.setCharacterEncoding().

When the page is displayed, the browser decodes it using the charset given in the contentType attribute. These two settings must be consistent!

----------------------

The above covers the processes and some of the details.

For example, if the servlet does not call setCharacterEncoding and no filter does it either, then any URL parameters must be encoded with URLEncoder.encode() for the URL on the page to be displayed correctly.

Encoding and decoding are symmetric. At the lowest level, everything is transmitted as byte arrays. As long as we know how a byte array was produced, we know how to handle it and what causes the garbled characters.


For example, suppose the URL is entered directly and is in GB2312. The browser first URL-encodes it, turning the GB2312 text into a byte array, and getParameter decodes that byte array as ISO-8859-1. If you now write new String(p.getBytes("ISO-8859-1"), "UTF-8"), the result is garbled.

Writing new String(p.getBytes("ISO-8859-1"), "GB2312") is correct.
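This case can be reproduced with a small sketch as well. It assumes the JDK in use provides the GB2312 charset (it is not one of the charsets every Java platform is required to support), and the class name WrongCharsetDemo and the helper redecode are made up for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: the repair only works with the charset
// that actually produced the bytes.
final class WrongCharsetDemo {

    // Restores the raw bytes via ISO-8859-1, then decodes them with the given charset.
    static String redecode(String garbled, Charset actual) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), actual);
    }

    public static void main(String[] args) {
        Charset gb2312 = Charset.forName("GB2312"); // assumed to be available
        String original = "\u4e2d\u6587";

        // What the servlet sees: GB2312 bytes decoded as ISO-8859-1.
        String seenByServlet = new String(original.getBytes(gb2312), StandardCharsets.ISO_8859_1);

        // Re-decoding as UTF-8 fails, because the bytes are not valid UTF-8.
        System.out.println(redecode(seenByServlet, StandardCharsets.UTF_8).equals(original)); // false

        // Re-decoding as GB2312 matches how the bytes were produced.
        System.out.println(redecode(seenByServlet, gb2312).equals(original)); // true
    }
}
```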

When paging through search results, the Chinese parameters in the next-page link are usually encoded on the page with URLEncoder.encode. If they are not, re-decode them in the servlet, that is, with getBytes.

1. If the parameter is not encoded on the page, use getBytes in the servlet and decode it again.

This is because the UTF-8 characters in the URL become a byte array, and getParameter decodes that array as ISO-8859-1 into Unicode. The Unicode obtained here is incorrect, because the bytes were actually UTF-8 encoded.

2. If the parameter is encoded on the page, which can also be done in the servlet with encode(key, "UTF-8"), and request.setCharacterEncoding() is not called in the servlet, then only the conversion between ISO-8859-1 and Unicode is involved inside the servlet, and the value is turned back into local characters when it reaches the page.

-------------------------------------------

Key point: keep the conversions symmetric, and know the encoding of the underlying byte array.
