Garbled Chinese characters when a Java web back end reads data with request.getParameter()

Source: Internet
Author: User
Tags java web
Problem description: In the "Consolidated DataTables to Javaweb (SSH) Case summary analysis" example, Chinese data submitted from the page arrives garbled in the back end (example link: http://note.youdao.com/share/?id=64c0dbe0bb90ebc9b6522a37cf45a834&type=note).
Workaround: turn the received string back into its raw bytes using ISO-8859-1, then decode those bytes as UTF-8. The concrete implementation is as follows:
searchValue = request.getParameter("search[value]");
searchValue = new String(searchValue.getBytes("ISO-8859-1"), "UTF-8");
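To see why this workaround works, here is a minimal, self-contained sketch in plain Java with no servlet container. It assumes the garbling comes from UTF-8 bytes being decoded as ISO-8859-1, which is the case the workaround targets. The class and method names (MojibakeDemo, decodeAsLatin1, recover) are illustrative and do not come from the original project.

```java
import java.nio.charset.StandardCharsets;

final class MojibakeDemo {
    // Simulates a container that decodes UTF-8 request bytes as ISO-8859-1.
    static String decodeAsLatin1(String original) {
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // The workaround from the text: recover the raw bytes, then decode them as UTF-8.
    static String recover(String garbled) {
        byte[] rawBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u6c49\u5b57"; // the two Chinese characters for "Chinese characters"
        String garbled = decodeAsLatin1(original);
        System.out.println(garbled.equals(original));          // false
        System.out.println(recover(garbled).equals(original)); // true
    }
}
```

The round trip is lossless because ISO-8859-1 maps every byte value to exactly one character, so getBytes("ISO-8859-1") returns the bytes the browser originally sent.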

Summary
To fix garbled text when submitting a JSP page, you first need to understand why the garbling occurs.
Look at the JSP page directive: <%@ page language="java" import="java.util.*" contentType="text/html; charset=utf-8" %>
The page directive has one more attribute related to encoding: pageEncoding
-----------------------------------------------------------------------------------------
First, the roles of the several encodings in JSP/Servlet.
In JSP/Servlet, the encoding can be set in four places: pageEncoding="UTF-8", contentType="text/html;charset=UTF-8", request.setCharacterEncoding("UTF-8"), and response.setCharacterEncoding("UTF-8"). The first two are available only in JSPs; the last two can be used in both JSPs and servlets.
-----------------------------------------------------------------------------------------
request.setCharacterEncoding("UTF-8") sets the encoding the server uses to decode the client's request.
It specifies which encoding is applied when the data sent by the browser is decoded, so it must be called before any parameter is read.
response.setCharacterEncoding("UTF-8") sets the encoding the server uses to encode its response to the client.
It specifies which encoding is applied when the server encodes data before sending it to the browser.
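What these two settings control can be sketched in plain Java without a servlet container: one side turns text into bytes with a charset, and the other side must turn the bytes back into text with the same charset. The names below (ResponseCharsetDemo, serverEncode, clientDecode) are hypothetical, and the sketch only models the response direction; the request direction works the same way with the roles swapped.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ResponseCharsetDemo {
    // Encode text the way a server would before writing the response body.
    static byte[] serverEncode(String text, Charset responseCharset) {
        return text.getBytes(responseCharset);
    }

    // Decode bytes the way a client would after reading the response body.
    static String clientDecode(byte[] body, Charset assumedCharset) {
        return new String(body, assumedCharset);
    }

    public static void main(String[] args) {
        String text = "\u6c49\u5b57"; // two Chinese characters
        byte[] body = serverEncode(text, StandardCharsets.UTF_8);

        // Same charset on both sides: the text survives.
        System.out.println(clientDecode(body, StandardCharsets.UTF_8).equals(text)); // true
        // Different charset on the client side: the text is garbled.
        System.out.println(clientDecode(body, Charset.forName("GBK")).equals(text)); // false
    }
}
```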
-----------------------------------------------------------------------------------------
First, how does the browser encode the data it receives and sends?
1. The browser receives data:
response.setCharacterEncoding("UTF-8") specifies the encoding of the server response. The same charset is announced to the browser, which uses it to decode the data it receives. So whether you set response.setCharacterEncoding("UTF-8") or response.setCharacterEncoding("GBK") in your JSP, the browser displays Chinese correctly.
Readers can try an experiment: set response.setCharacterEncoding("UTF-8") in the JSP, open the page in IE, and choose "View (V)" → "Encoding (D)" in the IE menu, which shows "Unicode (UTF-8)". Set response.setCharacterEncoding("GBK") instead, and the same menu shows "Simplified Chinese (GB2312)".
2. The browser sends data:
When the browser sends data, it URL-encodes the URL and its parameters. For Chinese characters in the parameters, the browser uses the encoding given by the page's response.setCharacterEncoding setting. Take Baidu and Google as examples: searching for "汉字" ("Chinese characters") on Baidu encodes it as "%ba%ba%d7%d6", while Google encodes it as "%e6%b1%89%e5%ad%97". This is because Baidu's response encoding is GBK and Google's is UTF-8.
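The two encoded forms quoted above can be reproduced with java.net.URLEncoder. This is a small sketch, assuming the JDK in use ships the GBK charset (standard JDK distributions do). QueryEncodingDemo and its encode helper are illustrative names; note that URLEncoder emits uppercase hex digits, which are equivalent to the lowercase ones in the text.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class QueryEncodingDemo {
    // Percent-encodes a value with the named charset, as a browser does for form data.
    static String encode(String value, String charsetName) {
        try {
            return URLEncoder.encode(value, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String value = "\u6c49\u5b57"; // the search term from the Baidu/Google example
        System.out.println(encode(value, "GBK"));   // %BA%BA%D7%D6
        System.out.println(encode(value, "UTF-8")); // %E6%B1%89%E5%AD%97
    }
}
```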
--------------------------------------------------------
Second, how does the server encode the data it receives and sends?
1. The server sends data
When sending data, the server chooses the encoding in this order of precedence: response.setCharacterEncoding, then contentType, then pageEncoding.
2. The server receives data
There are three cases when receiving data: data the browser submits directly in the URL, and data submitted by a form using either the GET or the POST method.
Because different web servers handle these three cases differently, we take Tomcat 5.0 as an example.
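① Data submitted by POST in a form
As long as the JSP page containing the form has response.setCharacterEncoding, contentType, or pageEncoding set to "UTF-8", the browser posts the form data as UTF-8. The JSP/servlet that receives the data must decode it with the same encoding, by calling request.setCharacterEncoding("UTF-8") before reading any parameter; Chinese text then arrives without garbling.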
② Data submitted in the URL and by the GET method in a form
Here it is not enough to set request.setCharacterEncoding in the JSP/servlet that receives the data.
This is because Tomcat 5.0 by default uses ISO-8859-1 to decode data submitted in the URL and by GET in a form. To solve the problem:
Set the useBodyEncodingForURI or URIEncoding attribute on the Connector element in Tomcat's configuration file server.xml. URIEncoding specifies one encoding used to decode all GET requests, covering both data submitted in the URL and data submitted by GET in a form.
useBodyEncodingForURI indicates whether data submitted in the URL and by GET in a form is decoded with the encoding given by request.setCharacterEncoding; it is false by default.
The difference between URIEncoding and useBodyEncodingForURI is that URIEncoding applies one decoding to the data of all GET requests, whereas useBodyEncodingForURI decodes according to the request.setCharacterEncoding setting of the page that handles the request, so different pages can decode with different encodings. So for data submitted in the URL or by GET in a form, you can either set URIEncoding to the browser's encoding, or set useBodyEncodingForURI to true and call request.setCharacterEncoding with the browser's encoding in the JSP page that receives the data.
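As a rough illustration, the two options could look like this on the Connector element in server.xml. Only the URIEncoding and useBodyEncodingForURI attributes come from the text; the port value is a placeholder, and a real connector keeps its other existing attributes. Use one option or the other, not two connectors on the same port.

```xml
<!-- Option A: decode every URL/GET parameter as UTF-8 for all requests on this connector -->
<Connector port="8080" URIEncoding="UTF-8" />

<!-- Option B: decode URL/GET parameters with the charset passed to request.setCharacterEncoding(...) -->
<Connector port="8080" useBodyEncodingForURI="true" />
```

With option B, the receiving JSP/servlet still has to call request.setCharacterEncoding before reading parameters; otherwise the default decoding applies.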
----------------------------------------------------------
The following summarizes how to prevent garbled Chinese with Tomcat 5.0 as the web server.
1. Use one encoding throughout an application; UTF-8 is recommended, though GBK also works.
2. Set the JSP's pageEncoding="UTF-8" correctly.
3. In every JSP/servlet, set contentType="text/html;charset=UTF-8" or call response.setCharacterEncoding("UTF-8"); this indirectly sets the browser's encoding.
4. For GET or URL requests (as opposed to form POST submissions), you can modify Tomcat's default configuration, either by setting useBodyEncodingForURI to true or by setting URIEncoding to UTF-8 (the latter may affect other applications, so it is not recommended). Alternatively, process the data as follows when it is received:
request.getParameter("UserID") gets the value of UserID.
request.getParameter("UserID").trim() removes whitespace from both ends of the value.
request.getParameter("UserID").trim().getBytes("ISO-8859-1") encodes the string as a byte array using ISO-8859-1.
new String(request.getParameter("UserID").trim().getBytes("ISO-8859-1"), "UTF-8") passes that byte array to the String constructor, which decodes it as UTF-8 and creates a new String object.
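One pitfall of the conversion in point 4 is applying it to a value that was already decoded correctly: characters outside ISO-8859-1 cannot be turned back into bytes and are replaced with '?'. The sketch below wraps the steps in a hypothetical helper (ParamFixer.fix, not part of any library) that skips the conversion in that case. It is only a heuristic and assumes the real request charset is UTF-8; a correctly decoded value made up solely of accented Latin-1 characters would still be converted.

```java
import java.nio.charset.StandardCharsets;

final class ParamFixer {
    // Hypothetical helper: re-decodes a parameter that was wrongly decoded as ISO-8859-1.
    static String fix(String raw) {
        if (raw == null) {
            return null;
        }
        String trimmed = raw.trim();
        // If any character lies outside ISO-8859-1, the value was not mis-decoded
        // that way, and converting it would replace those characters with '?'.
        if (!StandardCharsets.ISO_8859_1.newEncoder().canEncode(trimmed)) {
            return trimmed;
        }
        byte[] rawBytes = trimmed.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String correct = "\u6c49\u5b57"; // two Chinese characters
        String garbled = new String(correct.getBytes(StandardCharsets.UTF_8),
                                    StandardCharsets.ISO_8859_1);
        System.out.println(fix(" " + garbled + " ").equals(correct)); // true
        System.out.println(fix(correct).equals(correct));              // true
    }
}
```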
5. Use the URLEncoder and URLDecoder methods
Before passing the parameter, use the following, which converts a string to application/x-www-form-urlencoded format using the specified encoding:
String username_encoder = URLEncoder.encode(username, "UTF-8");
When reading the parameter for display, use the following, which decodes an application/x-www-form-urlencoded string using the specified encoding:
String username_decoder = URLDecoder.decode(request.getParameter("username"), "UTF-8");
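The pair of calls in point 5 can be tried in isolation with the sketch below. UrlRoundTripDemo and the sample user name are made up for illustration. In a real servlet the encoded value travels through the container, which may already decode it once before request.getParameter returns, so treat this only as a demonstration of the two java.net methods.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

final class UrlRoundTripDemo {
    // Sender side: make the value safe to place in a URL or form body.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Receiver side: turn the percent-encoded value back into text.
    static String decode(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        String username = "\u5f20\u4e09"; // a sample two-character Chinese name
        String encoded = encode(username);
        System.out.println(encoded);                           // %E5%BC%A0%E4%B8%89
        System.out.println(decode(encoded).equals(username));  // true
    }
}
```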
-----------------------------------------------------------------------------------------
What is "GBK"? What is "UTF-8"?
First, the difference in characters covered
GBK contains all Chinese characters;
UTF-8 contains the characters needed by every country in the world.
Second, the difference in encoding
GBK is an extension built on the national standard GB2312 and remains compatible with it (GBK itself does not appear to be a national standard).
Text encoded in UTF-8 can be displayed in any browser that supports the UTF-8 character set, whatever the country.
For example, UTF-8 encoded Chinese displays in a foreigner's English-language IE without downloading IE's Chinese language support package. As for size, a Chinese character takes 2 bytes in GBK and usually 3 bytes in UTF-8, while an English (ASCII) character takes only one byte in both.
Third, the difference in use
GBK is China's national encoding and is less universal than UTF-8, but UTF-8 takes more database space than GBK for Chinese text. Forum software such as DZ generally has more complete GBK support in its components and plug-ins, which makes customization more convenient.
UTF-8 is an international encoding with better universality: foreigners can browse the forum, and Chinese is recognized directly. If your forum is meant to be more international, use UTF-8.

Addendum: UTF-8 also has the edge over GBK in supporting traditional Chinese characters.
For DZ forums, many plug-ins support only GBK. A forum that needs many plug-ins is better off with GBK; a forum that installs few plug-ins and serves a special user group is better off with UTF-8.
So if your forum is used only within a particular domestic circle, GBK is simpler and nearly every plug-in can be installed; if your site needs to reach foreign markets, UTF-8 is recommended.

When developing Chinese websites, GBK and UTF-8 are the two most commonly used character sets, but they differ. A summary follows.
1. GBK encodes each Chinese character in two bytes, with the highest bit of the first byte set to 1 to distinguish it from English (ASCII) characters, which take a single byte.
UTF-8 is a multi-byte encoding designed for international characters: it uses 8 bits (one byte) for an English character and 24 bits (three bytes) for a Chinese character. For text that is mostly English, UTF-8 therefore needs no more space than GBK.
2. GBK contains all Chinese characters, both simplified and traditional.
UTF-8 contains the characters needed by every country in the world.

Please note:
Although the UTF-8 version has good international compatibility, Chinese text needs about 50% more database storage than in the GBK/BIG5 version, so it is recommended only for users with special requirements for international compatibility, not for general use.
To put it simply:
For forums with mostly Chinese content, GBK is the better choice for saving database space.
For forums with mostly English content, UTF-8 is appropriate: English text takes one byte per character, so it costs no extra database space.
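The storage figures discussed above can be checked directly by counting bytes. This is a small sketch with illustrative names (ByteLengthDemo, byteLength), again assuming the JDK provides the GBK charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ByteLengthDemo {
    // Number of bytes the text occupies when stored in the given charset.
    static int byteLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        String english = "forum";
        String chinese = "\u6c49\u5b57"; // two Chinese characters

        System.out.println(byteLength(english, gbk));                    // 5
        System.out.println(byteLength(english, StandardCharsets.UTF_8)); // 5
        System.out.println(byteLength(chinese, gbk));                    // 4
        System.out.println(byteLength(chinese, StandardCharsets.UTF_8)); // 6
    }
}
```

Two Chinese characters take 4 bytes in GBK and 6 in UTF-8, which is where the 50% overhead comes from, while English text is the same size in both.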