Learning memo: Summary of solutions for garbled Chinese text in Java
Source: Internet
Author: User
Garbled Chinese text in Java, all in one: I spent an entire afternoon reading 12 related documents from several websites, including Sun, IBM, CSDN, and JavaEye, summarized the garbled Chinese text problems that may be encountered in Java development together with their solutions, and wrote down these notes.

//********************
// 1. Possible garbled characters
//********************

1. A JSP page containing Chinese content is displayed incorrectly in the browser.
2. The servlet cannot correctly parse Chinese content submitted via POST.
3. The servlet cannot correctly parse Chinese content submitted via GET.
4. Chinese content written by the servlet to the response is displayed incorrectly.
5. Chinese content is corrupted when reading from or writing to the database.
6. Chinese characters are displayed incorrectly in a desktop application's interface.

In the cases above, Chinese characters typically show up as "?" or as square boxes.

//********************
// 2. Causes of garbled characters
//********************

The root cause of this problem is the complexity of computer character encoding schemes and the inconsistency among standards. Because languages and scripts differ around the world, the problem is unavoidable.
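The two typical symptoms can be reproduced with plain JDK classes. The following is a minimal sketch, assuming Java 7 or later for StandardCharsets; the class name GarbledTextDemo and the method reinterpret are hypothetical names chosen for illustration. It shows that encoding into a charset that lacks Chinese characters yields "?", while decoding bytes with the wrong charset yields unrelated characters.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; the class and method names are hypothetical.
final class GarbledTextDemo {

    // Encodes text to bytes with one charset, then decodes those bytes with another.
    static String reinterpret(String text, Charset encodeWith, Charset decodeWith) {
        return new String(text.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        // Two Chinese characters written as Unicode escapes, so the
        // encoding of this source file does not affect the result.
        String chinese = "\u4e2d\u6587";

        // Case 1: ISO-8859-1 has no Chinese characters, so each one is
        // replaced by '?' during encoding and the original is lost.
        String lost = reinterpret(chinese, StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1);
        System.out.println(lost); // prints ??

        // Case 2: UTF-8 bytes decoded as ISO-8859-1. Each Chinese character
        // occupies three bytes, and every byte becomes a separate character.
        String garbled = reinterpret(chinese, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.length()); // prints 6

        // Case 3: the same charset on both sides round-trips cleanly.
        String intact = reinterpret(chinese, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(intact.equals(chinese)); // prints true
    }
}
```

Case 1 cannot be undone afterwards, because the original bytes are gone. Case 2 can still be repaired, since ISO-8859-1 maps every byte to exactly one character.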
Java's source file storage and compilation mechanism and the JVM itself use Unicode uniformly internally. If the target platform decodes text differently, garbled characters appear. (This affects not only Chinese: many multi-byte and single-byte character sets can run into the same problem, depending on the target platform.) A detailed analysis of character set encodings involves too much material and is not repeated here. Several in-depth documents can be found on official sites such as IBM and unicode.org. Only the solutions are recorded here.

//********************
// 3. General principles for avoiding garbled text
//********************

Observing a few general principles while writing code avoids the most elementary problems:

1. JSP:
A. Specify the contentType in the JSP file. Its charset value must match the character set used by the client browser.
B. String constants need no internal code conversion.
C. String variables must be converted back into a byte stream that the client can recognize, based on the character set specified in contentType, because string variables are "based on the <JSP-charset> character set".

2. Servlet:
A. In a servlet, set the charset to match the client's encoding through HttpServletResponse.setContentType().
B. For string constants, specify the character set of the platform on which the source files were written when compiling with javac (the -encoding option). In a Chinese environment this is generally GB2312 or GBK.
C. String variables, as with JSP, are "based on the <servlet-charset> character set".

//********************
// 4. Specific solutions for the various garbled text problems
//********************

The solutions assume that the target platform is a Chinese platform. Test environment:
Operating system: Windows Vista 6.0 running on x86; GBK; zh_CN (NB), Ubuntu 8.04
Browser: IE 7, Firefox 3
Java environment: Java 2 SDK 1.6.0_10-rc; Java HotSpot(TM) Client VM 11.0-b15
Application server: Apache Tomcat 6.0
Development environment: NetBeans IDE 6.1

Type 1. The JSP page containing Chinese content is displayed incorrectly in the browser.
-------------------------------------------------
Problem description: When a JSP file is opened in a browser, the Chinese content is displayed as "?" or as square boxes.

Solution 1: Specify the encoding in the .jsp file.
Example:
<%@ page contentType="text/html; charset=UTF-8" pageEncoding="UTF-8" %>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
Note:
The encoding is specified as UTF-8.
After this change, Chinese characters are displayed correctly in the browser.

Type 2. The servlet cannot correctly parse Chinese content submitted via POST (also applies to scripts embedded in .jsp files).
----------------------------------------------------------------------------
Problem description: When data is submitted to the servlet through a form via POST, the Chinese content received in the servlet is incorrect. (Testing shows that the content is already wrong when it is obtained, not merely wrong when it is output. The same applies to Type 3.)

Solution 1: Specify the encoding in the servlet.
Example:
Make the following calls in the doPost() method:
response.setContentType("text/html; charset=UTF-8"); // set the Content-Type
response.setCharacterEncoding("UTF-8"); // set the encoding of the response data (output)
request.setCharacterEncoding("UTF-8"); // set the encoding of the request data (input); this is the key call
Description:
The out object is usually obtained with PrintWriter out = response.getWriter(); if so, that call must come after the calls above.
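Why the request encoding matters can be illustrated without a servlet container. The sketch below is only a simulation under stated assumptions: it uses java.net.URLEncoder and java.net.URLDecoder to imitate a browser that percent-encodes a form value in UTF-8 and a server that decodes it either as ISO-8859-1 (the default a container typically falls back to when no request encoding is set) or as UTF-8. The class name FormDecodingDemo and its methods are hypothetical and are not part of the servlet API.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Simulation only; these names are hypothetical and not part of the servlet API.
final class FormDecodingDemo {

    // Imitates a browser percent-encoding a form value using the page charset.
    static String encodeLikeBrowser(String value, String pageCharset) {
        try {
            return URLEncoder.encode(value, pageCharset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + pageCharset, e);
        }
    }

    // Imitates a server decoding the value with the charset it assumes.
    static String decodeLikeServer(String encoded, String assumedCharset) {
        try {
            return URLDecoder.decode(encoded, assumedCharset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + assumedCharset, e);
        }
    }

    public static void main(String[] args) {
        // Two Chinese characters, written as Unicode escapes.
        String submitted = "\u4e2d\u6587";

        String onTheWire = encodeLikeBrowser(submitted, "UTF-8");
        System.out.println(onTheWire); // prints %E4%B8%AD%E6%96%87

        // Decoding with a different charset than the browser used gives garbled text.
        String wrong = decodeLikeServer(onTheWire, "ISO-8859-1");
        System.out.println(wrong.equals(submitted)); // prints false

        // Decoding with the matching charset restores the submitted value.
        String right = decodeLikeServer(onTheWire, "UTF-8");
        System.out.println(right.equals(submitted)); // prints true
    }
}
```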
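With these calls in place, request.getParameter() returns the parameters correctly without any further conversion. Note that request.setCharacterEncoding() only takes effect if it is called before any parameter is read.

Solution 2: Use a filter to implement general encoding filtering.
Example: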
Create a filter class that implements the javax.servlet.Filter interface. (A filter does not need to extend HttpServlet.)
In the doFilter() method, convert the encoding of the intercepted request content.
The test code was adapted from Tomcat's filter example. It is long and is not listed here.
Description:
This method is suitable when the project involves many modules.
The filter needs a concrete class that extends HttpServletRequestWrapper. It can use a HashMap to hold the converted values and override getParameterValues() and related methods.

Solution 3: Write a separate conversion function (method).
Description:
The principle is the same as in solution 2. The conversion function is very simple: it only needs to re-decode the value, for example with new String(value.getBytes("ISO-8859-1"), "UTF-8").
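Such a conversion function might look like the following minimal sketch. It assumes Java 7 or later for StandardCharsets, and it assumes the value really was decoded as ISO-8859-1 from bytes that were UTF-8; the class name CharsetRepair and the method latin1ToUtf8 are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Illustrative helper; the class and method names are hypothetical.
final class CharsetRepair {

    // Turns a String that was wrongly decoded as ISO-8859-1 back into bytes,
    // then decodes those bytes as UTF-8. Returns null for null input.
    static String latin1ToUtf8(String wronglyDecoded) {
        if (wronglyDecoded == null) {
            return null;
        }
        byte[] originalBytes = wronglyDecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Two Chinese characters, written as Unicode escapes.
        String original = "\u4e2d\u6587";

        // Simulate the fault: UTF-8 bytes decoded as ISO-8859-1.
        String garbled = new String(original.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);

        String repaired = latin1ToUtf8(garbled);
        System.out.println(repaired.equals(original)); // prints true
    }
}
```

The function should only be applied to values known to be garbled in this particular way. Applied to a string that was already decoded correctly, it destroys the Chinese characters instead of repairing them.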
This method suits projects with only a few modules.

Type 3. The servlet cannot correctly parse Chinese content submitted via GET (also applies to scripts embedded in .jsp files).
--------------------------------------------------------------------------
Problem description: When data is submitted to the servlet via GET through the URL in the address bar, the Chinese content received in the servlet is incorrect.

Solution 1: See the solutions for Type 2.
Note: For GET requests this may have no effect under some configurations, because the result depends on how the application server and the browser handle URL encoding. In that case, try the following solution.

Solution 2: Modify the Tomcat configuration.
Example:
Edit %TOMCAT_HOME%/conf/server.xml.
Add URIEncoding="GBK" to the <Connector port="8080" protocol="HTTP/1.1" ... element, and restart Tomcat for the change to take effect.
Note:
This method has been tested. However, settings in server.xml affect all applications managed by Tomcat. If an application that uses a different encoding is deployed, use this method with caution.

Type 4. The Chinese content returned by the servlet to the response is displayed incorrectly.
-------------------------------------------------
Set the correct Content-Type as described in solution 1 of Type 2.

Type 5. The Chinese content is incorrect when reading from or writing to the database.
--------------------------------
The solutions differ from database to database, but the principle is the same: specify the character set used by the database, perform encoding conversion when reading and writing, or do both.

5.1 MySQL: The default encoding of MySQL is latin1, which corresponds to ISO-8859-1, so conversion is needed when Chinese data is involved.

5.1.1 Inserting Chinese characters (no encoding conversion required):
// Illustration only; in real code, prefer PreparedStatement to avoid SQL injection.
String sql = "insert into table_name (field_name) values ('" + request.getParameter("field_name") + "')";
stmt.executeUpdate(sql);

5.1.2 Displaying Chinese characters that were read:
Because the data is stored as latin1, it must be converted for display.
String str = new String(rs.getString("field_name").getBytes("ISO8859_1"), "UTF-8");
out.println(str);

5.1.3 Setting the character set encoding directly on the connection:
Connection conn = DriverManager.getConnection("jdbc:mysql://localhost:3306/jsp?useUnicode=true&characterEncoding=UTF-8", "root", "");
String str = "Chinese content";
String sql = "insert into table_name (field_name) values ('" + str + "')";

5.2 Other databases (not tested, not listed for now)

Type 6. Chinese characters are displayed incorrectly in the desktop application interface.
---------------------------------------
Note:
Any place where a string (constant or variable) is displayed should take target platforms with different encodings into account. Resources related to interface display can be extracted into resource files and accessed through ResourceBundle (ListResourceBundle, PropertyResourceBundle).
When the program needs these resources, it loads the appropriate localized resources from the ResourceBundle.
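As a rough sketch of this idea, the example below loads interface strings through PropertyResourceBundle. To stay self-contained it reads the properties text from an in-memory string rather than a file; in a real project the same text would more likely live in a file such as Messages_zh_CN.properties (a hypothetical name) and be loaded with ResourceBundle.getBundle(). The keys window.title and button.ok and the class name BundleDemo are also assumptions made for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

// Illustrative sketch; the class name, keys, and properties text are hypothetical.
final class BundleDemo {

    // Stand-in for the content of a properties file. The Chinese text is kept
    // as ASCII-only Unicode escapes, which the properties parser converts back.
    static final String ZH_CN_PROPERTIES =
            "window.title=\\u4e2d\\u6587\\u754c\\u9762\n"
          + "button.ok=\\u786e\\u5b9a\n";

    // Builds a ResourceBundle from properties-formatted text.
    static ResourceBundle load(String propertiesText) {
        try {
            return new PropertyResourceBundle(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new IllegalStateException("Could not parse properties text", e);
        }
    }

    public static void main(String[] args) {
        ResourceBundle bundle = load(ZH_CN_PROPERTIES);

        // The interface code asks for strings by key instead of hard-coding them.
        String title = bundle.getString("window.title");
        String okLabel = bundle.getString("button.ok");

        System.out.println(title.length());   // prints 4
        System.out.println(okLabel.length()); // prints 2
    }
}
```

Keeping the resource text in ASCII-only escapes means the resource itself cannot be damaged by a mismatch in file encoding.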
Note: To display Chinese characters correctly, a font that contains them must be selected for the program's components, for example "SimSun". Otherwise, boxes are shown wherever the interface contains Chinese characters. (This section has not been tested.)

Type 7. Garbled characters during file upload
--------------------
Problem description: Chinese data received during a file upload is incorrect.

Solution 1: Upgrade the standard component.
Description:
For uploads, the form is set to enctype="multipart/form-data".
The browser submits the file as a stream. If the Apache upload component is used, garbled characters may occur, because early versions of Apache's commons-fileupload.jar had a bug: Chinese characters were extracted after decoding, and with this submission method the decoding automatically used Tomcat's default encoding, ISO-8859-1.
Download commons-fileupload-1.1.1.jar (or a newer version), in which these bugs have been fixed.
The extracted characters still need to be transcoded from ISO-8859-1 to UTF-8, however.
A self-written upload component can handle encoding by following the points above.

//********************
// Postscript
//********************
After an afternoon of searching and testing, I found that Java has many garbled text problems. The problems and solutions summarized above cover most situations likely to be encountered in day-to-day development.
There are also a few rarer garbled text scenarios. Some are unlikely to be encountered and others were not actually tested, so they are not included for now. If they turn out to be typical, they will be added to this document.