Ever since we started working with Java and JSP, we have been fighting the garbled Chinese character problem. We have now finally arrived at a thorough solution, and would like to share our experience with you.
I. The origin of the Java Chinese character problem
The Java kernel and class files are based on Unicode. This gives Java programs good cross-platform behavior, but it also brings some trouble in the form of garbled Chinese characters. There are two main causes: garbling introduced when Java and JSP files themselves are compiled, and garbling introduced when a Java program interacts with other media.
First, Java (including JSP) source files are likely to contain Chinese, and Java and JSP source files are saved as byte streams. If the encoding used when compiling a Java or JSP file into a class file does not match the encoding of the source file, garbled characters appear. To avoid this kind of garbling, try not to write Chinese in Java files (comments do not take part in compilation, so Chinese there does no harm). If you must, compile manually with the parameter -encoding GBK or -encoding gb2312 whenever possible. For JSP, adding a page directive that declares the charset at the top of the file, such as <%@ page contentType="text/html;charset=GBK" %> or <%@ page contentType="text/html;charset=gb2312" %>, basically solves this kind of problem.
This article focuses on the second kind of garbling, the kind produced when a Java program interacts with other storage media. Many storage media, such as databases, files, and streams, are based on byte streams. When a Java program interacts with them, a conversion between characters (char) and bytes (byte) takes place, as follows:
Page form submission to Java program: byte -> char
Java program to page display: char -> byte
Database to Java program: byte -> char
Java program to database: char -> byte
File to Java program: byte -> char
Java program to file: char -> byte
Stream to Java program: byte -> char
Java program to stream: char -> byte
If the encoding used in any of these conversions does not match the encoding of the original bytes, garbled characters are likely to appear.
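The effect of a mismatched conversion can be reproduced in a few lines. The following is only a minimal sketch: the class and method names are made up for illustration, and it assumes the JVM ships the GBK charset (most full JDK installations do). The Chinese text is written with Unicode escapes so the example does not depend on the encoding of the source file.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library
class EncodingMismatchDemo {
    // char -> byte: encode text with the given charset
    static byte[] encode(String text, Charset cs) {
        return text.getBytes(cs);
    }

    // byte -> char: decode bytes with the given charset
    static String decode(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        String original = "\u4e2d\u6587"; // two Chinese characters
        byte[] stored = encode(original, gbk); // 2 bytes per character in GBK

        String matching = decode(stored, gbk);
        String mismatched = decode(stored, StandardCharsets.ISO_8859_1);

        System.out.println(original.equals(matching));   // prints true
        System.out.println(original.equals(mismatched)); // prints false
    }
}
```

Decoding with the same charset that produced the bytes restores the text; decoding with any other charset yields a different, garbled string.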
II. Solutions
As described above, characters and bytes are converted whenever a Java program interacts with other media, and these conversions are where garbling easily occurs. The key to solving the problem is to make sure that the encoding used in each conversion matches the encoding of the original bytes. The cases are discussed separately below (for garbling produced by Java or JSP files themselves, see Part I).
1. Garbled characters between JSP and page parameters
When a JSP reads page parameters it generally uses the system default encoding. If the encoding of the page parameters differs from that default, garbled characters are likely. The basic fix is to force the request's parameter encoding before the page reads any parameter: request.setCharacterEncoding("GBK") or request.setCharacterEncoding("gb2312").
If characters are garbled when the JSP outputs variables to the page, setting response.setContentType("text/html;charset=GBK") or response.setContentType("text/html;charset=gb2312") resolves it.
If you do not want to write these two statements in every file, a simpler approach is to specify the encoding with a filter from the Servlet specification. A typical web.xml configuration and the main code of the filter are as follows:
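web.xml:
<filter>
    <filter-name>CharacterEncodingFilter</filter-name>
    <filter-class>net.vschool.web.CharacterEncodingFilter</filter-class>
    <init-param><param-name>encoding</param-name><param-value>GBK</param-value></init-param>
</filter>
<filter-mapping>
    <filter-name>CharacterEncodingFilter</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>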
CharacterEncodingFilter.java:
package net.vschool.web;

import java.io.IOException;
import javax.servlet.*;

public class CharacterEncodingFilter implements Filter
{
    // Encoding name read from the "encoding" init-param in web.xml
    protected String encoding = null;
    public void init(FilterConfig filterConfig) throws ServletException
    {
        this.encoding = filterConfig.getInitParameter("encoding");
    }
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) throws IOException, ServletException
    {
        // Set the request and response encodings before any page code runs
        request.setCharacterEncoding(encoding);
        response.setContentType("text/html;charset=" + encoding);
        chain.doFilter(request, response);
    }
    public void destroy()
    {
        this.encoding = null;
    }
}
2. Garbled characters between Java and the database
Most databases support Unicode, so the advisable way to solve garbling between Java and a database is to interact with the database directly in Unicode. Many database drivers support Unicode automatically, such as the Microsoft SQL Server drivers. Most other drivers let you specify the encoding through parameters in the driver URL, for example with the MM MySQL driver: jdbc:mysql://localhost/webcldb?useUnicode=true&characterEncoding=GBK.
3. Garbled characters between Java and files/streams
The classes most commonly used to read and write files in Java are FileInputStream/FileOutputStream and FileReader/FileWriter. FileInputStream and FileOutputStream are byte-stream based and are typically used for binary files. For text files, the character-based FileReader and FileWriter are recommended, since they save you from converting between bytes and characters yourself. However, the constructors of these two classes use the system default encoding, so if the file content is not in the system encoding, garbled characters may appear. In that case, use the parent classes of FileReader and FileWriter instead: InputStreamReader and OutputStreamWriter. They are also character based, but let you specify the encoding in the constructor: InputStreamReader(InputStream in, Charset cs) and OutputStreamWriter(OutputStream out, Charset cs).
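As an illustration of the two constructors just mentioned, here is a minimal sketch that writes a text file in one encoding and reads it back with the same encoding. The class name, the helper methods, and the temporary file are assumptions made for this example, and it again assumes the GBK charset is available in the JVM.

```java
import java.io.*;
import java.nio.charset.Charset;

// Hypothetical demo class, not part of any library
class ExplicitCharsetFileDemo {
    // char -> byte with an explicit charset instead of the system default
    static void writeText(File file, String text, Charset cs) throws IOException {
        try (Writer out = new OutputStreamWriter(new FileOutputStream(file), cs)) {
            out.write(text);
        }
    }

    // byte -> char with an explicit charset instead of the system default
    static String readText(File file, Charset cs) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader in = new InputStreamReader(new FileInputStream(file), cs)) {
            char[] buf = new char[1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    // Writes text to a temporary file and reads it back with the same charset
    static String roundTrip(String text, Charset cs) {
        try {
            File tmp = File.createTempFile("charset-demo", ".txt");
            tmp.deleteOnExit();
            writeText(tmp, text, cs);
            return readText(tmp, cs);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587 text";
        System.out.println(text.equals(roundTrip(text, Charset.forName("GBK")))); // prints true
    }
}
```

Because both directions name the charset explicitly, the result no longer depends on the default encoding of the machine the program runs on.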
4. Other cases
The methods above should solve most garbling problems. If characters are garbled elsewhere, you may need to modify code by hand. The key to solving the Java garbling problem is that, for every conversion between bytes and characters, you must know the encoding of the original bytes or of the bytes to be produced, and the encoding used in the conversion must match it.
As an example, we used the Resin server and uploaded files with the SmartUpload component. At first, Chinese parameters passed along with an uploaded file were read without any garbling. After Resin on Linux was set up to run as a service, the Chinese parameters passed along with an uploaded file came out garbled. This problem plagued us for a long time, until we analyzed the source of the SmartUpload component. A file upload is transmitted as a byte stream, and the parameter names and values it contains are transmitted as bytes as well. SmartUpload reads the byte stream and then parses the parameter names and values out of it. The problem is that SmartUpload uses the system default encoding when converting the bytes to strings, and the system default encoding may change once Resin runs as a service, which produced the garbling. We therefore changed the SmartUpload source: we added a charset property and a setCharset(String) method, and modified the statement in the upload() method that extracts the parameter value. The original statement:
String value = new String(m_binArray, m_startData, (m_endData - m_startData) + 1);
was changed to
String value = new String(m_binArray, m_startData, (m_endData - m_startData) + 1, charset);
This finally solved the garbling problem.
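The idea behind this change can be tried in isolation. The sketch below is not SmartUpload's code: the class UploadParamDecoder, its fields, and the method valueBetween are hypothetical stand-ins that only mimic the pattern of decoding a slice of a byte array with a configurable charset name instead of the system default. It assumes the GBK charset is available in the JVM.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

// Hypothetical stand-in for the patched parser; not SmartUpload's real class
class UploadParamDecoder {
    private final byte[] binArray;   // raw bytes of the request body
    private String charset = "GBK";  // assumed default for this sketch

    UploadParamDecoder(byte[] binArray) {
        this.binArray = binArray;
    }

    void setCharset(String charset) {
        this.charset = charset;
    }

    // Decode bytes startData..endData (inclusive) with the configured charset
    String valueBetween(int startData, int endData) {
        try {
            return new String(binArray, startData, (endData - startData) + 1, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        byte[] value = "\u4e2d\u6587".getBytes(Charset.forName("GBK")); // 4 bytes
        byte[] body = new byte[2 + value.length];
        body[0] = 'v';
        body[1] = '=';
        System.arraycopy(value, 0, body, 2, value.length);

        UploadParamDecoder decoder = new UploadParamDecoder(body);
        decoder.setCharset("GBK");
        System.out.println("\u4e2d\u6587".equals(decoder.valueBetween(2, 5))); // prints true
    }
}
```

Passing the charset explicitly makes the result independent of how the server process was started, which is the property the system default could not guarantee.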
III. Postscript
I have been working with Java and JSP for more than a year. The biggest gain of this year is that I like Java more and more: I have started to treat studying problems as a pleasure, without the fear I used to have, and I believe I will keep at it. Over this year I have learned a great deal from the valuable experience that peers have shared on the Internet, and I would like to thank them here. This is my first summary of what I have learned about Java. My level is limited, so corrections of any biased or mistaken points in this article are welcome. If you find it of some value, it may be reproduced freely as long as the author's information and the original source of the article are preserved.
Before writing this article I consulted many articles about the Java Chinese problem, among them the solutions and experiences shared by owen1944 on the "Java Research Organization" site, and others. The solution discussed here has been applied to the "web-based Collaborative learning system - WEBCL" and other projects, where resource binding is also used to switch instantly between the two language versions of the platform. Internationalized applications, such as Google choosing the language automatically according to the browser or a single page displaying several languages at once, together with Che Dong's "Java Chinese Processing Study Notes: Hello Unicode", have aroused my great interest. In the future I would like to keep exploring Java internationalization, and I welcome anyone who wants to discuss it together.