Causes of garbled characters in JSP and their solutions

Source: Internet
Author: User

http://blog.csdn.net/caoxiaohong/article/details/1781777

Garbled Chinese with JSP, JDBC and MySQL

A JSP request is decoded as ISO8859_1 by default, so to display Chinese text the string must be converted to GBK:

String str = new String(request.getParameter("name").getBytes("ISO8859_1"), "GBK");
out.println(str);

After this conversion the Chinese characters display correctly.

Chinese in MySQL operations

This depends on MySQL's default encoding. If it has not been changed it is usually latin1, which is in practice the same as ISO8859_1, so every operation has to allow for it or the text will be garbled.

1. Insert Chinese:

String sql2 = "INSERT INTO test (name) VALUES ('" + request.getParameter("name") + "')";
stmt.executeUpdate(sql2);

The value can be inserted without any transcoding.

2. Display the inserted Chinese:

Because the data is stored as latin1, it has to be converted to GBK for display:

String x = new String(rs.getString("title").getBytes("ISO8859_1"), "GBK");
out.println(x);

3. Set the storage encoding:

Of course, even when MySQL itself is latin1 encoded, data can be stored as GBK by naming the encoding in the connection URL:

Connection con = DriverManager.getConnection("jdbc:mysql://localhost:3306/jsp?useUnicode=true&characterEncoding=GBK", "root", "");
String str1 = "Chinese";
String sql2 = "INSERT INTO test (name) VALUES ('" + str1 + "')";
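Why the read-side conversion in step 2 works can be checked without a database. ISO8859_1 assigns a character to every one of the 256 byte values, so bytes that were decoded as ISO8859_1 can always be recovered with getBytes("ISO8859_1"); a charset such as US-ASCII does not have this property. The following is only a sketch, and the class and method names are made up for illustration:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Latin1RoundTrip {
    /** Decodes the bytes with the given charset, then encodes the text back with it. */
    static byte[] roundTrip(byte[] raw, Charset charset) {
        String decoded = new String(raw, charset);
        return decoded.getBytes(charset);
    }

    /** All 256 possible byte values, standing in for arbitrary stored data. */
    static byte[] allByteValues() {
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        return all;
    }

    public static void main(String[] args) {
        byte[] raw = allByteValues();
        // ISO8859_1 has one character per byte value, so nothing is lost.
        System.out.println(Arrays.equals(raw, roundTrip(raw, StandardCharsets.ISO_8859_1)));
        // US-ASCII has no characters for bytes >= 0x80, so those bytes are lost.
        System.out.println(Arrays.equals(raw, roundTrip(raw, StandardCharsets.US_ASCII)));
    }
}
```

This is why GBK bytes can pass through a latin1 column or an ISO8859_1 request unharmed, as long as the same charset is used to get the bytes back.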
Inserting through a connection opened with characterEncoding=GBK, as above, also succeeds.

Chinese character encoding problems in JSP/Servlet

There are already many excellent articles and discussions online about DBCS character encoding problems in JSP/Servlet. This article collects some of them and explains the solutions in the context of IBM WebSphere Application Server 3.5 (WAS); I hope it is not superfluous.

1. The origin of the problem

Each country (or region) has defined character encoding sets for computer information interchange, such as ASCII in the United States, GB2312-80 in China and JIS in Japan. They are the basis of information processing in that country or region and play the important role of unifying encodings. By length, character encoding sets are divided into SBCS (single-byte character sets) and DBCS (double-byte character sets). Early software (operating systems in particular) appeared in many localized versions (L10N) in order to process local character data, and the concepts of LANG and codepage were introduced to tell them apart. But because the code ranges of the local character sets overlap, exchanging information between them is difficult, and maintaining each localized version independently is costly. It is therefore necessary to extract what the localization work has in common and handle it uniformly, so that locale-specific processing is kept to a minimum. This is what is called internationalization (I18N). The various language settings are further standardized as locale information, and the underlying character set used for processing becomes Unicode, which covers almost all glyphs.

Most software with internationalization features now does its core character processing in Unicode. At run time it determines the local character encoding from the locale/LANG/codepage settings and handles local characters accordingly. Processing therefore requires conversion between Unicode and the local character set, or even between two different local character sets with Unicode in the middle. In a network environment the same approach is extended: character data at either end of the connection has to be converted into acceptable content according to the character set settings.

The Java language represents characters internally in Unicode and follows Unicode V2.0. A Java program performs character encoding conversion whenever it reads or writes files as character streams from or to the file system, writes HTML to a URL connection, or reads parameter values from a URL connection. This makes programming more complex and is easy to get confused by, but it is in line with the idea of internationalization.

In theory these conversions, driven by the character set settings, should not cause many problems. In practice, because of the actual run-time environment of applications, the additions and revisions to Unicode and the individual local character sets, and non-conforming system or application implementations, transcoding problems constantly trouble programmers and users.

2. The GB2312-80, GBK and GB18030-2000 Chinese character sets

The fix for a Chinese encoding problem in a Java program is often very simple, but understanding the reason behind it and locating the problem still requires knowledge of the existing Chinese character encodings and the conversions between them.

GB2312-80 was drawn up in the early days of Chinese computer information technology. It contains most of the commonly used level-1 and level-2 Chinese characters and 9 zones of symbols. Almost all Chinese systems and internationalized software support it, and it is the most basic Chinese character set. Its code range is 0xA1-0xFE for the high byte and 0xA1-0xFE for the low byte; Chinese characters start at 0xB0A1 and end at 0xF7FE.

GBK is an extension of GB2312-80 and is upward compatible with it. It contains 20,902 Chinese characters, and its code range is 0x8140-0xFEFE, excluding the code positions with high byte 0x80. All of its characters map one-to-one to Unicode 2.0, which means that Java in effect supports the GBK character set. GBK is the default character set of Windows and some other Chinese operating systems, but not all internationalized software supports it, apparently because the vendors do not fully understand GBK. Note that GBK is not a national standard, only a specification. With the publication of the national standard GB18030-2000 it will soon have completed its historical mission.

GB18030-2000 (GBK2K) extends the Chinese characters further on the basis of GBK and adds the scripts of minorities such as Tibetan and Mongolian. GBK2K fundamentally solves the problem of too few code positions and too few glyphs. It has several characteristics:

It does not fix all the glyphs, only the code ranges, leaving room for later extension.
The encoding is variable length. The two-byte part is compatible with GBK; the four-byte part holds the extended glyphs and code positions, with a code range of 0x81-0xFE for the first byte, 0x30-0x39 for the second, 0x81-0xFE for the third and 0x30-0x39 for the fourth.
Its adoption is phased; the first requirement is to implement all glyphs that can be fully mapped to the Unicode 3.0 standard.
It is a national standard and is mandatory.
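The byte ranges quoted in this section can be turned into a small classifier, which is handy when looking at raw bytes in a debugger or a packet dump. This is a rough sketch based only on the ranges above; the class name, method names and result labels are invented for the example, and it does not consult any real conversion table:

```java
final class GbByteRanges {
    private static boolean between(int value, int low, int high) {
        return value >= low && value <= high;
    }

    /** Classifies the bytes of one encoded character by the ranges given in the text. */
    static String classify(byte[] encoded) {
        int[] b = new int[encoded.length];
        for (int i = 0; i < b.length; i++) {
            b[i] = encoded[i] & 0xFF; // Java bytes are signed; work with 0..255
        }
        if (b.length == 1 && b[0] < 0x80) {
            return "single byte";
        }
        if (b.length == 2 && between(b[0], 0xA1, 0xFE) && between(b[1], 0xA1, 0xFE)) {
            return "GB2312-80 range";
        }
        // Real GBK also skips trail byte 0x7F, which the text does not mention.
        if (b.length == 2 && between(b[0], 0x81, 0xFE) && between(b[1], 0x40, 0xFE) && b[1] != 0x7F) {
            return "GBK extension range";
        }
        if (b.length == 4 && between(b[0], 0x81, 0xFE) && between(b[1], 0x30, 0x39)
                && between(b[2], 0x81, 0xFE) && between(b[3], 0x30, 0x39)) {
            return "GB18030 four-byte range";
        }
        return "outside these ranges";
    }

    public static void main(String[] args) {
        // 0xB0A1 is where the GB2312-80 Chinese characters start.
        System.out.println(classify(new byte[] {(byte) 0xB0, (byte) 0xA1}));
        // A low byte below 0xA1 can only be a GBK extension.
        System.out.println(classify(new byte[] {(byte) 0xE9, 0x46}));
        System.out.println(classify(new byte[] {(byte) 0x81, 0x30, (byte) 0x81, 0x30}));
    }
}
```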
At the time of writing, no operating system or software supports GBK2K yet; that is the work facing Chinese information processing now and in the near future.

3. Chinese character encoding problems in JSP/Servlet and their solutions on WAS

3.1 Common encoding problems

The JSP/Servlet encoding problems commonly reported online usually show up in the browser or on the application side, for example:

How did the Chinese characters in a JSP/Servlet page turn into '?' in the browser?
Why are the Chinese characters in a Servlet page garbled in the browser?
Why do the Chinese characters in a Java application's interface show up as squares?
A JSP/Servlet page cannot display GBK Chinese characters.
A JSP/Servlet cannot receive the Chinese characters submitted by a form.
JSP/Servlet database reads and writes do not return the correct content.

Behind all of these lie wrong character conversions and wrong processing (except the third, which is caused by a Java font configuration error). To solve this kind of character encoding problem you need to understand how a JSP/Servlet runs and check every point where things can go wrong.

3.2 Encoding issues in JSP/Servlet web programming
A JSP/Servlet runs on the Java application server and provides HTML content to the browser. Character encoding conversion takes place at the following points:

a. JSP compilation. The Java application server reads the JSP source file according to the JVM's file.encoding value, converts it to the internal character encoding for JSP compilation, generates the Java source file, and writes it back to the file system, again according to file.encoding. If the current system language supports GBK, no encoding problem arises at this point. If the system is an English one, such as Linux, AIX or Solaris with LANG set to en_US, set the JVM's file.encoding to GBK. If the system language is GB2312, decide as needed whether to set file.encoding; setting it to GBK avoids potential problems with garbled GBK characters.

b. The Java source has to be compiled to a .class file before it can run in the JVM, and this step has the same file.encoding issue as a. From this point on a Servlet and a JSP run in the same way, except that a Servlet is not compiled automatically.

c. The Servlet has to convert the HTML page content into an encoding the browser accepts before sending it. This depends on the Java application server implementation: some query the browser's Accept-Charset and Accept-Language parameters or use some other guess to determine the encoding, and some do not bother. A fixed encoding is therefore probably the best solution. For Chinese pages, set contentType="text/html; charset=GB2312" in the JSP or Servlet, or contentType="text/html; charset=GBK" if the page contains GBK characters. Because IE and Netscape support GBK to different degrees, this setting should be tested. Because the high 8 bits of a 16-bit Java char are discarded in network transmission, and to make sure that the Chinese characters in a Servlet page (both embedded ones and those produced while the Servlet runs) arrive in the expected internal code, use PrintWriter out = res.getWriter() instead of ServletOutputStream out = res.getOutputStream(). The PrintWriter converts according to the charset specified in the ContentType (which must therefore be set beforehand). Alternatively, wrap the ServletOutputStream in an OutputStreamWriter and write Chinese strings with write(String). For a JSP, the Java application server should be able to guarantee that embedded Chinese characters are sent correctly at this stage.

d. URL character encoding. If the parameter values returned from the browser by GET or POST contain Chinese characters, the Servlet will not receive the correct values. In Sun's J2SDK, HttpUtils.parseName does not take the browser's language setting into account at all when parsing parameters and simply treats the values byte by byte. This is the most discussed encoding problem online. Because it is a design flaw, the only options are to re-parse the resulting string in binary form or to hack the HttpUtils class. References 2 and 3 describe both, but it is best to change the Chinese encodings GB2312 and CP1381 used there to GBK, otherwise GBK characters will still cause problems. Servlet API 2.3 adds a new method, HttpServletRequest.setCharacterEncoding, which specifies the encoding the application expects before request.getParameter("param_name") is called; this will help solve the problem for good.

WebSphere Application Server extends the standard Servlet API 2.x to provide better multilingual support. In cases c and d above, WAS queries the browser's language setting; by default zh, zh-cn and so on are mapped to the Java encoding CP1381 (note: CP1381 is a codepage equivalent only to GB2312 and has no GBK support). I think this is because it cannot be confirmed that the operating system the browser runs on supports GB2312 or GBK, so the smaller set is chosen. Real applications, however, still require GBK characters on their pages, the most famous being the "镕" in Premier Zhu's name (rong2, 0xE946, \u9555), so it is sometimes necessary to specify the encoding/charset as GBK. Changing the default encoding in WAS is of course not as troublesome as described above. For a and b, see reference 5: specifying -Dfile.encoding=GBK in the application server's command line arguments is enough. For c and d, specify -Ddefault.client.encoding=GBK in the application server's command line arguments. If -Ddefault.client.encoding=GBK is specified, the charset no longer needs to be given in case c.

3.3 Encoding problems in database reads and writes

The other place where encoding problems often occur in JSP/Servlet programming is reading and writing data in a database. Popular relational database systems support a database encoding: the character set can be specified when the database is created, and the data is stored in that encoding. When an application accesses the data, an encoding conversion takes place on the way in and on the way out. For Chinese data, the database encoding should be chosen so that the data stays intact. GB2312, GBK, UTF-8 and so on are all possible database encodings. If ISO8859-1 (8-bit SBCS) is chosen instead, the application must split each 16-bit Chinese character or Unicode character into two 8-bit characters before writing, and after reading must combine the two bytes again while also telling the SBCS characters apart. This does not make full use of the database encoding and adds to programming complexity, so ISO8859-1 is not a recommended database encoding. In JSP/Servlet programming, first use the administration tools that come with the database management system to check whether the Chinese data in it is correct. Then pay attention to the encoding of the data that is read: what the Java program receives is generally Unicode. Writing data is the reverse.

3.4 Common techniques for locating problems

The crudest and most effective way to locate a Chinese encoding problem is to print the internal code of the string right after the code you suspect has processed it. From the internal codes you can tell when the Chinese characters were converted to Unicode, when Unicode was converted back to the Chinese internal code, when one Chinese character became two Unicode characters, when the Chinese string turned into a string of question marks, and when the high byte of the Chinese string was cut off. Choosing a suitable sample string also helps to tell the kinds of problem apart, for example a string that mixes English letters with GB2312 and GBK Chinese characters. In general, English characters are not distorted however they are converted or processed (if they are, try increasing the number of consecutive English letters).
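Printing the internal code, as described in 3.4, takes only a few lines. The helper below is a minimal sketch with hypothetical names; the sample text is written with \u escapes so that the source file's own encoding cannot interfere with the experiment:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class InnerCodeDump {
    /** Hex value of every 16-bit char in the string, separated by spaces. */
    static String chars(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                out.append(' ');
            }
            out.append(String.format("%04x", (int) s.charAt(i)));
        }
        return out.toString();
    }

    /** Hex value of every byte the string produces in the given charset. */
    static String bytes(String s, Charset charset) {
        StringBuilder out = new StringBuilder();
        for (byte b : s.getBytes(charset)) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("%02x", b & 0xFF));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String good = "\u4e2d\u6587"; // two Chinese characters
        // The same UTF-8 bytes decoded with the wrong charset.
        String bad = new String(good.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);

        System.out.println(chars(good));                        // 4e2d 6587
        System.out.println(bytes(good, StandardCharsets.UTF_8)); // e4 b8 ad e6 96 87
        System.out.println(chars(bad));                         // 00e4 00b8 00ad 00e6 0096 0087
    }
}
```

A dump in which every char has a zero high byte, like the last line, is the sign that encoded bytes were decoded as ISO8859-1; a run of 003f means the text has already been replaced by question marks.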

1 The most basic garbled problem

This is the simplest garbled problem, and beginners usually run into it. The garbling is caused by inconsistent page encodings.

<%@ page language="java" pageEncoding="UTF-8"%>

<%@ page contentType="text/html;charset=iso8859-1"%>

<title>Chinese issues</title>

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

<body>

I'm a good man.

</body>

There are three encodings here.

The first is the storage format of the JSP file. Eclipse saves the file in this encoding, and the JSP file, including the Chinese characters in it, is compiled using it.

The second is the decoding format. Because a file saved as UTF-8 is decoded as iso8859-1, the Chinese is bound to be garbled, so the two must be the same. The second line can also be left out, in which case the default is again ISO8859-1, so without that line "I'm a good man." is garbled as well. The two encodings must be consistent.

The third controls how the browser decodes the page. If the first two are consistent and correct, this one does not matter. Some pages are garbled because the browser cannot determine which encoding to use: pages are sometimes embedded in other pages, the browser confuses the encodings, and garbled characters appear.
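One way the mismatch shows up can be reproduced outside a container. The sketch below assumes that the container encodes the response text with the charset named in contentType and that the browser decodes with that same charset; the class and method names are hypothetical. Under that assumption, characters that do not exist in ISO8859-1 are replaced by '?' before they ever leave the server, and no later transcoding can bring them back:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ResponseCharsetDemo {
    /** Encodes the page text with the response charset and decodes it the same way. */
    static String seenByBrowser(String pageText, Charset responseCharset) {
        byte[] onTheWire = pageText.getBytes(responseCharset);
        return new String(onTheWire, responseCharset);
    }

    public static void main(String[] args) {
        String pageText = "\u4e2d\u6587 ok"; // two Chinese characters plus ASCII

        // charset=iso8859-1: the Chinese characters cannot be encoded.
        System.out.println(seenByBrowser(pageText, StandardCharsets.ISO_8859_1)); // ?? ok

        // charset=UTF-8: the text arrives unchanged.
        System.out.println(pageText.equals(seenByBrowser(pageText, StandardCharsets.UTF_8))); // true
    }
}
```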

2 Garbled parameters received when a form is submitted with POST

This problem is also common. It too is caused by Tomcat's internal encoding, iso8859-1: if no encoding is set for a POST submission, the submitted data is handled as iso8859-1, while the receiving JSP expects utf-8, and the result is garbled. Given this cause, there are several workarounds, compared below.

A Transcode when receiving the parameter

String str = new String(request.getParameter("something").getBytes("ISO-8859-1"), "UTF-8");

With this approach every parameter has to be transcoded in this way, which is tedious, but it does recover the Chinese characters.

B At the start of the receiving page, set the request encoding with request.setCharacterEncoding("UTF-8"), which sets the character set of the submitted content to UTF-8. The page that receives the parameters then needs no transcoding and can simply use

String str = request.getParameter("something");

to obtain the Chinese parameter. But every page has to execute this statement. It also only works for POST submissions; it has no effect on GET submissions or on file uploads with enctype="multipart/form-data". Those two cases are covered separately below.

C To avoid writing request.setCharacterEncoding("UTF-8") on every page, use a filter that does the encoding handling for all JSPs. There are many examples of this online; please look them up.
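What the request encoding in B and C changes can be seen without a servlet container. A browser sends a form value as percent-encoded bytes, and the server has to pick a charset when turning them back into characters. The sketch below uses only java.net.URLDecoder; the class name and the sample value are assumptions made for the example:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

final class FormValueDecoding {
    /** Decodes one percent-encoded form value with the given request encoding. */
    static String decode(String rawValue, String requestEncoding) {
        try {
            return URLDecoder.decode(rawValue, requestEncoding);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown encoding: " + requestEncoding, e);
        }
    }

    public static void main(String[] args) {
        // Two Chinese characters as a UTF-8 page would submit them.
        String raw = "%E4%B8%AD%E6%96%87";

        // Default behaviour described in the text: six meaningless characters.
        System.out.println(decode(raw, "ISO-8859-1").length()); // 6

        // With the request encoding set to UTF-8: the original two characters.
        System.out.println(decode(raw, "UTF-8").equals("\u4e2d\u6587")); // true
    }
}
```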

3 Garbled parameters when a form is submitted with GET

If Chinese is submitted with GET, the receiving page also gets garbled parameters, and the cause is again Tomcat's internal encoding, iso8859-1. The Chinese characters are percent-encoded and appended to the URL, and Tomcat handles the URL with its default encoding, so the parameters reach the receiving page as iso8859-1.

Workaround:

A Use the first method from the previous section: take the received characters back to bytes and transcode them.

B A GET submission travels in the URL, where Tomcat applies iso8859-1 by default. To change this, add the attribute useBodyEncodingForURI="true" to the Connector node in server.xml. This controls how Tomcat handles Chinese characters for GET: the parameters are processed with the encoding set by request.setCharacterEncoding("UTF-8"), so they are automatically treated as utf-8 and the receiving page gets them intact. But I think what really happens is that Tomcat also goes by the Connector settings:

<Connector port="8080"

maxThreads="150" minSpareThreads="25" maxSpareThreads="75"

enableLookups="false" redirectPort="8443" acceptCount="100"

debug="0" connectionTimeout="20000" useBodyEncodingForURI="true"

disableUploadTimeout="true" URIEncoding="UTF-8"/>

The URIEncoding="UTF-8" set there is applied once more, but because the data is already utf-8 nothing changes. If the encoding is taken from the URL, the receiving page decodes according to URIEncoding="UTF-8".

4 Fixing garbled text when uploading a file

When a file is uploaded, the form is set to enctype="multipart/form-data", and the data is submitted as a stream. If you use the Apache upload component you will see a lot of garbled text. This is because early versions of Apache's commons-fileupload.jar have a bug: they decode the Chinese characters as they extract them, and with this kind of submission the decoding automatically uses Tomcat's default encoding, iso-8859-1. The symptoms are that full stops, commas and other special symbols come out garbled, and that a value with an odd number of Chinese characters is garbled while one with an even number parses normally.

Workaround: download commons-fileupload-1.1.1.jar; these bugs are fixed in that version.

However, you still need to transcode the extracted characters from iso8859-1 to utf-8 when you take the content out. After that, all characters, Chinese included, come out correctly.
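The transcoding step, which is also workaround A in sections 2 and 3, amounts to turning the wrongly decoded string back into its original bytes and decoding those bytes again with the right charset. A minimal sketch follows; the helper name recode and its parameters are made up for illustration:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class Recode {
    /**
     * Repairs text that was decoded with the wrong charset.
     * wronglyAssumed is the charset that was used; actual is the one the bytes were really in.
     */
    static String recode(String garbled, Charset wronglyAssumed, Charset actual) {
        byte[] originalBytes = garbled.getBytes(wronglyAssumed);
        return new String(originalBytes, actual);
    }

    public static void main(String[] args) {
        String sent = "\u4e2d\u6587\u3002"; // two Chinese characters and a Chinese full stop

        // Simulate a field whose UTF-8 bytes were decoded as ISO-8859-1.
        String received = new String(sent.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);

        String repaired = recode(received, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(repaired.equals(sent)); // true
    }
}
```

The repair only works when the wrong decoding was lossless, which holds for ISO8859-1; text that has already been reduced to question marks cannot be recovered this way.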

5 Garbled parameters in URL requests made from Java code

The encoding of the URL depends on the URIEncoding="UTF-8" described above. If it is set, every Chinese parameter placed in a URL must be encoded; otherwise the received parameter value is garbled. For example, after a redirect such as

response.sendRedirect("/a.jsp?name=Zhang Dawi");

where the name is written in Chinese characters, reading it directly in a.jsp with

String name = request.getParameter("name");

gives garbled text. Because the rule is that the URL must be utf-8, the redirect should be written like this:

response.sendRedirect("/a.jsp?name=" + URLEncoder.encode("Zhang Dawi", "utf-8"));

What happens if URIEncoding="UTF-8" is not set? Then the default encoding, iso8859-1, is used, and the problem is back. First, if the parameter value has an odd number of characters it parses normally, but if it has an even number the last character comes out garbled. Also, if the last character is an English one the value parses normally, though Chinese punctuation is still garbled. As a stopgap, if your parameters contain no Chinese punctuation, you can append an English symbol to the end of the parameter value to avoid the garbling and strip that trailing symbol after reading the parameter. It is a makeshift, but it works.
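The encoding step for the redirect can be tried on its own with java.net.URLEncoder. The sketch below only builds the URL string; the class name, method name and sample value are assumptions for the example, and the path /a.jsp is taken from the text:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class RedirectUrl {
    /** Builds path?name=value with the value percent-encoded as UTF-8. */
    static String withName(String path, String name) {
        try {
            return path + "?name=" + URLEncoder.encode(name, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Two Chinese characters as the parameter value.
        System.out.println(withName("/a.jsp", "\u4e2d\u6587")); // /a.jsp?name=%E4%B8%AD%E6%96%87
    }
}
```

The resulting URL contains only ASCII characters, so it no longer depends on how intermediate software treats non-ASCII bytes; the receiving side still has to decode with the same charset, here the URIEncoding="UTF-8" setting.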

6 Garbled parameters in URL requests made from script code

Scripts can also redirect pages, and the same issue arises with the parameters they attach and the page that parses them. If a Chinese parameter is not encoded to match URIEncoding="UTF-8", the receiving page gets garbled Chinese as well. Encoding in script is more troublesome: you need a script file that implements the encoding, and then call its method to encode the Chinese characters.

7 Garbled JSP files when opened in MyEclipse

In an existing project the JSP files may be stored as utf-8. A freshly installed Eclipse uses iso8859-1 as its default encoding, so the Chinese characters in the JSP appear garbled. This is easy to fix: in the Eclipse 3.1 preferences, find General -> Editors and set the encoding used to open files to utf-8. Eclipse then reopens the file in the new encoding and the Chinese characters display normally.

8 Garbled HTML pages when opened in Eclipse

Because most such pages are produced with Dreamweaver, their storage format differs from the one Eclipse recognizes.
