Solving Garbled Chinese Characters in JSP

Source: Internet
Author: User

Character Inner Code
Each country or region defines its own character encoding standard for exchanging information between computers, such as extended ASCII in the United States, GB2312-80 in China, and JIS in Japan. These standards are the basis of information processing in each country or region and play an important unifying role. Because the code ranges of these local character sets overlap, exchanging information between them is difficult, and maintaining each localized version of a piece of software independently is costly. It is therefore necessary to extract what the localization work has in common, handle it consistently, and keep locale-specific processing to a minimum. This is called internationalization (I18N). Language-specific information is normalized into locale information, while the underlying character set is Unicode, which is intended to contain all characters.

A character inner code (character code) is the internal code used to represent a character; it is what we use when entering and storing documents. Inner codes are divided into single-byte and double-byte codes. A single-byte inner code is called a Single-Byte Character Set (SBCS) and can represent 256 code points. A double-byte inner code is called a Double-Byte Character Set (DBCS) and can represent about 65,000 code points; it is used mainly to encode the large character sets of East Asian languages.
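The single-byte versus double-byte distinction can be observed directly from Java. The following is a minimal sketch, not part of the original article: the class name InnerCodeWidth and the helper byteLength are made up for illustration, and it assumes the JRE ships the GB2312 charset (full JDK builds normally do).

```java
import java.nio.charset.Charset;

class InnerCodeWidth {
    // Returns how many bytes the named charset needs to encode the text.
    static int byteLength(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName)).length;
    }

    public static void main(String[] args) {
        String latin = "A";
        // A Chinese character written as a Unicode escape, so the example
        // does not depend on the encoding of the source file.
        String chinese = "\u4e2d";

        System.out.println(byteLength(latin, "ISO-8859-1")); // 1
        System.out.println(byteLength(latin, "GB2312"));     // 1
        System.out.println(byteLength(chinese, "GB2312"));   // 2
    }
}
```

ASCII characters stay one byte wide even in the double-byte set, while the Chinese character takes two bytes.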

A code page is a list of selected characters arranged in a particular order. For the early single-byte languages, the order of the code page lets the system look up the inner code that corresponds to a keyboard input value. For double-byte inner codes, the code page provides a multibyte-to-Unicode mapping table, so that characters stored as Unicode can be converted to the corresponding inner code. Code page support was introduced mainly for accessing multilingual file names: file systems such as NTFS and FAT32/VFAT store file names in Unicode, so the system has to convert them dynamically to the encoding of the current language when they are read.

Readers who work with JSP code will certainly be familiar with ISO8859-1. It is a code page we use frequently, and it belongs to the Western European language family. GB2312-80 was developed in the early days of Chinese character information technology. It contains most of the commonly used level-1 and level-2 Chinese characters and nine zones of symbols. It is supported by almost all Chinese systems and internationalized software, and it is the most basic Chinese character set.

GBK is an extension of GB2312-80 and is upward compatible with it. It contains 20,902 Chinese characters, and its code range is 0x8140 to 0xFEFE, excluding code positions with a high byte of 0x80. All of its characters map one-to-one to Unicode 2.0, which means that Java in effect supports the GBK character set.

GB18030-2000 (GBK2K) extends GBK with further Chinese characters and adds the characters of minority scripts such as Tibetan and Mongolian. GBK2K fundamentally solves the problem of insufficient code positions and missing glyphs.
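The relationship between the three Chinese character sets can be checked from Java. This is a rough sketch under the assumption that the JRE provides the GB2312, GBK, and GB18030 charsets; the class GbFamilyCheck and its helpers are hypothetical names chosen for this example. It uses a traditional-form character (U+9F8D) as a sample of a character that GBK adds on top of GB2312.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

class GbFamilyCheck {
    // True if the named charset has a code for the character.
    static boolean canEncode(char c, String charsetName) {
        return Charset.forName(charsetName).newEncoder().canEncode(c);
    }

    // True if both charsets encode the text to exactly the same bytes.
    static boolean sameBytes(String text, String charsetA, String charsetB) {
        byte[] a = text.getBytes(Charset.forName(charsetA));
        byte[] b = text.getBytes(Charset.forName(charsetB));
        return Arrays.equals(a, b);
    }

    public static void main(String[] args) {
        char traditional = '\u9f8d';    // traditional-form character, absent from GB2312
        String common = "\u4e2d\u6587"; // two characters present in GB2312

        System.out.println(canEncode(traditional, "GB2312"));
        System.out.println(canEncode(traditional, "GBK"));
        // Upward compatibility: characters shared by the sets keep their bytes.
        System.out.println(sameBytes(common, "GB2312", "GBK"));
        System.out.println(sameBytes(common, "GBK", "GB18030"));
    }
}
```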


Differences between development platforms
1. Tomcat 4 Development Platform

Tomcat 4 and later versions have a problem with Chinese under Windows 98/2000 (the problem does not occur under Linux or with Tomcat 3.x): pages are displayed as garbled text. Switching the character set in IE to GB2312 by hand makes the page display correctly.

To solve this problem, add <%@ page language="java" contentType="text/html; charset=gb2312" %> at the beginning of each JSP page. This alone is not enough, however: Chinese text in the page is now displayed, but fields read from the database turn into garbled text. Analysis shows that the Chinese characters stored in the database are intact. The database accesses data using the ISO8859-1 character set, and the Java program also handles characters as ISO8859-1 by default (which reflects Java's approach to internationalization). When data is written, Java and the database both treat it as ISO8859-1, so no error occurs. The problem appears when data is read: the data comes back as ISO8859-1, but the JSP header contains <%@ page language="java" contentType="text/html; charset=gb2312" %>, which says the page is displayed in GB2312, and this does not match the data that was read. The characters read from the database are therefore garbled on the page. The fix is to transcode these characters from ISO8859-1 to GB2312, after which they display correctly. This solution applies to many platforms, and readers can adapt it as needed.
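The transcoding step described above can be sketched in plain Java. This is an illustration under stated assumptions rather than the author's exact code: the class DbTranscode and the method reDecode are hypothetical names, and the garbled value is simulated in memory instead of being read from a real database.

```java
import java.nio.charset.Charset;

class DbTranscode {
    // Takes a string that was decoded with the wrong charset, recovers the
    // raw bytes, and decodes them again with the charset they were really in.
    static String reDecode(String text, String wrongCharset, String actualCharset) {
        if (text == null) {
            return null;
        }
        byte[] raw = text.getBytes(Charset.forName(wrongCharset));
        return new String(raw, Charset.forName(actualCharset));
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587";

        // Simulate the database read: GB2312 bytes interpreted as ISO8859-1.
        String garbled = new String(
                original.getBytes(Charset.forName("GB2312")),
                Charset.forName("ISO-8859-1"));

        // Transcode from ISO8859-1 back to GB2312 before displaying.
        String repaired = reDecode(garbled, "ISO-8859-1", "GB2312");
        System.out.println(repaired.equals(original)); // true
    }
}
```

The repair is lossless because ISO8859-1 maps every byte value to exactly one character, so the original bytes can always be recovered.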

2. Tomcat 3.x, Resin, and Linux platforms

With Tomcat 3.x, Resin, or under Linux, pages display correctly without the <%@ page language="java" contentType="text/html; charset=gb2312" %> directive, because the <meta http-equiv="Content-Type" content="text/html; charset=gb2312"> tag in the page takes effect. Conversely, adding <%@ page language="java" contentType="text/html; charset=gb2312" %> causes an error on these platforms, which shows that the Tomcat 4 engine processes JSP pages differently.

The choice of character set also matters for the database, whether it is SQL Server, Oracle, MySQL, or Sybase. If a multilingual version is planned, the database character set should be unified as ISO8859-1, and output then has to be converted between character sets as needed.

Here's a summary of the different platforms:

(1) JSWDK is suitable only for general development; in stability and other respects it may be inferior to commercial software. JDK 1.3 performs much better than JDK 1.2.2 and supports Chinese better, so use it whenever possible.

(2) Resin, a commercial product that is free to use, is fast and stable, compiles automatically, and reports the line on which an error occurs. It also supports JavaScript on the server side and handles Chinese well.

(3) Tomcat is only a standard implementation of JSP 1.1 and Servlet 2.2, and we should not expect this free software to be perfect in every detail or in performance. It is designed mainly with English-speaking users in mind, which is why Chinese characters passed in a URL cause problems unless a special conversion is done. Most IE browsers send URLs as UTF-8 by default, which Tomcat does not appear to take into account. In addition, Tomcat compiles JSP pages as ISO8859 regardless of the language of the operating system, which also seems to be a shortcoming.
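The URL problem mentioned in item (3) can be reproduced without a servlet container. The sketch below uses only java.net.URLEncoder and java.net.URLDecoder; the class UrlParamCharset and its helpers are hypothetical names for this example. It assumes the browser percent-encodes the parameter as UTF-8 and shows what happens when the receiving side decodes it as ISO8859-1 instead.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class UrlParamCharset {
    // Percent-encodes a value using the named charset.
    static String encode(String value, String charsetName) {
        try {
            return URLEncoder.encode(value, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Decodes a percent-encoded value using the named charset.
    static String decode(String value, String charsetName) {
        try {
            return URLDecoder.decode(value, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587";
        String sent = encode(original, "UTF-8");
        System.out.println(sent); // %E4%B8%AD%E6%96%87

        // Decoding with the charset the sender used restores the text.
        System.out.println(decode(sent, "UTF-8").equals(original)); // true
        // Decoding as ISO8859-1 yields six unrelated characters instead of two.
        System.out.println(decode(sent, "ISO-8859-1").length()); // 6
    }
}
```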


Handling Chinese in JSP code
In JSP code, Chinese text usually needs special handling in the following places:

1. Chinese parameters appended to a URL. These can usually be read directly, for example: <%= request.getParameter("Showword") %>

2. Reading Chinese values submitted by an HTML form under JSWDK. These need to be transcoded; the most concise way to write this is:

String name1 = new String(request.getParameter("user_id").getBytes("ISO8859_1"));

In addition, with JDK 1.3 there is no need to add <%@ page contentType="text/html;charset=gb2312" %>, whereas under JDK 1.2.2 the result is unstable even when both methods are used together. The Resin platform behaves better: adding <%@ page contentType="text/html;charset=gb2312" %> as the first line of the page is enough to handle Chinese correctly, and adding the transcoding on top of it produces wrong output.

3. Chinese values stored in the session under JSWDK. A value read from a form displays correctly once it has been transcoded, but a Chinese value assigned directly in the code does not. The Resin platform handles both cases well.

4. Specifying the encoding when compiling servlets and JSP pages. Compile the servlet with javac -encoding iso8859-1 MyServlet.java, and in the JSP zone configuration file change the compiler parameter to compiler=builtin-javac -encoding ISO8859-1. With this method, Chinese is displayed correctly without any other changes.

In addition, popular relational database systems support a database encoding: when you create a database you can specify its character set, and the data is stored in that encoding. When an application accesses the data, encoding conversion takes place on both the way in and the way out. For Chinese data, the database encoding should be set so that the data stays intact. GB2312, GBK, and UTF-8 are all suitable choices. ISO8859-1 (8-bit) can also be used, but it makes the programming more complex and is not a recommended database encoding. In JSP/servlet programming, you can use the management tools provided by the database system to check whether the Chinese data is stored correctly.


Example solutions
Here are two concrete examples of fixing garbled Chinese text; readers who study them carefully should find them useful.

1. Common method of character conversion

Values submitted from a form, stored in the database, and then read back all turn into "?". The form submits its data with POST, the code uses the statement String st = new String(request.getParameter("name").getBytes("ISO8859_1")); and charset=gb2312 is also declared.
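One way the "?" symptom can arise is sketched below. This is an assumption about the cause, not something the original article states: if a string that already holds correct Chinese characters is encoded into ISO8859-1 anywhere along the way, Java substitutes "?" for every character the charset cannot represent, and the original text cannot be recovered afterwards. The class QuestionMarkDemo and the method roundTrip are hypothetical names.

```java
import java.nio.charset.Charset;

class QuestionMarkDemo {
    // Encodes the text with the named charset and decodes it again with
    // the same charset, as a store-then-read cycle would.
    static String roundTrip(String text, String charsetName) {
        Charset charset = Charset.forName(charsetName);
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587";

        // ISO8859-1 has no code for these characters, so each becomes "?".
        System.out.println(roundTrip(original, "ISO-8859-1")); // ??

        // A charset that contains the characters preserves them.
        System.out.println(roundTrip(original, "GB2312").equals(original)); // true
    }
}
```

This differs from the garbled-text case, where the bytes are intact and only decoded with the wrong charset; once "?" has been substituted, transcoding cannot bring the characters back.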

To handle Chinese parameters passed from a form, add the following code to the JSP page: define a getStr method dedicated to this problem, then apply it to each received parameter:
String keyword1 = request.getParameter("Keyword1");
keyword1 = getStr(keyword1);
This solves the problem. The complete code is as follows:
<%@ page contentType="text/html;charset=gb2312" %>
<%!
    // Recovers the raw bytes of a parameter that the container decoded as
    // ISO8859-1 and decodes them again with the platform default encoding.
    public String getStr(String str) {
        try {
            String temp_p = str;
            byte[] temp_t = temp_p.getBytes("ISO8859_1");
            String temp = new String(temp_t);
            return temp;
        } catch (Exception e) {
            // A null parameter or an unsupported encoding ends up here.
        }
        return "NULL";
    }
%>
<%--http://www.cndes.com Test--%>
<%
    String keyword = "Creative Network Technology Center welcomes your arrival";
    String keyword1 = request.getParameter("Keyword1");
    keyword1 = getStr(keyword1);
    out.print(keyword);
    out.print(keyword1);
%>

2. Character conversion for JDBC drivers

Most JDBC drivers currently transmit Chinese characters in the local encoding; for example, a Chinese character with inner code 0x4175 is transmitted as the two characters 0x41 and 0x75. The characters returned by the JDBC driver and the characters sent to it therefore have to be converted. When inserting data into a database through a JDBC driver, Unicode must first be converted to the native code; when querying data through a JDBC driver, the native code must be converted to Unicode. The implementation of these two conversions is given below:
String native2Unicode(String s) {
    if (s == null || s.length() == 0) {
        return null;
    }
    byte[] buffer = new byte[s.length()];
    for (int i = 0; i < s.length(); i++) {
        if (s.charAt(i) >= 0x100) {
            c = s.charAt(i);
            byte[] buf = ("" + c).getBytes();
            buffer[j++] = (char) buf[0];
            buffer[j++] = (char) buf[1];
        }
        else { buffer[j++] = s.ch
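The listing above breaks off in the source. As a stand-in, here is a minimal sketch of the two conversions using the charset API rather than hand-written byte loops. It is not the author's implementation. It assumes the driver exposes each byte of the native encoding as one character (which matches the 0x4175 example), and the class JdbcStringConverter, its method names, and the choice of GBK as the native encoding in main are all assumptions made for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class JdbcStringConverter {
    // Unicode to native: encode with the native charset, then expose each
    // resulting byte as one character (for strings sent to the driver).
    static String unicodeToNative(String s, String nativeCharset) {
        if (s == null || s.length() == 0) {
            return s;
        }
        byte[] raw = s.getBytes(Charset.forName(nativeCharset));
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Native to Unicode: turn each character back into one byte, then
    // decode the bytes with the native charset (for strings from the driver).
    static String nativeToUnicode(String s, String nativeCharset) {
        if (s == null || s.length() == 0) {
            return s;
        }
        byte[] raw = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, Charset.forName(nativeCharset));
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587";
        String forDriver = unicodeToNative(original, "GBK");
        System.out.println(forDriver.length()); // 4
        System.out.println(nativeToUnicode(forDriver, "GBK").equals(original)); // true
    }
}
```

Whether a given driver needs this conversion at all depends on the driver and its configuration, so it is worth checking with a known test string before applying it everywhere.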



