Solutions for garbled Chinese characters in JSP

Last Update:2014-03-14 Source: Internet

Author: User

1. Garbled characters on the JSP page
2. Garbled characters when a form submits Chinese text
3. Garbled characters over the database connection
4. Garbled characters displayed from the database

Garbled Chinese characters come up often during JSP development and may affect you too. Below are the cases I ran into and how I solved them, for your reference.

1. Garbled characters on the JSP page
The following display page (display.jsp) shows garbled text:

JSP Chinese Processing

<%
out.print("JSP Chinese processing");
%>

Different web servers and JDK versions produce different results. Cause: the encoding the server uses to write the page differs from the encoding the browser uses to display it. Solution: declare the encoding (gb2312) on the JSP page by adding <%@ page contentType="text/html; charset=gb2312" %>, which eliminates the garbled characters. The complete page is as follows:

<%@ page contentType="text/html; charset=gb2312" %>

JSP Chinese Processing

<%
out.print("JSP Chinese processing");
%>

2. Garbled characters when a form submits Chinese text
The following is the submission page (submit.jsp). The code is as follows:

JSP Chinese Processing

The following is the process.jsp code:

<%@ page contentType="text/html; charset=gb2312" %>

JSP Chinese Processing

<%= request.getParameter("name") %>

English text submitted from submit.jsp displays correctly, but Chinese text comes out garbled. Cause: by default the browser sends the request encoded as UTF-8, and UTF-8 and GB2312 represent the same character with different bytes, so the characters cannot be recognized. Solution: set the request encoding with request.setCharacterEncoding("gb2312") so that Chinese characters display normally. The modified process.jsp code is as follows:

<%@ page contentType="text/html; charset=gb2312" %>
<%
request.setCharacterEncoding("gb2312");
%>

JSP Chinese Processing

<%= request.getParameter("name") %>

3. Garbled characters over the database connection
If all Chinese characters going to or coming from the database are garbled, add useUnicode=true&characterEncoding=GBK to the database URL.

4. Garbled characters displayed from the database
In MySQL 4.1.0, Chinese text in varchar and text columns is displayed garbled. For varchar columns, setting the binary attribute solves the problem. For text columns, an encoding conversion class must be used. The implementation is as follows:

public class Convert
{
    /** Convert a string that was decoded as ISO-8859-1 back to GB2312. */
    public static String ISOtoGB(String iso)
    {
        String gb;
        try
        {
            // Check for null first; calling equals() on a null reference would throw.
            if (iso == null || iso.equals(""))
            {
                return "";
            }
            else
            {
                iso = iso.trim();
                gb = new String(iso.getBytes("ISO-8859-1"), "GB2312");
                return gb;
            }
        }
        catch (Exception e)
        {
            System.err.print("encoding conversion error: " + e.getMessage());
            return "";
        }
    }
}

Once this is compiled into a class, you can call the static method ISOtoGB() of the Convert class to convert the encoding.
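To see why this conversion works, the following is a minimal sketch that reproduces the damage and then undoes it. The class and method names (IsoToGbDemo, misdecode, isoToGb) are made up for illustration, and the sample text is written with Unicode escapes so the result does not depend on the source file encoding. It only models the case where GB2312 bytes were decoded as ISO-8859-1; other kinds of garbling are not reversible this way.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class IsoToGbDemo {
    static final Charset GB2312 = Charset.forName("GB2312");

    /** Simulates a layer that wrongly decodes GB2312 bytes as ISO-8859-1. */
    static String misdecode(String text) {
        return new String(text.getBytes(GB2312), StandardCharsets.ISO_8859_1);
    }

    /** Reverses the damage: take the original bytes back and decode them as GB2312. */
    static String isoToGb(String iso) {
        if (iso == null || iso.isEmpty()) {
            return "";
        }
        return new String(iso.getBytes(StandardCharsets.ISO_8859_1), GB2312);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587";           // two Chinese characters
        String garbled = misdecode(original);
        System.out.println(garbled.length());        // 4: one char per GB2312 byte
        System.out.println(isoToGb(garbled).equals(original)); // true
    }
}
```

The round trip is lossless because ISO-8859-1 maps every byte value to exactly one character, so no information is lost while the string is in its garbled form.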

Summary:

1. In a JSP, if <%@ page contentType="text/html; charset=A" %> is specified, all strings constructed in the JSP (not references) are encoded as A. A string obtained from the request is iso-8859-1 if no encoding has been set on the request, and a string obtained from elsewhere keeps its original encoding: a string read from a database whose encoding is B is encoded as B, not A and not the default. If the string to be output is not encoded as A, it will probably be displayed as garbled characters, so convert it to encoding A before outputting it.

2. In a JSP, if <%@ page contentType="text/html; charset=A" %> is not specified, the effect is the same as <%@ page contentType="text/html; charset=ISO-8859-1" %>.

3. In a servlet, response.setContentType("text/html; charset=A"); sets the encoding of the response character output stream to A. All strings to be output must be encoded as A; otherwise garbled characters result. Strings obtained from the request in a servlet behave as in a JSP, but strings constructed in the servlet's .java file use the system default encoding. Strings obtained from outside keep their original encoding: data read from a database encoded as B is encoded as B, not A and not the default.

**************************************** ****

Reprinted: Summary of solutions to garbled Chinese in JSP
When working with JSP, garbled Chinese is the biggest headache. Below are the garbled-text problems I ran into during software development and their solutions.

1. Garbled JSP pages
This kind of garbled text appears because no character set encoding is specified on the page. Solution: specify the character set encoding at the beginning of the page with a page directive such as <%@ page contentType="text/html; charset=gb2312" %>.

2. Garbled characters in the database
With this problem, Chinese characters inserted into the database are garbled, or they are garbled when read back and displayed. The solution is as follows:
Add the character set encoding to the database connection string:
String url = "jdbc:mysql://localhost/digitgulf?user=root&password=root&useUnicode=true&characterEncoding=GB2312";
Use the following code on the page:
response.setContentType("text/html; charset=gb2312");
request.setCharacterEncoding("gb2312");

3. Garbled characters when Chinese is passed as a parameter
Passing Chinese text as a parameter to another page also produces garbled characters. The solution is as follows:
Encode the parameter when passing it, for example:
"RearshRes.jsp?keywords=" + java.net.URLEncoder.encode(keywords)
Then receive it on the target page with the following statement:
keywords = new String(request.getParameter("keywords").getBytes("8859_1"));
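The one-argument URLEncoder.encode(String) used above is deprecated because its result depends on the platform default encoding. As a hedged sketch, the two-argument overload makes the charset explicit on both sides. The class name QueryParamDemo, the helper names, and the page name search.jsp are hypothetical; the wrappers exist only to turn the checked exception into an unchecked one.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class QueryParamDemo {
    /** Percent-encodes a parameter value using an explicitly named charset. */
    static String encode(String value, String charset) {
        try {
            return URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown charset: " + charset, e);
        }
    }

    /** Decodes a percent-encoded value; the charset must match the one used to encode. */
    static String decode(String value, String charset) {
        try {
            return URLDecoder.decode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String keywords = "\u4e2d\u6587"; // two Chinese characters
        String link = "search.jsp?keywords=" + encode(keywords, "GB2312");
        System.out.println(link); // search.jsp?keywords=%D6%D0%CE%C4
        System.out.println(decode("%D6%D0%CE%C4", "GB2312").equals(keywords)); // true
    }
}
```

Whichever charset is chosen, the sending page and the receiving side have to agree on it; the same text encoded as UTF-8 produces a different percent-encoded string.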

4. Garbled JSP pages
<%@ page contentType="text/html; charset=gb2312" language="java" %>

**************************************** ****
Garbled characters with JSP/JDBC and MySQL
Starting from:
JSP treats request data as ISO8859_1 by default, so to display Chinese the value must be converted to GBK, as in the following:
String str = new String(request.getParameter("name").getBytes("ISO8859-1"), "GBK");
out.println(str);
This way the Chinese characters display correctly.

Chinese problems during MySQL operations:
This depends on MySQL's default encoding. If it has not been changed, it is latin1, which is in effect the same as ISO8859_1, so the data must be handled the same way as above or it will be garbled.

1. Inserting Chinese characters:
String sql2 = "insert into test (name) VALUES ('" + request.getParameter("name") + "')";
stmt.executeUpdate(sql2);
The value can be inserted without any conversion.

2. Displaying the inserted Chinese characters:
Because the data was stored as latin1, it must be converted to GBK for display.
String x = new String(rs.getString("title").getBytes("ISO8859_1"), "GBK");
out.println(x);

3. Setting the storage encoding:
Even when MySQL is latin1-encoded, the connection can still be told to store data as GBK.
Connection con = DriverManager.getConnection("jdbc:mysql://localhost:3306/jsp?useUnicode=true&characterEncoding=GBK", "root", "");
String str1 = "Chinese";
String sql2 = "insert into test (name) VALUES ('" + str1 + "')";
This also inserts successfully.

**************************************** ****
Chinese character encoding issues in JSP/Servlet

The Internet already has many excellent articles and discussions on DBCS character encoding in JSP/Servlet. This article organizes them and relates them to IBM WebSphere Application Server 3.5 (WAS), in the hope that it is not redundant.

1. Origin of the problem

Each country or region defines its own character set for computer information interchange, such as ASCII in the United States, GB2312-80 in China, and JIS in Japan. These sets are the basis of information processing in each country or region and play an important role in unifying encoding. By length, character sets are divided into SBCS (single-byte character sets) and DBCS (double-byte character sets). To process local character information, early software (especially operating systems) shipped in various localized versions (L10N), and concepts such as LANG and Codepage were introduced to tell them apart. However, the code ranges of the local character sets overlap, which makes exchanging information between them difficult, and maintaining each localized version separately is expensive. It therefore became necessary to factor out what the localization efforts had in common and handle it uniformly, keeping locale-specific processing to a minimum. This is internationalization (I18N). Language information was further standardized as Locale information, and the underlying character set became Unicode, which contains almost all glyphs.

Today the core character handling of most internationalized software is based on Unicode. At run time the software determines the local character encoding from the current Locale/LANG/Codepage settings and processes local characters accordingly. Along the way it must convert between Unicode and the local character set, or between two local character sets with Unicode as the pivot. The same approach extends to networked environments: character data at either end of a connection must be converted to something the other end can accept, according to the character set settings.

The Java language represents characters internally in Unicode and complies with Unicode V2.0. A Java program converts character encodings whenever it reads or writes files through streams, writes HTML to a URL connection, or reads parameter values from a URL connection. This adds programming complexity and can cause confusion, but it is in line with the idea of internationalization.

In theory, converting characters according to the character set settings should not cause many problems. In practice, applications run in widely varying environments, Unicode and the local character sets keep being supplemented and revised, and system or application implementations are not always standard-compliant, so transcoding problems continue to trouble programmers and users.

2. GB2312-80, GBK, GB18030-2000 Chinese Character Set

In fact, fixing a Chinese encoding problem in a Java program is often very simple, but to understand the reason behind it and to locate the problem you still need to know about the Chinese character sets and how conversion between them works.

GB2312-80 was drawn up in the early days of Chinese-language information technology in China. It contains most commonly used level-1 and level-2 Chinese characters plus 9 rows of symbols. Almost all Chinese systems and internationalized software support it, and it is the most basic Chinese character set. Its encoding range is 0xa1-0xfe for the high byte and 0xa1-0xfe for the low byte; Chinese characters start at 0xb0a1 and end at 0xf7fe.

GBK is an extension of GB2312-80 and is upward compatible with it. It contains 20902 Chinese characters, and its encoding range is 0x8140-0xfefe, excluding the positions with a high byte of 0x80. All of its characters map one-to-one to Unicode 2.0, which means that Java in effect supports the GBK character set. GBK is currently the default character set of Windows and some other Chinese operating systems, but not all internationalized software supports it, apparently because its developers do not fully understand what GBK is. Note that GBK is not a national standard, only a specification. With the release of the national standard GB18030-2000, it will complete its historical mission in the near future.
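The difference in coverage between GB2312 and GBK can be checked from Java with CharsetEncoder.canEncode. This is a small sketch under the assumption that the JVM ships both charsets (standard JDK builds do); the class name GbkCoverageCheck is made up, and U+9555 is used as the sample of a character that GBK has and GB2312-80 lacks.

```java
import java.nio.charset.Charset;

class GbkCoverageCheck {
    /** Returns true if the named charset has a code for the given character. */
    static boolean canEncode(String charsetName, char c) {
        return Charset.forName(charsetName).newEncoder().canEncode(c);
    }

    public static void main(String[] args) {
        char common = '\u4e2d';   // a level-1 character present in GB2312-80
        char gbkOnly = '\u9555';  // a character added by GBK

        System.out.println(canEncode("GB2312", common));  // true
        System.out.println(canEncode("GBK", common));     // true
        System.out.println(canEncode("GB2312", gbkOnly)); // false
        System.out.println(canEncode("GBK", gbkOnly));    // true
    }
}
```

A character that the target charset cannot encode is typically replaced by '?' when a string is converted to bytes, which is one way question marks end up on a page declared as GB2312.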

Building on GBK, GB18030-2000 (GBK2K) further extends the set of Chinese characters and adds the scripts of minorities such as Tibetan and Mongolian. GBK2K fundamentally solves the problem of insufficient code positions and missing glyphs. It has several features:

● It does not define all the glyphs, only the encoding range, leaving room for later extension.
● The encoding is variable-length. The two-byte part is compatible with GBK; the four-byte part holds the extended glyphs and code positions, with a range of 0x81-0xfe for the first byte, 0x30-0x39 for the second, 0x81-0xfe for the third, and 0x30-0x39 for the fourth.
● It is being rolled out in stages. The first stage requires full mapping to all glyphs of the Unicode 3.0 standard.
● It is a national standard and is mandatory.
At the time of writing, no operating system or software supported GBK2K yet; that is work for current and future localization.
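The variable-length layout described in the list can be observed by encoding a few sample characters with Java's GB18030 charset. This is an illustrative sketch only: the class name Gb18030Lengths and its helpers are made up, and the samples are an ASCII letter, a common Chinese character, and a supplementary-plane character (U+20000, written as a surrogate pair).

```java
import java.nio.charset.Charset;

class Gb18030Lengths {
    static final Charset GB18030 = Charset.forName("GB18030");

    /** Encodes the text as GB18030 bytes. */
    static byte[] encode(String s) {
        return s.getBytes(GB18030);
    }

    /** Formats bytes as upper-case hex, two digits per byte. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] samples = {"A", "\u4e2d", "\uD840\uDC00"};
        for (String s : samples) {
            byte[] b = encode(s);
            // Prints the encoded length (1, 2 and 4 bytes respectively) and the raw bytes.
            System.out.println(b.length + " byte(s): " + hex(b));
        }
    }
}
```

In the four-byte form, the second and fourth bytes fall in 0x30-0x39, which is what lets a decoder distinguish it from the two-byte GBK-compatible form.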

3. JSP/Servlet Chinese character encoding problems and solutions in WAS

3.1 Common encoding problems

The JSP/Servlet encoding problems commonly reported on the Internet usually show up in the browser or in the application, for example:
● Why do Chinese characters on a JSP/Servlet page show up as '?' in the browser?
● Why are Chinese characters on a Servlet page garbled in the browser?
● Why do Chinese characters in a Java application's interface show up as square blocks?
● The JSP/Servlet page cannot display GBK Chinese characters.
● The JSP/Servlet cannot receive Chinese characters submitted by a form.
● The JSP/Servlet cannot read or write correct content from or to the database.
Behind these problems lie various errors in character conversion and processing (except the third, which is caused by incorrect Java font settings). To solve this kind of problem you need to understand how a JSP/Servlet runs and check each point where things can go wrong.

3.2 Encoding in JSP/Servlet web programming
A JSP/Servlet running on a Java application server produces HTML content for the browser. Character encoding conversions take place at the following points:

A. JSP compilation. The Java application server reads the JSP source file according to the JVM's file.encoding value, converts it to the internal character encoding, compiles the JSP, generates the Java source file, and writes it back to the file system, again according to file.encoding. If the current system language supports GBK, no encoding problem arises at this stage. On an English system, for example Linux, AIX, or Solaris with LANG set to en_US, set the JVM's file.encoding to GBK. If the system language is GB2312, decide as needed whether to set file.encoding; setting it to GBK avoids potential garbling of GBK characters.

B. The Java source must be compiled into .class before the JVM can execute it. This step has the same file.encoding issue as step A. From here on, servlets and JSPs behave alike, except that servlets are not compiled automatically.

C. The servlet must convert the HTML page content into an encoding the browser accepts and send it out. Depending on the Java application server's implementation, some servers query the browser's accept-charset and accept-language parameters or guess the encoding some other way, and some ignore it altogether. Using a fixed encoding is therefore probably the best solution. For Chinese web pages, set contentType="text/html; charset=GB2312" in the JSP or servlet. If the page contains GBK characters, set contentType="text/html; charset=GBK"; because IE and Netscape support GBK to different degrees, test this setting.

A 16-bit Java char loses its high byte when it is sent over the network as is. To make sure the Chinese characters on a servlet page (both embedded ones and ones obtained at run time) arrive in the expected encoding, use PrintWriter out = res.getWriter() instead of ServletOutputStream out = res.getOutputStream(). The PrintWriter converts according to the charset specified in the content type (which must be set before getWriter() is called!). Alternatively, wrap the ServletOutputStream in an OutputStreamWriter and use write(String) to output Chinese strings. For JSPs, the Java application server should ensure that embedded Chinese characters are transmitted correctly at this stage.

D. URL character encoding. If the parameter values sent back by the browser through GET or POST contain Chinese characters, the servlet cannot obtain the correct values. In Sun's J2SDK, HttpUtils.parseName does not take the browser's language settings into account when parsing parameters; it interprets the received values byte by byte. This is the encoding problem most discussed on the Internet. Because it is a design defect, the only options are to re-decode the resulting string from its bytes or to hack the HttpUtils class. Refer to Articles 2 and 3, but preferably change the Chinese encodings GB2312 and CP1381 used there to GBK; otherwise GBK characters will still cause problems.

Servlet API 2.3 provides a new method, HttpServletRequest.setCharacterEncoding, which specifies the encoding the application expects before request.getParameter("param_name") is called. This helps solve the problem thoroughly.

WebSphere Application Server extends the standard Servlet API 2.x to provide better multi-language support. In cases C and D above, WAS queries the browser's language settings. By default, zh and zh-cn are both mapped to the Java encoding CP1381 (note: CP1381 is a codepage equivalent only to GB2312; it does not support GBK). I believe this choice was made because WAS cannot tell whether the operating system running the browser supports GB2312 or GBK. Real applications, however, still need GBK characters on their pages; the best-known case is the character "镕" in Premier Zhu's name (Rong2, 0xe946, \u9555), so sometimes you still need to set the encoding/charset to GBK. Changing the default encoding in WAS is not as troublesome as described above. For A and B, refer to Article 5 and specify -Dfile.encoding=GBK in the Application Server's command-line arguments. For D, specify -Ddefault.client.encoding=GBK in the Application Server's command-line arguments. If -Ddefault.client.encoding=GBK is specified, the charset in C no longer needs to be set.
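The point made in step C about writers versus raw byte streams can be sketched without a servlet container. In the following illustration a ByteArrayOutputStream stands in for the response stream; the class name WriterEncodingDemo and both method names are hypothetical. One method writes through an OutputStreamWriter with an explicit charset, the other pushes each 16-bit char into the byte stream directly, which keeps only the low 8 bits.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;

class WriterEncodingDemo {
    /** Writes text through a Writer that encodes it with the named charset. */
    static byte[] writeWith(String text, String charsetName) {
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(raw, Charset.forName(charsetName))) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return raw.toByteArray();
    }

    /** Writes each char straight to the byte stream; write(int) drops the high byte. */
    static byte[] writeLowBytesOnly(String text) {
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        for (char c : text.toCharArray()) {
            raw.write(c);
        }
        return raw.toByteArray();
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587";
        System.out.println(writeWith(text, "GBK").length);   // 4: two bytes per character
        System.out.println(writeLowBytesOnly(text).length);  // 2: high bytes were lost
    }
}
```

In a real servlet the charset would come from the content type set on the response rather than from a method argument.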

3.3 Encoding during database reads and writes

Another place where encoding problems often appear in JSP/Servlet programming is reading and writing database data. Popular relational database systems support a database encoding: when you create a database you can specify its character set, and the data is stored in that encoding. When an application accesses the data, an encoding conversion takes place on the way in and on the way out. For Chinese data, the database encoding must preserve data integrity. GB2312, GBK, UTF-8 and so on are all viable database encodings. If you choose ISO8859-1 (an 8-bit SBCS), the application must split each 16-bit Chinese or Unicode character into two 8-bit characters before writing, and after reading it must merge the two bytes again while also telling them apart from genuine SBCS characters. This wastes the database's encoding support and adds programming complexity, so ISO8859-1 is not a recommended database encoding. During JSP/Servlet programming, you can first use the tools provided by the database management system to check whether the Chinese data is stored correctly.

Note that data read into a Java program is generally Unicode. The reverse conversion happens when writing data.

3.4 Common troubleshooting techniques

The crudest but most effective way to locate a Chinese encoding problem is to print the internal code of a string right after the code you suspect. By printing the internal codes you can find out when Chinese characters are converted to Unicode, when Unicode is converted back to Chinese characters, when one Chinese character turns into two Unicode characters, when a Chinese string turns into a string of question marks, and when the high byte of a Chinese string is truncated.
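A minimal sketch of such a dump helper is shown below. The class name InternalCodeDump and the method dump are made up for illustration; the method prints each 16-bit char of a string as four hex digits.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class InternalCodeDump {
    /** Returns the UTF-16 code units of the string as space-separated hex. */
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04x", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String good = "a\u4e2d";
        // The same Chinese character after its GBK bytes were decoded as ISO-8859-1.
        String split = new String("\u4e2d".getBytes(Charset.forName("GBK")),
                StandardCharsets.ISO_8859_1);

        System.out.println(dump(good));   // 0061 4e2d
        System.out.println(dump(split));  // 00d6 00d0
        System.out.println(dump("?"));    // 003f
    }
}
```

The three outputs correspond to three of the situations listed above: an intact character, one Chinese character split into two chars whose high byte is zero, and a character that has already been replaced by a question mark.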

Choosing a suitable sample string also helps identify the type of problem, for example a string such as "aa, aa? Aa" that mixes English letters with GB and GBK Chinese characters. In general, English characters are not distorted however they are converted or processed (if they are, try increasing the number of consecutive English letters).

**************************************** *
Solving the JSP garbled character problem
1. The most basic garbled problem
This is the simplest garbled problem and the one newcomers usually hit. It is caused by inconsistent encodings within the page.
<%@ page language="java" pageEncoding="UTF-8" %>
<%@ page contentType="text/html; charset=iso8859-1" %>

Chinese questions

I'm a good guy.

Encodings appear in three places.

The first is the storage format of the JSP file. Eclipse saves the file in this encoding, and the JSP file, including its Chinese characters, is compiled from it.

The second is the decoding format. A file stored as UTF-8 but decoded as iso8859-1 will certainly show garbled Chinese, so the two must be consistent. If the second line is absent, iso8859-1 is also used by default, so without this line "I'm a good guy" would be garbled as well. The two must be consistent.

The third controls how the browser decodes the page. If the preceding encodings are consistent and correct, this one does not matter much. Some web pages are garbled because the browser cannot determine which encoding to use: when one page is embedded in another, the browser can confuse the encodings and garbled characters appear.

2. Garbled characters received after a form is submitted by POST

This is also a common problem. It is caused by Tomcat's internal encoding, iso8859-1: if no request encoding is set when the form is posted, the data is treated as iso8859-1, while the receiving JSP expects UTF-8, and garbled characters result. There are several solutions, compared below.

A. Convert the encoding when the parameter is received.

String str = new String(request.getParameter("something").getBytes("ISO-8859-1"), "UTF-8");
With this approach every parameter must be transcoded, which is tedious, but the Chinese characters are recovered.

B. At the beginning of the receiving page, call request.setCharacterEncoding("UTF-8") to set the character set of the submitted content to UTF-8. The page then does not need to transcode; String str = request.getParameter("something"); returns the Chinese parameter directly. However, this statement must be executed on every page.
This method works only for POST submissions. It has no effect on GET submissions or on file uploads with enctype="multipart/form-data"; the garbled characters in those two cases are covered later.

C. To avoid writing request.setCharacterEncoding("UTF-8") on every page, use a filter that sets the encoding for all JSPs. There are many examples on the Internet that you can look up.

3. Garbled characters when a form is submitted by GET
If Chinese is submitted by GET, the receiving page also gets garbled parameters. This is again caused by Tomcat's internal encoding, iso8859-1: Tomcat handles the Chinese characters appended to the URL with its default GET encoding, iso8859-1, so the parameters arrive garbled.

Solution:
A. Use method A from the previous section: take the received string back to bytes and decode it again with the correct encoding.

B. GET parameters travel in the URL, and the iso8859-1 handling has already happened by the time the page sees them. To influence it, add the useBodyEncodingForURI="true" attribute to the Connector node in server.xml. This attribute makes Tomcat handle GET parameters with the encoding set by request.setCharacterEncoding("UTF-8"), so they are processed as UTF-8 and the page receives them correctly. But I think what really happens is that Tomcat then processes the parameters again according to the URIEncoding="UTF-8" set in the Connector:

maxThreads="150" minSpareThreads="25" maxSpareThreads="75"

enableLookups="false" redirectPort="8443" acceptCount="100"

debug="0" connectionTimeout="20000" useBodyEncodingForURI="true"

disableUploadTimeout="true" URIEncoding="UTF-8"/>

Since the data is already UTF-8, this second pass changes nothing. If the parameter comes from the URL, the receiving page decodes it according to URIEncoding="UTF-8".
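The effect of the container's URI decoding charset can be imitated with java.net.URLDecoder. This is only a model of the behavior described above, not Tomcat's actual code path; the class name GetDecodingDemo and its methods are made up. It decodes the same percent-encoded query value once as ISO-8859-1 and once as UTF-8, then repairs the wrong result the way solution A does.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

class GetDecodingDemo {
    /** Decodes a percent-encoded query value with the given charset. */
    static String decodeQueryValue(String raw, String charset) {
        try {
            return URLDecoder.decode(raw, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown charset: " + charset, e);
        }
    }

    /** Solution A: recover the original bytes and decode them as UTF-8. */
    static String repair(String wronglyDecoded) {
        return new String(wronglyDecoded.getBytes(StandardCharsets.ISO_8859_1),
                StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String raw = "%E4%B8%AD%E6%96%87"; // two Chinese characters, UTF-8, percent-encoded

        String wrong = decodeQueryValue(raw, "ISO-8859-1");
        String right = decodeQueryValue(raw, "UTF-8");

        System.out.println(wrong.length());                 // 6: one char per byte
        System.out.println(right.equals("\u4e2d\u6587"));   // true
        System.out.println(repair(wrong).equals(right));    // true
    }
}
```

Configuring the decoding charset once in the container avoids the per-parameter repair step.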

4. Garbled characters when uploading files

For file uploads the form is set to enctype="multipart/form-data", which submits the file as a stream. If you use the Apache upload component you will see many garbled characters. This is because of a bug in early versions of Apache's commons-fileupload.jar, which decodes the Chinese characters after extracting them; with this submission type the decoding automatically uses Tomcat's default encoding, iso-8859-1. The symptoms are that full stops, commas, and other special symbols become garbled, and that an odd number of Chinese characters produces garbled text while an even number parses correctly.

Workaround: download commons-fileupload-1.1.1.jar, which fixes these bugs. The extracted strings still need to be transcoded from iso8859-1 to UTF-8; after that, all Chinese characters and symbols are obtained correctly.

5. Java code that issues URL requests: garbled parameters on the receiving side

The encoding of the URL depends on the URIEncoding="UTF-8" setting described above. If this encoding is set, Chinese parameters in URLs must be encoded; otherwise the received Chinese parameter values are garbled. For example, with the link
response.sendRedirect("/a.jsp?name=Zhang Dawei");
reading the value directly in a.jsp with
String name = request.getParameter("name");
yields garbled characters. Because UTF-8 is required, the redirect should be written as follows:
response.sendRedirect("/a.jsp?name=" + URLEncoder.encode("Zhang Dawei", "UTF-8"));
What if URIEncoding="UTF-8" is not set? Then the default encoding, iso8859-1, is used, and problems arise again. First, if the value contains an even number of Chinese characters it parses correctly; if the number is odd, the final character is garbled. In addition, if the last character is an English one the value parses correctly, but Chinese punctuation marks are still garbled. So if your parameter contains no Chinese punctuation, you can append an English character to the end of the value to avoid the garbling, then strip it off after reading the parameter. This is a workable stopgap.

6. Script code that issues URL requests: garbled parameters on the receiving side

Scripts can also trigger page redirects with parameters that the receiving page must parse. If the Chinese parameters are not encoded according to URIEncoding="UTF-8", the Chinese characters received by the page are garbled. Handling the encoding in script is troublesome: you need a script file that implements the encoding, and then call its method to encode the Chinese characters.

7. Garbled characters when opening a JSP in MyEclipse

In an existing project, the JSP files may be stored as UTF-8. If a freshly installed Eclipse opens files as iso8859-1 by default, the Chinese characters in the JSP appear garbled. This is easy to fix: in the Eclipse 3.1 preferences, find General -> Editor and set the encoding to UTF-8. Eclipse automatically reopens the file with the new encoding and the Chinese characters display correctly.

8. Garbled characters when opening HTML pages in Eclipse
Most such pages were created in Dreamweaver, and their storage format differs from what Eclipse detects.
In general, create a new JSP in Eclipse, copy the page content from Dreamweaver, and paste it into the JSP.

**************************************** ****
Solving garbled Chinese in JSP: personal experience with Java Chinese encoding in JSP, and finally a complete solution

Garbled characters are common when developing Java applications. After all, Unicode is not yet universally used, and on systems that use gb2312 (including gbk for Simplified Chinese and big5 for Traditional Chinese), correct display of Chinese and correct storage in the database are the most basic requirements.

1. First, developers should work out why they are getting garbled characters and what kind of garbled characters they are getting (meaningless symbols, a string of question marks, or something else).
Faced with a pile of messy characters, newcomers are often at a loss. The most direct reaction is to open Google and search for "java Chinese" (a very frequent query on search engines), and then go through other people's solutions one by one. There is nothing wrong with that, but it rarely achieves the goal, for reasons described below.
In short, garbled text has many causes and the solutions are completely different. To solve the problem you must first analyze your own "context".

2. What information is needed to determine the root cause of garbled characters in a project
A. The operating system the developer uses
B. The J2EE container name and version
C. The database name, version (exact version), and JDBC driver version
D. The source code that produces the garbled text (for example, whether it comes from System.out or from a JSP page; if from a JSP, the declarations in the page header are also important)

3. How to make a first analysis of the causes of garbled characters
With the information above you can post a request for help. Posted on a forum such as javaworld, it should soon get you an effective solution from an expert.
Of course, you cannot always rely on posting for help; you should also try to solve the problem yourself. How to start?
A. Work out what encoding your garbled text is actually in. This is not difficult. Suppose
System.out.println(testString);
prints garbled characters. You can use the exhaustive method to guess its actual encoding:
System.out.println(new String(testString.getBytes("ISO-8859-1"), "gb2312"));
System.out.println(new String(testString.getBytes("UTF8"), "gb2312"));
System.out.println(new String(testString.getBytes("GB2312"), "gb2312"));
System.out.println(new String(testString.getBytes("GBK"), "gb2312"));
System.out.println(new String(testString.getBytes("BIG5"), "gb2312"));
The code above reads the "garbled" testString in each specified encoding and converts it to gb2312 (Chinese is used here only as an example).

Then see which of the converted results looks right.
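The same exhaustive check can be written as a loop over candidate encodings. The sketch below is one possible arrangement, not the original author's code: the class name CharsetGuess, the method reinterpretAll, and the candidate list are assumptions chosen to match the five encodings tried above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class CharsetGuess {
    static final String[] CANDIDATES = {"ISO-8859-1", "UTF-8", "GB2312", "GBK", "Big5"};

    /**
     * For each candidate, turns the garbled string back into bytes with that
     * charset and decodes the bytes with the target charset.
     */
    static Map<String, String> reinterpretAll(String garbled, String target) {
        Charset targetCharset = Charset.forName(target);
        Map<String, String> results = new LinkedHashMap<>();
        for (String name : CANDIDATES) {
            byte[] bytes = garbled.getBytes(Charset.forName(name));
            results.put(name, new String(bytes, targetCharset));
        }
        return results;
    }

    public static void main(String[] args) {
        // Build a sample: GB2312 bytes that were wrongly decoded as ISO-8859-1.
        String garbled = new String("\u4e2d\u6587".getBytes(Charset.forName("GB2312")),
                StandardCharsets.ISO_8859_1);

        for (Map.Entry<String, String> e : reinterpretAll(garbled, "GB2312").entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

For this sample only the ISO-8859-1 row reproduces the original text; which row is readable in a real case tells you how the string was mis-decoded.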

B. If one of the steps above produces correct Chinese, your data is definitely intact; it is just not being displayed correctly in the interface. The second step, then, is to correct your view: check whether the correct page encoding is selected in the JSP.
I would like to point out something that many people misunderstand, namely the difference between the <%@ page contentType="text/html; charset=GB2312" %> directive and the <meta http-equiv="Content-Type" content="text/html; charset=gb2312"> tag. Many articles on the Internet about Chinese text say that, whether you store data in the database as unicode or gb2312, you can simply declare the encoding with the page directive in the JSP. I think this statement is very irresponsible, and it cost me more than N hours of struggling with garbled characters.
Actually, the role of page is to tell Java which encoding to use to "read" the Strings in expressions when the JSP is compiled into HTML (similar to the role of the third statement above), while meta, as is widely known, gives the IE browser its encoding choice and is used to "display" the final data. Nobody pointed this out, so I had always treated page as if it were meta. As a result, the data was ISO-8859-1 but the page directive read it as gb2312, so it came out garbled, and I added an encoding conversion function that converts all String data from iso8859 to gb2312 (I did not think much about it at the time; the text then displayed normally, so I left the change in place and had no time to troubleshoot further).
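
The effect of a mismatch can be sketched in plain Java. This is a simplified model rather than what a JSP container really does: it assumes one charset is used to write the bytes and another to read them, and the class name CharsetMismatch and the method displayed are invented for the illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetMismatch {
    // Hypothetical model: the text is turned into bytes with writeCharset
    // and the same bytes are turned back into text with readCharset.
    static String displayed(String text, Charset writeCharset, Charset readCharset) {
        byte[] wire = text.getBytes(writeCharset);
        return new String(wire, readCharset);
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587"; // two Chinese characters, written as escapes
        Charset gb2312 = Charset.forName("GB2312");

        // Both sides agree on gb2312: the text survives the round trip.
        System.out.println(displayed(text, gb2312, gb2312).equals(text));

        // Written as UTF-8 but read as gb2312: the text no longer matches.
        System.out.println(displayed(text, StandardCharsets.UTF_8, gb2312).equals(text));
    }
}
```

The point of the sketch is only that the writing side and the reading side must name the same encoding; which setting controls which side in a given container and browser still has to be checked in your own environment.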

4. Which encoding is better for the database?
Currently, the most popular databases are SQL Server, MySQL, Oracle, DB2, and so on. Among them, MySQL is the leader among free databases: its capabilities are well recognized, it is easy to install and configure, its drivers are fairly complete, and it is very cost-effective. So take MySQL as an example.
I personally recommend storing data in MySQL's default encoding, that is, iso-8859-1 (latin-1 in the MySQL options). There are several reasons. First, iso-8859-1 has good text support. Second, it is consistent with the default encoding in Java, which at least saves the trouble of converting encodings in many places. Third, the default is relatively stable and more compatible, because multi-encoding support is provided by specific DB products; it may be incompatible with other databases, and compatibility problems can occur even between different versions of the same product.

For example, for products earlier than MySQL 4.0, many Chinese-language solutions use the characterEncoding parameter of the connection URL to fix the encoding, for example to gb2312. This works because the original data is ISO8859_1 encoded and the JDBC driver encodes it with the character set specified in the URL, so resultSet.getString(*) returns the already encoded string. In this way, gb2312 data is obtained directly.
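
A connection URL of this kind might look like the sketch below. The host, database name, and class name are placeholders chosen for illustration; useUnicode and characterEncoding are the MySQL Connector/J URL properties, and whether they behave as described depends on the driver and server versions you actually use.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

class MysqlUrlSketch {
    // Build a MySQL JDBC URL that asks the driver to use the given encoding.
    // host and database are placeholders supplied by the caller.
    static String buildUrl(String host, String database, String encoding) {
        return "jdbc:mysql://" + host + "/" + database
                + "?useUnicode=true&characterEncoding=" + encoding;
    }

    // Opening the connection needs a running server and the driver on the
    // classpath, so this method is shown but not called in main.
    static Connection open(String url, String user, String password) throws SQLException {
        return DriverManager.getConnection(url, user, password);
    }

    public static void main(String[] args) {
        // Prints the URL only; no database connection is attempted.
        System.out.println(buildUrl("localhost:3306", "testdb", "gb2312"));
    }
}
```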

However, the release of MySQL 4.1 brought a lot of trouble to database administrators, because MySQL 4.1 supports column-level character sets. Each table and column can specify its own encoding (ISO8859_1 when none is specified). Therefore, after JDBC fetches the data, it encodes the data according to each column's character set, instead of using one global parameter for all the data.

This also shows, from another angle, that garbled text is a genuinely complicated problem with many possible causes.

I only met

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
