Java Chinese garbled characters

Last Update:2018-12-05 Source: Internet

Author: User

Tags driver manager mysql manual


<%@ page language="java" pageEncoding="UTF-8" %>
<%@ page contentType="text/html; charset=ISO-8859-1" %>
<html>
<head>
<title>Chinese</title>
</head>
<body>
I'm a good guy.
</body>
</html>

The encoding is specified in three places.

The first (pageEncoding) is the format in which the JSP file is stored. Eclipse saves the file in this encoding, and the JSP file, including its Chinese characters, is compiled using it.

The second (the charset in contentType) is the decoding format. Because a file stored as UTF-8 is decoded as ISO-8859-1, any Chinese text is certainly garbled. That is the problem here.

If the second line does not exist, ISO-8859-1 is also used by default, so without this line "I'm a good guy." would be garbled as well. The two settings must be consistent.

The third encoding controls how the browser decodes the page. If all the preceding encodings are consistent and correct, this setting does not matter. Some web pages are still garbled because the browser cannot determine which encoding to use: when one page is embedded in another, the browser can confuse the encodings, and garbled characters appear.
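The mismatch between the first and second settings can be reproduced in plain Java. The sketch below is an illustration, not container code: it encodes a string with one charset and decodes the bytes with another, which is what happens when a UTF-8 file is read as ISO-8859-1. The class name MismatchDemo is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch: a page stored in one charset but decoded with another.
// The Unicode escapes spell the word "Chinese" and keep this source file
// independent of the encoding javac assumes.
final class MismatchDemo {

    static final String TEXT = "\u4e2d\u6587";

    // Encode with the charset the file was saved in, decode with the charset the
    // container or browser was told to use.
    static String decodeAs(String text, Charset storedAs, Charset readAs) {
        byte[] stored = text.getBytes(storedAs);
        return new String(stored, readAs);
    }

    public static void main(String[] args) {
        // Consistent settings: stored and read as UTF-8, so the text survives.
        String ok = decodeAs(TEXT, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        // Mismatched settings: stored as UTF-8 but read as ISO-8859-1. Each of the
        // 6 UTF-8 bytes becomes its own Latin-1 character, so the text is garbled.
        String garbled = decodeAs(TEXT, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(ok.equals(TEXT));   // prints true
        System.out.println(garbled.length());  // prints 6
    }
}
```

Two Chinese characters turn into six Western European characters, which is exactly the kind of garbage seen in the browser.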

2. Garbled characters received after a form is submitted with POST

This is also a common problem. These garbled characters are caused by Tomcat's internal default encoding, ISO-8859-1. When data is submitted with POST and no submission encoding is set, it is processed as ISO-8859-1, while the receiving JSP expects UTF-8, so the characters are garbled. There are several solutions, compared below.

A. Convert the encoding when each parameter is received:

String str = new String(request.getParameter("something").getBytes("ISO-8859-1"), "UTF-8");

Every parameter has to be transcoded this way, which is tedious, but it does recover the Chinese characters.
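Method A can be wrapped in a small helper so every parameter goes through the same conversion. This is a sketch under the scenario described above (the container decoded the parameter as ISO-8859-1, the browser sent UTF-8); ParamTranscoder and its methods are hypothetical names. It relies on ISO-8859-1 mapping every byte value 0x00-0xFF to exactly one character, which is why re-encoding with it recovers the original bytes without loss.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch of method A as a reusable helper (hypothetical names).
final class ParamTranscoder {

    private ParamTranscoder() {}

    // Recover the original bytes by re-encoding with the charset that was (wrongly)
    // used to decode them, then decode those bytes with the correct charset.
    static String redecode(String value, Charset decodedAs, Charset actual) {
        if (value == null) {
            return null; // request.getParameter returns null for a missing parameter
        }
        return new String(value.getBytes(decodedAs), actual);
    }

    // Convenience method for the ISO-8859-1 -> UTF-8 case discussed in the text.
    static String fromLatin1ToUtf8(String value) {
        return redecode(value, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
    }
}
```

In a JSP this would read String str = ParamTranscoder.fromLatin1ToUtf8(request.getParameter("something")); which keeps the conversion in one place instead of repeating it for every parameter.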

B. At the beginning of the receiving page, execute request.setCharacterEncoding("UTF-8") to set the request's character set to UTF-8. The page that receives the parameter then no longer needs to transcode and can use String str = request.getParameter("something"); directly. However, this statement must be executed on every page.

This method only works for POST submissions. It has no effect on GET submissions or on file uploads with enctype="multipart/form-data". The garbled characters in those two cases are explained later.

C. To avoid writing request.setCharacterEncoding("UTF-8") on every page, we recommend using a filter to set the encoding for all JSPs. There are many examples on the Internet that you can look up yourself.

3. Garbled characters with form GET submission

If Chinese characters are submitted with GET, the page that receives the parameters also shows garbled characters. The cause is again Tomcat's internal encoding, ISO-8859-1: Tomcat decodes the Chinese characters appended to the URL using its default ISO-8859-1 encoding, so the parameters obtained on the receiving page are garbled.

Solution:

A. Use the first method in the previous section: decode the received characters and transcode them.

B. GET parameters travel in the URL, and ISO-8859-1 decoding has already been applied to them before they reach the page. To change this, add the useBodyEncodingForURI="true" attribute to the Connector element in server.xml. This attribute controls how Tomcat handles the Chinese characters of GET requests: with it, GET parameters are also decoded with the encoding set by request.setCharacterEncoding("UTF-8"), so the encoding is UTF-8 and the receiving page gets the parameters correctly. However, I think the actual decoding is done with the URIEncoding="UTF-8" attribute set on the Connector:

<Connector port="8080"
           maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
           enableLookups="false" redirectPort="8443" acceptCount="100"
           debug="0" connectionTimeout="20000" useBodyEncodingForURI="true"
           disableUploadTimeout="true" URIEncoding="UTF-8"/>

Tomcat encodes again with this setting, but because it is UTF-8 the result does not change, and parameters taken from the URL are decoded on the receiving page according to URIEncoding="UTF-8".
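The difference URIEncoding makes can be simulated with java.net.URLDecoder, which here only stands in for the container's own query-string decoding (Tomcat does not literally call it). The sketch assumes a browser that percent-encodes the UTF-8 bytes of the value and Java 10+ for the Charset overloads; QueryDecodingDemo is a hypothetical name.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch: a GET parameter percent-encoded as UTF-8, decoded two ways.
final class QueryDecodingDemo {

    // Percent-encoded form of a value as a UTF-8 browser would send it.
    static String encodedValue(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // What the page sees when the bytes are decoded as ISO-8859-1 (no URIEncoding).
    static String decodedWithDefault(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.ISO_8859_1);
    }

    // What the page sees when the bytes are decoded as UTF-8 (URIEncoding="UTF-8").
    static String decodedWithUtf8(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }
}
```

The percent-encoded bytes are identical in both cases; only the charset used to turn them back into characters differs, which is why a single Connector attribute decides whether the page sees Chinese or garbage.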

4. Garbled characters when uploading files

When uploading files, the form is set to enctype="multipart/form-data", which submits the data as a stream. If you use Apache's upload component, you will find many garbled characters. This is because the early Apache commons-fileupload.jar had a bug: it extracts the Chinese characters and then decodes them, and with this submission method the encoding automatically falls back to Tomcat's default, ISO-8859-1. The garbling shows up as follows: periods, commas and other special characters become garbled, and if the number of Chinese characters is odd, garbled characters appear, while an even number is parsed normally.

Workaround: download commons-fileupload-1.1.1.jar, which fixes these bugs. The extracted characters still need to be transcoded from ISO-8859-1 to UTF-8, after which all Chinese and other characters are obtained correctly.

5. Garbled parameters received from URL requests made in Java code

How the URL is decoded depends on the URIEncoding="UTF-8" setting mentioned above. If this setting is used, Chinese parameters in URLs must be encoded; otherwise the Chinese parameter values obtained are garbled. For example, with the redirect response.sendRedirect("/a.jsp?name=Zhang Dawei"), calling String name = request.getParameter("name"); directly in a.jsp returns garbled characters. Because UTF-8 is expected, the redirect should be written as:

response.sendRedirect("/a.jsp?name=" + URLEncoder.encode("Zhang Dawei", "UTF-8"));
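A sketch of the encoded redirect from the example above, assuming Java 10+ for the Charset overload of URLEncoder.encode. The path /a.jsp and the parameter name come from the text; redirectUrl is a hypothetical helper whose result would be passed to response.sendRedirect(...).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch: building a redirect target with a UTF-8 percent-encoded value.
final class RedirectUrls {

    static String redirectUrl(String path, String param, String value) {
        // Encode only the value. URLEncoder targets application/x-www-form-urlencoded,
        // so a space becomes "+", which servlet containers decode back to a space.
        return path + "?" + param + "=" + URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The escapes spell the Chinese name rendered as "Zhang Dawei" above.
        System.out.println(redirectUrl("/a.jsp", "name", "\u5f20\u5927\u4f1f"));
        // prints /a.jsp?name=%E5%BC%A0%E5%A4%A7%E4%BC%9F
    }
}
```

Because the URL now contains only ASCII, nothing can be lost on the way; the receiving side only has to decode it with the matching URIEncoding.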

What if URIEncoding="UTF-8" is not set? Then the default encoding, ISO-8859-1, is used, and problems arise again. First, if the parameter value has an odd number of characters, it is parsed normally; if it has an even number, the final character is garbled. In addition, if the last character is an English letter, the value is parsed normally, but Chinese punctuation marks are still garbled. So if your parameter contains no Chinese punctuation, you can append an English character to the end of the parameter value to avoid the garbling, and remove it after reading the parameter. This can also be combined with the methods above.

6. Garbled parameters in URL requests made from script code

Scripts also control page redirection and may carry parameters that the receiving page parses. If the Chinese parameters are not encoded as required by URIEncoding="UTF-8", the Chinese characters received by the page are garbled. Handling the encoding in script is troublesome: you need a script file that implements the encoding and then call its method to encode the Chinese characters.

7. Garbled JSP files opened in MyEclipse

In an existing project, the JSP files may be stored as UTF-8. If a newly installed Eclipse uses ISO-8859-1 as its default encoding, the Chinese characters in those JSPs appear garbled. This is easy to fix: in the Eclipse 3.1 preferences, find General -> Editors and set the encoding to UTF-8. Eclipse then automatically reopens the files in the new encoding, and the Chinese characters display normally.

8. Garbled HTML pages opened in Eclipse

Most of these pages are created with Dreamweaver, and their storage encoding differs from the one Eclipse assumes. In general, create a new JSP in Eclipse, copy the page content from Dreamweaver and paste it into the JSP.
//////////////////////////////////////// //////////////////////////////////////// //////////
Solving garbled Chinese in JSP: personal experience with Java Chinese encoding in JSP | Finally, a complete solution
April 5th, 2006
================================= Http://www.glgg.net/blog===================

Garbled characters are common when developing Java applications. After all, Unicode is not yet widely used, and in systems that use GB2312 (including GBK for simplified Chinese and Big5 for traditional Chinese), displaying Chinese correctly and storing it correctly in the database are the most basic requirements.

1. First, developers should clarify why they are seeing garbled characters and what kind of garbled characters they are (meaningless symbols, a string of question marks, or something else).
When newcomers run into a pile of garbled characters, they are often at a loss. The most direct reaction is to open Google and search for "java Chinese" (a string that is searched very frequently), and then try other people's solutions one by one. There is nothing wrong with that, but it rarely achieves the goal, for the reason described below.
In short, garbled text has many causes, and the solutions are completely different. To solve the problem, you must first analyze your own "context".

2. What information is needed to determine the root cause of garbled characters in a project?
A. The operating system used by the developers
B. The J2EE container name and version
C. The database name, exact version, and JDBC driver version
D. The code that produces the garbled text (for example System.out output or a JSP page); if it is in a JSP, the declarations in the page header are also important

3. How to make a preliminary analysis of the cause
With the information above, you can post for help. If you post on JavaWorld or another forum, I believe experts will soon suggest effective solutions.
Of course, you cannot always rely on posting for help; you should also try to solve the problem yourself. How?
A. Analyze the encoding of the garbled text. This is not difficult. Suppose, for example, that
System.out.println(testString);
prints garbled characters. You can then use an exhaustive approach to guess its actual encoding:
System.out.println(new String(testString.getBytes("ISO-8859-1"), "GB2312"));
System.out.println(new String(testString.getBytes("UTF8"), "GB2312"));
System.out.println(new String(testString.getBytes("GB2312"), "GB2312"));
System.out.println(new String(testString.getBytes("GBK"), "GB2312"));
System.out.println(new String(testString.getBytes("BIG5"), "GB2312"));
The code above converts the "garbled" testString back into bytes using each candidate encoding and decodes those bytes as GB2312 (Chinese is used only as an example here). Then check which of the converted results is correct.
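The same exhaustive search can be written as a loop. A minimal sketch, assuming the garbled text came from GBK bytes decoded as ISO-8859-1; EncodingGuesser and guesses are hypothetical names, and the candidate list mirrors the statements above. GB2312, GBK and Big5 ship in the jdk.charsets module of standard JDK builds, so a trimmed runtime may lack them, which is why unsupported names are skipped.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the "exhaustive method" as a loop over candidate encodings.
final class EncodingGuesser {

    static final List<String> CANDIDATES = List.of("ISO-8859-1", "UTF-8", "GB2312", "GBK", "Big5");

    // For each candidate, turn the garbled string back into bytes with that charset
    // and decode the bytes with the target charset. Results keep candidate order.
    static Map<String, String> guesses(String garbled, String target) {
        Map<String, String> results = new LinkedHashMap<>();
        Charset targetCharset = Charset.forName(target);
        for (String name : CANDIDATES) {
            if (!Charset.isSupported(name)) {
                continue; // skip encodings this JVM does not provide
            }
            results.put(name, new String(garbled.getBytes(Charset.forName(name)), targetCharset));
        }
        return results;
    }

    public static void main(String[] args) {
        // Simulate the common failure: GBK bytes decoded as ISO-8859-1.
        String original = "\u4e2d\u6587";
        String garbled = new String(original.getBytes(Charset.forName("GBK")), Charset.forName("ISO-8859-1"));
        guesses(garbled, "GB2312").forEach((name, text) -> System.out.println(name + " -> " + text));
    }
}
```

Only the ISO-8859-1 line reproduces the original text here, which tells you the data was intact and was merely decoded with the wrong charset.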

B. If one of the results above shows correct Chinese, your data is certainly intact; it is just not displayed correctly in the interface. The second step is then to fix the display side.

Check whether the correct page encoding is selected in the JSP.

I want to point out something that many people misunderstand: the <%@ page contentType="text/html; charset=GB2312" %> directive versus <META http-equiv="Content-Type" content="text/html; charset=gb2312">. Many articles on the Internet say that, for Chinese, you can choose Unicode or GB2312 storage in the database and then declare the encoding with the page directive in the JSP. I think this advice is very irresponsible; it cost me more than N hours of frustration. In fact, the page directive tells Java which encoding to use to "read" the strings in expressions when the JSP is compiled into HTML (similar to the role of the third statement above), whereas the well-known meta tag tells the browser which encoding to use to "display" the final data. Because this is rarely pointed out, I had always treated the page directive as if it were the meta tag. The result: the data was ISO-8859-1, the page directive read it as GB2312, and it came out garbled. So I added an encoding-conversion function that converted all string data from ISO-8859-1 to GB2312. (I did not think it through at the time: since the text then displayed correctly, I simply made that change, and only later found the time to track down the real cause.)


4. Which encoding is better for the database?
The most popular databases today are SQL Server, MySQL, Oracle, DB2 and others. Among them, MySQL is the leader of the free databases: it is widely recognized, easy to install and configure, its drivers are fairly complete, and its price/performance ratio is excellent. So I take MySQL as the example.

I personally recommend storing data in MySQL's default encoding, ISO-8859-1 (latin1 in MySQL's options). The main reasons are: first, ISO-8859-1 has good text support; second, it matches Java's default encoding, which saves the trouble of converting encodings in many places; third, the default is more stable and more compatible, because multi-encoding support is provided by each DB product: not only may it be incompatible with other databases, but compatibility problems can arise even between different versions of the same database.

For example, with MySQL versions earlier than 4.0, many Chinese solutions specified the encoding, such as gb2312, through the characterEncoding property of the connection. This works: because the original data is encoded as ISO8859_1, the JDBC driver decodes it using the character set specified in the URL, and ResultSet.getString(*) returns the decoded string, so you get GB2312 data directly.

However, the release of MySQL 4.1 caused DBAs a lot of trouble, because MySQL 4.1 supports column-level character sets: every table and every column can have its own encoding, which need not be ISO8859_1. Therefore, after JDBC retrieves data, it decodes it according to the column's character set instead of using a single global parameter for all data.

This also shows that garbled text is genuinely complicated, with too many possible causes. I have only run into some of them myself.

//////////////////////////////////////// //////////////////////////////////////// ////////
Solutions to Chinese problems in JSP [reprinted]

Like Java, JSP is currently a hot topic. It is a Web design language that is compiled and executed on the server; because its scripting language is Java, JSP inherits all of Java's advantages. However, garbled Chinese characters are often encountered when using JSP programs, and many people find this a headache; I suffered a lot from it as a beginner. Moreover, different platforms need different solutions to garbled Chinese, which makes JSP harder to learn. In fact, once you thoroughly understand the causes, the problem is relatively easy to solve. Below are the solutions I have summarized, which I hope will be a useful reference. (Because I mostly use Tomcat, I take Tomcat as the example and only mention other environments; the solutions are similar!)
Each country (or region) defines its own character set for computer information exchange, such as the extended ASCII of the United States, GB2312-80 of China and JIS of Japan. As the basis for information processing in each country (region), these character sets play an important role in unified coding. Because the code ranges of the local character sets overlap, exchanging information between them is difficult, and maintaining separate localized versions of software is expensive. It is therefore necessary to handle this in a unified way, which is called internationalization (I18N): language-specific information is normalized as locale information, while the underlying character set is Unicode, which contains all characters.

Readers who write JSP code are certainly familiar with ISO8859-1, a commonly used code page that belongs to the Western European languages. GB2312-80 was developed in the early days of Chinese character information technology in China. It contains the commonly used level-1 and level-2 Chinese characters plus the symbols in 9 zones. It is supported by almost all Chinese systems and internationalized software and is the most basic Chinese character set.

GBK is an extension of GB2312-80 and is upward compatible with it. It contains 20902 Chinese characters, and its encoding range is 0x8140 to 0xFEFE (excluding the positions whose high byte is 0x80). All of its characters map one-to-one to Unicode 2.0, which means Java in fact supports the GBK character set.

GB18030-2000 (GBK2K) further extends the Chinese characters on the basis of GBK and adds the scripts of ethnic minorities such as Tibetan and Mongolian. GBK2K fundamentally solves the problems of insufficient code points and missing glyphs.
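The coverage differences between these character sets can be checked with a CharsetEncoder, which reports whether a character is representable. This is a sketch: the sample characters are illustrative choices, not from the article. U+4E2D (中) is in all three sets, U+570B (國, the traditional form of 国) is in GBK but not in GB2312, and U+0F40 is a Tibetan letter that GB18030 covers but GBK does not. CoverageCheck is a hypothetical name.

```java
import java.nio.charset.Charset;

// Sketch: which of GB2312, GBK and GB18030 can encode a given character.
final class CoverageCheck {

    static boolean canEncode(String charsetName, String text) {
        return Charset.forName(charsetName).newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String[] samples = {"\u4e2d", "\u570b", "\u0f40"};
        String[] charsets = {"GB2312", "GBK", "GB18030"};
        for (String sample : samples) {
            StringBuilder line = new StringBuilder(String.format("U+%04X", sample.codePointAt(0)));
            for (String cs : charsets) {
                line.append("  ").append(cs).append('=').append(canEncode(cs, sample));
            }
            System.out.println(line);
        }
    }
}
```

A character that cannot be encoded is silently replaced (usually by "?") when String.getBytes is used, so checking coverage up front explains some of the "?" characters described later in this article.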
1. The Tomcat 4 development platform
This is probably the most frequently used version, so it is discussed in detail.
Tomcat 4 and later on Windows 98/2000 may have problems with Chinese (they do not occur on Linux or with Tomcat 3.x); the main symptom is garbled page display.
The simplest fix is to add <%@ page language="java" contentType="text/html; charset=gb2312" %> at the beginning of every JSP page. However, this is not enough: the Chinese text on the page now displays correctly, but fields read from the database become garbled. Analysis shows that the Chinese characters stored in the database are fine. The database accesses data with the ISO8859-1 character set, and Java also uses ISO8859-1 by default when processing characters (which reflects Java's internationalization design), so when data is added, both Java and the database handle it as ISO8859-1 and nothing goes wrong. The problem appears when reading: the data is read as ISO8859-1 as well, but the JSP header contains the statement

<%@ page language="java" contentType="text/html; charset=gb2312" %>

which displays it as GB2312, a different character set from that of the data read, so the page shows the data read from the database as garbled characters. The fix is to transcode these characters from ISO8859-1 to GB2312, after which they display normally. This solution works on many platforms and can be applied flexibly; the specific methods are described in detail below. In addition, choosing the character set is very important for databases such as SQL Server, Oracle, MySQL and Sybase. If you plan for multiple language versions, the database character set should uniformly be ISO8859-1, with conversion between character sets done when output is needed.
The following is a summary for different platforms:
(1) JSWDK is only suitable for ordinary development; its stability and other qualities may not match commercial software. JDK 1.3 performs better than the widely used JDK 1.2.2, and both support Chinese. JDK 1.4 is now available, and upgrading to the latest version gives better and broader Chinese support.
(2) Tomcat is only an implementation of the JSP 1.1 and Servlet 2.2 standards, and we should not expect this free software to be complete in every detail and in performance. It is designed mainly for English users, which is why passing Chinese characters through a URL without special conversion causes problems. Most IE browsers send URLs in UTF-8 by default, which seems to be a weakness in Tomcat's handling; in addition, Tomcat appears to compile JSPs as ISO8859 regardless of the current operating system.
2. Chinese handling in JSP code
(1) For operations that do not involve data, you can add <%@ page contentType="text/html; charset=gb2312" %> at the top of the JSP file.
(2) Values from a Form are stored in the database, and when they are read back they have all turned into "?". The form submits data with POST, and the code uses the statement

String st = new String(request.getParameter("name").getBytes("ISO8859_1"));

and the page also declares charset=gb2312.

To process Chinese parameters passed from a Form, add the following code to the JSP, which defines a getStr method, and then convert each received parameter:
String keyword1 = request.getParameter("keyword1");
keyword1 = getStr(keyword1);
The code is as follows:
<%@ page contentType="text/html; charset=gb2312" %>
<%!
// Re-decodes a parameter that the container decoded as ISO-8859-1 into GB2312 text.
public String getStr(String str) {
    if (str == null) {
        return "NULL";
    }
    try {
        byte[] bytes = str.getBytes("ISO-8859-1");
        return new String(bytes, "GB2312"); // name the target charset instead of the platform default
    } catch (java.io.UnsupportedEncodingException e) {
        return "NULL";
    }
}
%>
<%-- Test http://www.cndes.com --%>
<%
String keyword = "welcome to the chuanglian Network Technology Center";
String keyword1 = request.getParameter("keyword1");
keyword1 = getStr(keyword1);
out.print(keyword);
out.print(keyword1);
%>
In addition, popular relational database systems support a database encoding: when you create a database you can specify its character set, and the data is stored in that encoding. When an application accesses the data, encoding conversion takes place. For Chinese data, the database's character encoding setting should preserve data integrity. GB2312, GBK, UTF-8 and others are possible database encodings; ISO8859-1 (8-bit) is also an option, but it increases programming complexity, so it is not a recommended database encoding. In JSP/Servlet programming, you can use the query functions of the database management system to check whether the Chinese data is correct.

(3) Character conversion in the JDBC driver
Most JDBC drivers currently transmit Chinese characters in the local (native) encoding; for example, the Chinese character 0x4175 is transmitted as the two bytes 0x41 and 0x75. Therefore, characters returned by the JDBC driver and characters sent to it must be converted. When you insert data into the database through the JDBC driver, you must first convert Unicode to the native code; when you query data through the JDBC driver, you must convert the native code back to Unicode. The code below implements the first of these conversions:
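// Unicode -> native: each character >= 0x100 is replaced by its native (GBK) bytes,
// one byte per char; other characters are copied unchanged.
// Assumes a double-byte native encoding such as GBK.
String unicode2Native(String s) throws java.io.UnsupportedEncodingException {
    if (s == null || s.length() == 0) {
        return null;
    }
    char[] buffer = new char[s.length() * 2]; // worst case: two bytes per character
    int j = 0;
    for (int i = 0; i < s.length(); i++) {
        char c = s.charAt(i);
        if (c >= 0x100) {
            byte[] buf = String.valueOf(c).getBytes("GBK");
            for (byte b : buf) {
                buffer[j++] = (char) (b & 0xFF); // mask to avoid sign extension
            }
        } else {
            buffer[j++] = c;
        }
    }
    return new String(buffer, 0, j);
}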
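The text describes two conversions, but only one direction is listed, and it works character by character. A minimal sketch of both directions using whole-string encoding, assuming the native encoding is GBK and that the driver expects each native byte carried in one char (the ISO-8859-1 trick); NativeCodec, toNative and fromNative are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch: Unicode <-> "native" (GBK) byte strings, each byte carried as one char.
final class NativeCodec {

    private static final Charset NATIVE = Charset.forName("GBK");

    // Unicode -> native: encode to GBK bytes, then map each byte to a char 0x00-0xFF.
    static String toNative(String s) {
        if (s == null || s.isEmpty()) {
            return s;
        }
        return new String(s.getBytes(NATIVE), StandardCharsets.ISO_8859_1);
    }

    // Native -> Unicode: recover the raw bytes from the chars and decode them as GBK.
    static String fromNative(String s) {
        if (s == null || s.isEmpty()) {
            return s;
        }
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), NATIVE);
    }
}
```

Encoding the whole string lets the charset decide how many bytes each character needs, so ASCII stays one byte and nothing assumes exactly two bytes per Chinese character. Characters outside GBK are replaced with "?" by getBytes, so they cannot survive either direction.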
Note that some JDBC drivers do not need these conversions if the correct character-set property is set through the JDBC DriverManager. For details, see the documentation of your JDBC driver.
Actually, once you understand it, garbled Chinese is nothing more than this! With practice you will find the knack. If you really understand the three methods above, try them when you run into Chinese problems, and I promise this problem will not bother you again!
These are just some of my experiences. If anything is wrong, please point it out so we can learn together!

//////////////////////////////////////// //////////////////////////////////////// //////////

My road through garbled text: solving garbled Chinese in JSP and MySQL interaction, with a summary
First, I used a StringConvert bean (implementing the GBtoISO() and ISOtoGB() methods) to solve part of the garbled Chinese during interaction: reading Chinese content from MySQL in a JSP program. These two methods solved that garbling.

However, Chinese content written from JSP to MySQL was still garbled and showed up as "?". Character information must have been lost during encoding conversion. Frustratingly, when I logged in to MySQL in a command-line window and wrote to the table with a statement such as "INSERT INTO mermer VALUES ('character', ...)", the Chinese content in the table displayed normally!!! The database uses the utf8 character set.

After hitting the wall many times, I finally found a solution. In the MySQL manual I saw this statement:

To allow multiple character sets to be sent from the client, the "UTF-8" encoding should be used, either by configuring "utf8" as the default server character set, or by configuring the JDBC driver to use "UTF-8" through the characterEncoding property.

In addition, you can use the following syntax to convert a string to a given character set: _charset 'str'.

charset must be a character set supported by the server. In this example, the default character set of shopdb is utf8.

Then I started testing:

INSERT INTO publish VALUES ('8', _gb2312 'Higher Education Press') wrote the data as "?".

Next I tried INSERT INTO publish VALUES ('8', _gbk 'Higher Education Press').

Then INSERT INTO publish VALUES ('8', _utf8 'Higher Education Press'). This time it was even worse: nothing was written at all!!

Maddening!! With no other option, I used the SHOW CHARACTER SET; command to list the character sets MySQL supports, and the command worked. Browsing the list, I found few familiar character sets; the only one I had seen often was latin1 (ISO-8859-1). Could that be it? I gave it a try:

INSERT INTO publish VALUES ('8', _latin1 'Higher Education Press');

I had finally found the method. I changed the URL of the database connection pool configured in Tomcat to "...characterEncoding=UTF-8", and transcoded the Chinese content before writing it to the database with

String s2 = new String(s1.getBytes("gb2312"), "ISO-8859-1");

where s1 is the Chinese string, and then wrote s2 to the database. Everything worked.
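For reference, a connection URL carrying the characterEncoding setting can be sketched as follows. The elided part of the author's URL is not reconstructed: the host, port and helper name below are placeholders, shopdb is the database named earlier, and useUnicode and characterEncoding are MySQL Connector/J connection properties.

```java
// Sketch: a MySQL Connector/J URL that asks the driver to exchange text as UTF-8.
// Host, port and helper name are placeholders, not the author's configuration.
final class JdbcUrls {

    static String mysqlUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database
                + "?useUnicode=true&characterEncoding=UTF-8";
    }

    public static void main(String[] args) {
        System.out.println(mysqlUrl("localhost", 3306, "shopdb"));
        // prints jdbc:mysql://localhost:3306/shopdb?useUnicode=true&characterEncoding=UTF-8
    }
}
```

If the URL is placed in Tomcat's XML resource configuration instead of Java code, the & separator must be written as &amp; for the XML to parse.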

To solve this problem I consulted countless materials. Here is my summary: because of differences between character sets and character encoding methods, garbled characters and lost character information occur when data is transferred between programs (especially data involving multiple character sets). The key to solving this is to understand the character set and encoding method used by the sender and by the receiver of the data, and to apply the appropriate transcoding at the data's exit or entry point. A program generally goes through writing, compiling and running, so you must be careful whenever data is passed from one stage to the next.

When writing code you may use a development tool, such as Eclipse, which I use. Everything looks fine while editing, but once the file is saved and reopened, all the Chinese characters are garbled. This is because the character data is held in memory as a stream, which is fine, but when that stream is written to disk, your development tool's default encoding is used. If that default happens to be ISO-8859-1, Chinese characters cannot be stored correctly. In Eclipse you can view and change the default encoding under Project -> Properties -> Info, which contains "default encoding for text file". Set it to GBK, then write the code and save.

For JSP programs, once the code is written it is handed to the container, which first translates the JSP into a .java file and then compiles it into a .class file before the server can execute it. This process also involves character encoding. The Java compiler (javac) uses the encoding of the operating system's language environment as its default, and so does the JRE (Java Runtime Environment). Chinese characters display correctly only when the encodings of the compile and runtime environments match that of the source file; otherwise you must transcode so that they use compatible encodings. The settings can be divided into several layers. The operating system layer, that is, the language the OS supports, is the most important, because it determines the JVM's default character encoding and directly affects how characters are displayed, for example fonts. At the J2EE server layer, most servers let you customize the character encoding; in Tomcat, for example, the javaEncoding parameter in web.xml sets the encoding, which is UTF-8 by default. IE can also be set to always send requests in UTF-8. At the application layer, each program deployed on the server can set its own encoding; I haven't used this yet and will study it later.
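To see which defaults the runtime is actually using, the JVM can report them. A small sketch; the class name is hypothetical. file.encoding and native.encoding are standard system properties (native.encoding exists since JDK 17, so it prints null on older JDKs). Since JDK 18 (JEP 400) the default charset is UTF-8 regardless of the operating system locale, so the OS-dependent default described above applies to older JDKs.

```java
import java.nio.charset.Charset;

// Sketch: print the encoding defaults of the running JVM.
final class EncodingDefaults {

    public static void main(String[] args) {
        // Charset used by String.getBytes() and new String(byte[]) without an argument.
        System.out.println("Charset.defaultCharset() = " + Charset.defaultCharset());
        System.out.println("file.encoding = " + System.getProperty("file.encoding"));
        // Encoding derived from the OS locale (JDK 17+); null on older JDKs.
        System.out.println("native.encoding = " + System.getProperty("native.encoding"));
    }
}
```

For the compile step, javac's -encoding option (for example javac -encoding UTF-8 Foo.java) states the source file encoding explicitly instead of relying on this default.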

While running, an application may need to interact with external systems, for example reading and writing a database or external files. In these cases the application inevitably exchanges data with the external system, so for Chinese data it is especially important to handle the encoding at each entry and exit point. External systems generally have their own character encoding. In my example, the configured MySQL uses UTF-8, while the JSP page declares charset=gb2312 and uses GB2312, so explicit transcoding is required when the page interacts with the database in order to handle Chinese characters correctly.
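The same rule applies to files: name the charset at both the exit and the entry point instead of relying on a default. A minimal sketch assuming Java 11+ (Files.writeString and Files.readString); the helper names and the GBK choice are illustrative assumptions.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: explicit charsets at the exit and entry points of an external text file.
final class ExplicitCharsetIo {

    // Exit point: encode with a named charset.
    static void writeText(Path file, String text, Charset charset) throws IOException {
        Files.writeString(file, text, charset);
    }

    // Entry point: decode with the same named charset.
    static String readText(Path file, Charset charset) throws IOException {
        return Files.readString(file, charset);
    }
}
```

Files.readString throws a MalformedInputException (an IOException) when the bytes are not valid in the requested charset, so a mismatch surfaces as an error instead of silently producing garbled text.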

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
