Chinese text from a website is garbled when saved to the database

Source: Internet
Author: User
Tags: recode

First, find out at which step the data becomes garbled; stepping through with a debugger will show this.

Set the database character set to the same encoding as the web page, for example UTF-8.

Also submit the form with the POST method.


What encodings do your database, your pages and your database connection use?
As long as these three are the same, the text will not be garbled!

I have collected what I know about garbled text and hope it helps:
When data submitted through a JSP comes out garbled, the first step is to understand why.
Look at the JSP page directive: <%@ page contentType="text/html;charset=UTF-8" language="java" %>
Besides contentType, the page directive has another encoding-related attribute: pageEncoding
-----------------------------------------------------------------------------------------
First, the role of the encoding settings in JSP/Servlet.
A JSP/Servlet application has several places where an encoding can be set: pageEncoding="UTF-8", contentType="text/html;charset=UTF-8", request.setCharacterEncoding("UTF-8") and response.setCharacterEncoding("UTF-8"). The first two are page directive attributes and can be used only in JSPs; the last two can be used in both JSPs and servlets.
-----------------------------------------------------------------------------------------
request.setCharacterEncoding("UTF-8") sets the encoding the server uses to decode the client request.
In other words, it specifies how the bytes sent by the browser are turned back into characters. It must be called before any request parameter is read.
response.setCharacterEncoding("UTF-8") specifies the encoding of the server response sent to the client.
The server uses it to encode the data into bytes before sending them to the browser.
-----------------------------------------------------------------------------------------
First, how the browser encodes the data it receives and sends
1. The browser receives data:
response.setCharacterEncoding("UTF-8") specifies the encoding of the server response. The charset is also sent to the browser in the Content-Type header, and the browser uses it to decode the data it receives. So whether you set response.setCharacterEncoding("UTF-8") or response.setCharacterEncoding("GBK") in your JSP, the browser displays Chinese correctly.
Readers can try an experiment. Set response.setCharacterEncoding("UTF-8") in a JSP and open the page in IE: the menu "View (V)" → "Encoding (D)" shows "Unicode (UTF-8)". Set response.setCharacterEncoding("GBK") instead and the same menu shows "Simplified Chinese (GB2312)".
2. The browser sends data:
When the browser sends data, the URL and its parameters are URL-encoded. Chinese text in a parameter is percent-encoded using the encoding of the page, that is, the charset the server declared with response.setCharacterEncoding. Take Baidu and Google as examples: if you search for "汉字" (Chinese characters) on Baidu, it is encoded as "%BA%BA%D7%D6"; on Google it is encoded as "%E6%B1%89%E5%AD%97". This is because Baidu's pages are served in GBK and Google's in UTF-8.
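The two percent-encoded forms above can be reproduced with the JDK. The following is a minimal sketch, assuming Java 10 or later (for the Charset overload of URLEncoder.encode) and a JDK that ships the GBK charset; the class and method names are made up for illustration.

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;

// Hypothetical demo class; only URLEncoder and Charset come from the JDK.
class UrlEncodingDemo {
    // Percent-encode a parameter value using the named charset,
    // as a browser does for a page served in that charset.
    static String encode(String text, String charsetName) {
        return URLEncoder.encode(text, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String hanzi = "\u6c49\u5b57"; // the two characters U+6C49 U+5B57
        System.out.println(encode(hanzi, "GBK"));   // %BA%BA%D7%D6
        System.out.println(encode(hanzi, "UTF-8")); // %E6%B1%89%E5%AD%97
    }
}
```

The same two characters produce four bytes in GBK and six in UTF-8, which is why the server has to know which encoding the browser used before it can decode the parameter.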
--------------------------------------------------------
Second, how the server encodes the data it receives and sends
1. The server sends data
When sending data, the server picks the encoding in this order of precedence: response.setCharacterEncoding, then contentType, then pageEncoding.
2. The server receives data
There are three cases: data the browser submits directly in the URL, data submitted by a form with the GET method, and data submitted by a form with the POST method.
Because web servers handle these three cases differently, we take Tomcat 5.0 as the example.
① Data submitted by a form with POST
As long as the JSP page that holds the form sets response.setCharacterEncoding, contentType or pageEncoding to "UTF-8", the browser submits the form in UTF-8. The JSP/servlet that receives the data then reads Chinese correctly once it calls request.setCharacterEncoding("UTF-8").
② Data submitted in the URL or by a form with GET
Setting request.setCharacterEncoding in the JSP/servlet that receives the data is not enough by itself,
because Tomcat 5.0 by default decodes data submitted in the URL or by a GET form as ISO-8859-1. The fix is as follows:
Set the useBodyEncodingForURI or URIEncoding attribute on the Connector element in Tomcat's configuration file server.xml. URIEncoding specifies one encoding used to decode all GET requests, covering both data submitted in the URL and data submitted by a form with GET.
useBodyEncodingForURI indicates whether data submitted in the URL or by a GET form is decoded with the encoding given to request.setCharacterEncoding; it defaults to false.
The difference between the two is that URIEncoding applies one decoding to all GET request data, while useBodyEncodingForURI decodes according to the request.setCharacterEncoding value of the page that handles the request, so different pages can use different encodings. For data submitted in the URL or by a GET form, you can therefore either set URIEncoding to the encoding the browser uses, or set useBodyEncodingForURI to true and call request.setCharacterEncoding with the browser's encoding in the JSP page that receives the data.
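The effect of the default ISO-8859-1 decoding can be imitated with the JDK alone. The sketch below is not Tomcat code; it only decodes the same query value twice with URLDecoder, once as ISO-8859-1 (the Tomcat 5.0 default described above) and once as UTF-8 (what URIEncoding="UTF-8" would select). It assumes Java 10 or later, and the class and method names are illustrative.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;

// Hypothetical demo class; it imitates the decoding step, not Tomcat itself.
class QueryDecodingDemo {
    // A query value as a browser sends it from a UTF-8 page (U+6C49 U+5B57).
    static final String RAW = "%E6%B1%89%E5%AD%97";

    // Decode a percent-encoded value with the named charset.
    static String decodeAs(String raw, String charsetName) {
        return URLDecoder.decode(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String wrong = decodeAs(RAW, "ISO-8859-1"); // each byte becomes one character
        String right = decodeAs(RAW, "UTF-8");      // three bytes become one character
        System.out.println(wrong.length()); // 6
        System.out.println(right.length()); // 2
    }
}
```

The six-character result is the garbled text a servlet sees when the container decodes a UTF-8 query string as ISO-8859-1.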
----------------------------------------------------------
The following summarizes how to prevent garbled Chinese when Tomcat 5.0 is the web server.
1. Use one encoding throughout an application; UTF-8 is recommended, though GBK also works.
2. Set pageEncoding="UTF-8" correctly in every JSP.
3. Set contentType="text/html;charset=UTF-8" or response.setCharacterEncoding("UTF-8") in every JSP/servlet, which indirectly sets the encoding the browser uses.
4. For GET requests and URL parameters (anything other than a POST form submission), change Tomcat's default configuration. We recommend setting useBodyEncodingForURI to true; you can also set URIEncoding to UTF-8, but that may affect other applications, so it is not recommended. Alternatively, convert the data when it is received:
request.getParameter("userid") gets the value of userid
request.getParameter("userid").trim() removes whitespace from both ends of the value
request.getParameter("userid").trim().getBytes("ISO-8859-1") encodes the string into a byte array using ISO-8859-1
new String(request.getParameter("userid").trim().getBytes("ISO-8859-1"), "UTF-8") passes that byte array to the String constructor, which decodes it as UTF-8 and creates a new String object.
5. How to use URLEncoder
Before passing the parameter:
Convert the string to application/x-www-form-urlencoded format using the specified encoding
String username_encoder = URLEncoder.encode(username, "UTF-8");
When receiving and displaying the parameter:
Decode the application/x-www-form-urlencoded string using the specified encoding
String username_decoder = URLDecoder.decode(request.getParameter("username"), "UTF-8");
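Items 4 and 5 can be tried outside a servlet container. The sketch below is an illustration under stated assumptions: misdecode() stands in for a container that decoded UTF-8 bytes as ISO-8859-1, and the class and method names are hypothetical. It uses the Charset overloads, so it assumes Java 10 or later.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; it uses only JDK classes, no servlet API.
class MojibakeRepair {
    // Imitates a container that decoded UTF-8 bytes as ISO-8859-1.
    static String misdecode(String original) {
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // Item 4: turn the garbled string back into bytes, then decode as UTF-8.
    static String repair(String garbled) {
        byte[] rawBytes = garbled.trim().getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    // Item 5: encode before sending, decode after receiving, same charset both times.
    static String roundTrip(String value) {
        String encoded = URLEncoder.encode(value, StandardCharsets.UTF_8);
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String userid = "\u6c49\u5b57"; // U+6C49 U+5B57
        String garbled = misdecode(userid);
        System.out.println(repair(garbled).equals(userid));   // true
        System.out.println(roundTrip(userid).equals(userid)); // true
    }
}
```

The repair works because ISO-8859-1 maps every byte to exactly one character, so converting the garbled string back to ISO-8859-1 bytes recovers the original UTF-8 bytes without loss.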
-----------------------------------------------------------------------------------------
What is GBK? What is UTF-8?
First, the characters they cover
GBK covers Chinese characters, both simplified and traditional;
UTF-8 is an encoding of Unicode, which covers the characters needed by all countries in the world.
Second, the encoding
GBK extends the national standard GB2312 and stays compatible with it (GBK itself does not appear to be a formal national standard).
UTF-8 encoded text can be displayed by any browser that supports UTF-8, whatever the country.
For example, a UTF-8 page can show Chinese in a foreign user's English-language IE without downloading IE's Chinese language pack. As for size, English letters take one byte in both GBK and UTF-8, while a Chinese character takes two bytes in GBK and three in UTF-8.
Third, usage
GBK is China's national encoding. It is less universal than UTF-8, but Chinese text in UTF-8 takes more database space than in GBK. For forum software such as Discuz! (DZ), the components and plug-ins built for the GBK edition are more complete, so customization is easier.
UTF-8 is an international encoding with better universality: users abroad can browse the forum and Chinese text displays directly. If your forum aims to be international, use UTF-8.

Addendum: UTF-8 also has an advantage over GBK in its support for Traditional Chinese.
For Discuz! forums, many plug-ins support only GBK. A forum that needs many plug-ins is better off with GBK; a forum with few plug-ins and a special user base is better off with UTF-8.
So, in general, if the forum serves only a domestic audience, GBK is simpler and almost any plug-in can be installed; if your site needs to reach a foreign market, UTF-8 is recommended.

GBK and UTF-8 are the two character encodings most used for Chinese websites, but they differ as summarized below.
1. GBK is a variable-width encoding: ASCII characters take one byte and Chinese characters take two bytes, the first of which has its highest bit set to 1.
UTF-8 is a multi-byte encoding designed for international text: English letters use 8 bits (one byte) and most Chinese characters use 24 bits (three bytes). A forum with mostly English text therefore takes no more space in UTF-8 than in GBK.
2. GBK covers Chinese characters, both simplified and traditional
UTF-8 can encode the characters needed by all countries in the world.
3. GBK extends the national standard GB2312 and is compatible with it (it does not appear to be a formal national standard itself)

Please note:
Although UTF-8 has good international compatibility, Chinese text needs about 50% more database storage than with GBK/BIG5 (three bytes per character instead of two), so UTF-8 is recommended mainly for users who have specific requirements for international compatibility.
To put it simply:
For forums with mostly Chinese text, GBK saves database space.
For forums with mostly English text, UTF-8 costs no extra space and adds international compatibility.
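The byte counts behind this advice are easy to check. A minimal sketch, assuming a JDK that ships the GBK charset; the class and method names are illustrative only.

```java
import java.nio.charset.Charset;

// Hypothetical demo class that measures encoded sizes.
class ByteLengthDemo {
    // Number of bytes the text occupies in the named charset.
    static int byteLength(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName)).length;
    }

    public static void main(String[] args) {
        String chinese = "\u6c49\u5b57"; // two Chinese characters
        String english = "forum";        // five ASCII letters
        System.out.println(byteLength(chinese, "GBK"));   // 4
        System.out.println(byteLength(chinese, "UTF-8")); // 6
        System.out.println(byteLength(english, "GBK"));   // 5
        System.out.println(byteLength(english, "UTF-8")); // 5
    }
}
```

Two Chinese characters grow from four bytes to six, which is the 50% increase mentioned above, while ASCII text is the same size in both encodings.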

MySQL Settings

I was troubled by MySQL character sets for a while, so here is a summary of what I learned.
MySQL character set support has two aspects:
the character set and the collation (sort order).
Character set support works at four levels:
server, database, table and connection.


1. MySQL default character set

MySQL lets you specify the character set down to the level of a database, a table or a column.

However, most programs do not use such a detailed configuration when creating databases and tables; they rely on the defaults. So where do the defaults come from?

(1) When MySQL is compiled, a default character set is specified, and it is latin1;
(2) When MySQL is installed, a default character set can be specified in the configuration file (my.ini); if none is specified, the value is inherited from the compile-time default;
(3) When mysqld is started, a default character set can be specified in the command-line arguments; if none is specified, the value is inherited from the configuration file. At this point character_set_server is set to this default character set;
(4) When a new database is created, its character set is set to character_set_server by default unless explicitly specified;
(5) When a database is selected, character_set_database is set to the default character set of that database;
(6) When a table is created in this database, the table's default character set is set to character_set_database, that is, the default character set of the database;
(7) When a column is defined in the table, the column's default character set is the table's default character set unless explicitly specified;
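The inheritance chain in steps (1) to (7) amounts to "use the most specific setting that was given, otherwise fall back one level". The sketch below models only that rule; it is not MySQL code, the class and method names are hypothetical, and null stands for "not explicitly specified".

```java
// Hypothetical model of the fallback rule; not part of MySQL or any driver.
class CharsetDefaults {
    // Step (1): the compile-time default named in the text.
    static final String COMPILED_DEFAULT = "latin1";

    // Returns the first explicitly specified character set, checking the
    // column first, then the table, the database and the server.
    static String effective(String column, String table, String database, String server) {
        for (String candidate : new String[] {column, table, database, server}) {
            if (candidate != null) {
                return candidate;
            }
        }
        return COMPILED_DEFAULT;
    }

    public static void main(String[] args) {
        System.out.println(effective(null, null, null, null));     // latin1
        System.out.println(effective(null, null, "utf8", null));   // utf8
        System.out.println(effective("gbk", null, "utf8", "utf8")); // gbk
    }
}
```

In MySQL itself the default is copied when the database, table or column is created, so changing a higher level later does not alter objects that already exist.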

To summarize: if nothing is changed anywhere, every column of every table in every database is stored as latin1. However, when installing MySQL we usually choose multi-language support, in which case the installer sets default-character-set to utf8 in the configuration file, so that every column of every table is stored as UTF-8 by default.


2. Viewing the default character set (by default, MySQL's character set is latin1, i.e. ISO-8859-1)
The character set and collation settings of the system can usually be viewed with the following two commands:
mysql> SHOW VARIABLES LIKE 'character%';
+--------------------------+---------------------------------+
| Variable_name            | Value                           |
+--------------------------+---------------------------------+
| character_set_client     | latin1                          |
| character_set_connection | latin1                          |
| character_set_database   | latin1                          |
| character_set_filesystem | binary                          |
| character_set_results    | latin1                          |
| character_set_server     | latin1                          |
| character_set_system     | utf8                            |
| character_sets_dir       | D:\mysql-5.0.37\share\charsets\ |
+--------------------------+---------------------------------+

mysql> SHOW VARIABLES LIKE 'collation_%';
+----------------------+-----------------+
| Variable_name        | Value           |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database   | utf8_general_ci |
| collation_server     | utf8_general_ci |
+----------------------+-----------------+

3. Modifying the default character set
(1) The simplest way is to change the character set keys in MySQL's my.ini file,
such as default-character-set = utf8
character_set_server = utf8
After the change, restart the MySQL service: service mysql restart
Run mysql> SHOW VARIABLES LIKE 'character%'; and you will see that the database encoding has changed to utf8
+--------------------------+---------------------------------+
| Variable_name            | Value                           |
+--------------------------+---------------------------------+
| character_set_client     | utf8                            |
| character_set_connection | utf8                            |
| character_set_database   | utf8                            |
| character_set_filesystem | binary                          |
| character_set_results    | utf8                            |
| character_set_server     | utf8                            |
| character_set_system     | utf8                            |
| character_sets_dir       | D:\mysql-5.0.37\share\charsets\ |
+--------------------------+---------------------------------+

(2) Another way to change the character set is with MySQL commands (these statements affect only the current session). Note that the collation variables take a collation name such as utf8_general_ci, not a character set name:
mysql> SET character_set_client = utf8;
mysql> SET character_set_connection = utf8;
mysql> SET character_set_database = utf8;
mysql> SET character_set_results = utf8;
mysql> SET character_set_server = utf8;

mysql> SET collation_connection = utf8_general_ci;
mysql> SET collation_database = utf8_general_ci;
mysql> SET collation_server = utf8_general_ci;


In general, even if the table's default character set is utf8 and the query is sent UTF-8 encoded, you may find that the data stored in the database is still garbled. The problem lies in the connection layer. The fix is to execute the following statement before sending the query:

SET NAMES 'utf8';

It is equivalent to the following three statements:
SET character_set_client = utf8;
SET character_set_results = utf8;
SET character_set_connection = utf8;

Summary:
Which database version you use, whether 3.x, 4.0.x or 4.1.x, does not really matter. Two things matter:
1) Set the database encoding correctly. MySQL 4.0 and earlier always use the default character set ISO-8859-1; MySQL 4.1 lets you choose one during installation. If you plan to use UTF-8, specify it when you create the database (it can also be changed afterwards, and from version 4.1 on you can also specify the character set per table).
2) Set the database connection encoding correctly. Once the database encoding is set, specify the connection encoding when connecting to the database, for example by declaring UTF-8 on a JDBC connection.
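For JDBC, one common way to declare the connection encoding is through the connection URL. The sketch below only builds such a URL, so it runs without a database. It assumes the MySQL Connector/J driver, whose useUnicode and characterEncoding properties are used here; the host, port and database name are placeholders, and the class and method names are hypothetical.

```java
// Hypothetical helper; it builds a URL string and opens no connection.
class JdbcUrlDemo {
    // Build a MySQL JDBC URL that asks the driver to talk UTF-8 to the server.
    static String utf8Url(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database
                + "?useUnicode=true&characterEncoding=UTF-8";
    }

    public static void main(String[] args) {
        // "localhost", 3306 and "mydb" are placeholder values.
        String url = utf8Url("localhost", 3306, "mydb");
        System.out.println(url);
        // With Connector/J on the classpath, the URL would be passed to
        // java.sql.DriverManager.getConnection(url, user, password).
    }
}
```

With the encoding declared on the connection, the page, the connection and the database can all agree on UTF-8, which is the condition stated at the start of this article.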
