Java encodings: UTF-8, ISO-8859-1, and GBK

Source: Internet
Author: User

Java supports UTF-8, ISO-8859-1, GBK, and other character encodings, yet character encoding still trips up many Java programmers. There are many articles online about displaying Chinese characters correctly in Java, but none of them is comprehensive enough, so the cases are summarized here.

Four factors affect whether Chinese text is displayed correctly in Java: 1) the encoding of the database connection; 2) the character encoding used by the web page; 3) the character encoding of the data stored in the database; 4) Java's default character encoding. If Chinese characters are displayed incorrectly, first find out which encoding each of these four items uses, then analyze the cause from there.
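As a starting point for the fourth factor, the JVM can report the encoding it falls back to when none is named. This is a minimal sketch; the class and method names are hypothetical and are not part of the original article.

```java
import java.nio.charset.Charset;

// Hypothetical helper: reports the encoding used by calls such as
// String.getBytes() and new String(byte[]) when no charset is given.
final class DefaultEncodingReport {

    /** Returns the name of the JVM's default charset. */
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("JVM default charset: " + defaultCharsetName());
        // file.encoding is the system property the default is usually derived from.
        System.out.println("file.encoding: " + System.getProperty("file.encoding"));
    }
}
```

Because the result differs between machines, any conversion in this article that calls getBytes() without an argument depends on this value.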

JSP is built on Java but is also tied to web pages, and web pages have their own Chinese encoding conventions, so handling Chinese in JSP is more troublesome than in plain Java classes. The test database for this article is MySQL 3.2 and the JDBC driver is org.gjt.mm.mysql.Driver. The discussion focuses on UTF-8 and GBK (GB2312 is a subset of GBK, so Java code can use GBK in place of the GB series). The first six sections below cover JSP. Reading Chinese from the database differs from writing it, so the two are explained separately: the first three sections go from reading the database to displaying on the web page, and the next three go from input on the web page to storage in the database. The seventh to ninth sections cover plain Java classes. In what follows, rs is a ResultSet instance, the data set produced by executing a SELECT statement.
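The subset relationship between GB2312 and GBK can be checked directly. The sketch below assumes the running JDK ships the GB2312 and GBK charsets (they are in the extended charset set, not the guaranteed one); the class name and sample characters are illustrative choices.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical demo: text that GB2312 can encode produces the same bytes in GBK,
// while GBK also covers characters that GB2312 lacks.
final class Gb2312SubsetDemo {
    static final Charset GB2312 = Charset.forName("GB2312");
    static final Charset GBK = Charset.forName("GBK");

    /** True when both charsets encode the text to identical bytes. */
    static boolean sameBytesInBoth(String text) {
        return Arrays.equals(text.getBytes(GB2312), text.getBytes(GBK));
    }

    /** Decodes GB2312-encoded bytes with the GBK charset. */
    static String decodeGb2312BytesAsGbk(String text) {
        return new String(text.getBytes(GB2312), GBK);
    }

    public static void main(String[] args) {
        String simplified = "\u4e2d\u6587";      // the two characters for "Chinese"
        System.out.println(sameBytesInBoth(simplified));
        char traditional = '\u9f8d';             // a traditional character outside GB2312
        System.out.println(GB2312.newEncoder().canEncode(traditional));
        System.out.println(GBK.newEncoder().canEncode(traditional));
    }
}
```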

I. Database connection using UTF-8

Append useUnicode=true&characterEncoding=UTF-8 to the connection URL, for example jdbc:mysql://localhost/dbvf?autoReconnect=true&useUnicode=true&characterEncoding=UTF-8.

1. To display Chinese read from the database on a JSP page that uses GBK: if the database stores UTF-8 data, use str = new String(rs.getBytes(1), "UTF-8") or str = rs.getString(1). If the database stores GBK data, you must use str = new String(rs.getBytes(1), "GBK").

2. If the web page uses UTF-8 and the database stores UTF-8, str = new String(rs.getBytes(1), "GBK") also displays Chinese correctly. If the web page uses UTF-8 and the database stores GBK, Chinese cannot be displayed directly and a two-step conversion is needed: first str = new String(rs.getBytes(1), "GBK"), then str = new String(str.getBytes("UTF-8"), "GBK").
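The connection URLs used in these sections differ only in the characterEncoding value, so they can be assembled by one small helper. This is a sketch only: the class and method are hypothetical, and the host and database names are the ones from the article's example.

```java
// Hypothetical helper that builds the MySQL JDBC URLs discussed in this article.
final class JdbcUrlSketch {

    /**
     * Builds a connection URL. Passing null for characterEncoding omits the
     * useUnicode and characterEncoding parameters (the default connection mode).
     */
    static String mysqlUrl(String host, String database, String characterEncoding) {
        String base = "jdbc:mysql://" + host + "/" + database + "?autoReconnect=true";
        if (characterEncoding == null) {
            return base;
        }
        return base + "&useUnicode=true&characterEncoding=" + characterEncoding;
    }

    public static void main(String[] args) {
        System.out.println(mysqlUrl("localhost", "dbvf", "UTF-8"));
        System.out.println(mysqlUrl("localhost", "dbvf", "GBK"));
        System.out.println(mysqlUrl("localhost", "dbvf", null));
    }
}
```

The resulting string would be passed to DriverManager.getConnection together with the user name and password.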

II. Database connection using GBK

Append useUnicode=true&characterEncoding=GBK to the connection URL, for example jdbc:mysql://localhost/dbvf?autoReconnect=true&useUnicode=true&characterEncoding=GBK.

1. To display Chinese read from the database on a JSP page that uses GBK: if the database stores UTF-8 data, you must use str = new String(rs.getBytes(1), "UTF-8"). If the database stores GBK data, use str = new String(rs.getBytes(1), "GBK") or str = rs.getString(1) directly.

2. If the web page uses UTF-8 and the database stores GBK, only str = new String(rs.getString(1).getBytes("UTF-8"), "GBK") displays Chinese. If the web page uses UTF-8 and the database stores UTF-8, use str = new String(rs.getBytes(1), "GBK") or str = rs.getString(1).
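The charset named in new String(bytes, charset) has to match the encoding of the stored bytes, which the following sketch illustrates without a database. The byte array stands in for what rs.getBytes(1) might return for GBK data; the class name is hypothetical, and the GBK charset is assumed to be available in the JDK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: decoding the same stored bytes with a matching
// and a mismatching charset.
final class DecodeMismatchDemo {
    static final Charset GBK = Charset.forName("GBK");

    /** Decodes raw column bytes with the given charset. */
    static String decode(byte[] stored, Charset charset) {
        return new String(stored, charset);
    }

    public static void main(String[] args) {
        // GBK encoding of the two characters U+4E2D U+6587.
        byte[] stored = {(byte) 0xD6, (byte) 0xD0, (byte) 0xCE, (byte) 0xC4};
        String right = decode(stored, GBK);
        String wrong = decode(stored, StandardCharsets.UTF_8);
        System.out.println(right.equals("\u4e2d\u6587"));
        // Malformed UTF-8 sequences are replaced with U+FFFD.
        System.out.println(wrong.indexOf('\uFFFD') >= 0);
    }
}
```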

III. Default database connection

Here no useUnicode or characterEncoding parameter is appended to the connection URL, for example jdbc:mysql://localhost/dbvf?autoReconnect=true. Without these parameters the connection uses the default ISO-8859-1 encoding.

1. To display Chinese read from the database on a GBK web page: if the database stores UTF-8 data, you must use str = new String(rs.getBytes(1), "UTF-8") or str = new String(rs.getString(1).getBytes("ISO-8859-1"), "UTF-8"). If the database stores GBK data, use str = new String(rs.getBytes(1), "GBK") or str = new String(rs.getString(1).getBytes("ISO-8859-1"), "GBK").

2. If the web page uses UTF-8, GBK data cannot be displayed directly and two steps are required: first str = new String(rs.getBytes(1), "GBK"), then str = new String(str.getBytes("UTF-8"), "GBK"). If the database stores UTF-8 data, use str = new String(rs.getBytes(1), "GBK") or str = new String(rs.getString(1).getBytes("ISO-8859-1"), "GBK").
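The getBytes("ISO-8859-1") idiom works because ISO-8859-1 maps every byte to exactly one character, so a string decoded that way still carries the original bytes. The sketch below simulates what rs.getString(1) would return over an ISO-8859-1 connection; the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo of recovering text that was decoded as ISO-8859-1.
final class Latin1RecoveryDemo {

    /** Re-reads the bytes behind an ISO-8859-1-decoded string in their real charset. */
    static String recover(String misdecoded, Charset actual) {
        return new String(misdecoded.getBytes(StandardCharsets.ISO_8859_1), actual);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587";
        // Simulates the driver decoding stored UTF-8 bytes as ISO-8859-1.
        String misdecoded = new String(original.getBytes(StandardCharsets.UTF_8),
                                       StandardCharsets.ISO_8859_1);
        System.out.println(misdecoded.length());   // one char per stored byte
        System.out.println(recover(misdecoded, StandardCharsets.UTF_8).equals(original));
    }
}
```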

The three sections above cover displaying Chinese from the database correctly on a web page. The next three cover storing Chinese correctly in the database.

IV. Database connection using UTF-8

In JSP, Chinese entered on a web page usually reaches the database through a form submission: the page reads str = request.getParameter("username") and then executes an UPDATE or INSERT statement. How str is assigned is critical, and it depends on the character encoding the web page uses.

1. If the web page uses UTF-8, use str = new String(request.getParameter("username").getBytes("ISO-8859-1"), "UTF-8") or str = new String(request.getParameter("username").getBytes(), "UTF-8"); the data is then stored in the database as UTF-8.

2. If the web page uses GBK, use str = new String(request.getParameter("username").getBytes(), "GBK"); the data is stored in the database as UTF-8.

3. Note that a database connection using UTF-8 cannot store GBK data.
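Why request.getParameter needs this re-decoding can be simulated without a servlet container. The sketch assumes the container decodes form data as ISO-8859-1 when no request encoding is set, which is the servlet default; URLEncoder and URLDecoder stand in for the browser and the container, and the class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical simulation of a form parameter travelling from page to JSP.
final class FormParameterDemo {

    /** What the browser sends: the value percent-encoded in the page's charset. */
    static String browserEncode(String value, String pageCharset) {
        try {
            return URLEncoder.encode(value, pageCharset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** What getParameter would return if the container assumes ISO-8859-1. */
    static String containerDecode(String raw) {
        try {
            return URLDecoder.decode(raw, "ISO-8859-1");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** The repair idiom from the article. */
    static String repair(String parameter, String pageCharset) {
        try {
            return new String(parameter.getBytes("ISO-8859-1"), pageCharset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String typed = "\u4e2d\u6587";
        String raw = browserEncode(typed, "UTF-8");
        System.out.println(raw);   // %E4%B8%AD%E6%96%87
        System.out.println(repair(containerDecode(raw), "UTF-8").equals(typed));
    }
}
```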

V. Database connection using GBK

1. For input from a GBK web page stored as GBK, use str = new String(request.getParameter("username").getBytes("ISO-8859-1"), "GBK") or str = new String(request.getParameter("username").getBytes(), "GBK").

2. If the web page uses GBK and you want to store UTF-8, two steps are needed: first str = new String(request.getParameter("username").getBytes(), "GBK"), then str = new String(str.getBytes("UTF-8"), "GBK").

3. If the web page uses UTF-8 and you use str = new String(request.getParameter("username").getBytes("ISO-8859-1"), "GBK") or str = new String(request.getParameter("username").getBytes(), "UTF-8"), the data stored in the database is UTF-8.

4. If the web page uses UTF-8 and you use str = new String(request.getParameter("username").getBytes("ISO-8859-1"), "UTF-8"), the data stored in the database is GBK.
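When the goal is simply to turn GBK bytes into UTF-8 bytes (or the reverse), a general approach is to decode to a String first and then encode again, since a Java String itself is always Unicode. This is a sketch of that general technique rather than the article's driver-specific idioms; the class name is hypothetical and the GBK charset is assumed to be available.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: byte-level transcoding through an intermediate String.
final class TranscodeDemo {
    static final Charset GBK = Charset.forName("GBK");

    /** Decodes the input with 'from' and re-encodes the text with 'to'. */
    static byte[] transcode(byte[] input, Charset from, Charset to) {
        return new String(input, from).getBytes(to);
    }

    public static void main(String[] args) {
        byte[] gbk = {(byte) 0xD6, (byte) 0xD0, (byte) 0xCE, (byte) 0xC4};
        byte[] utf8 = transcode(gbk, GBK, StandardCharsets.UTF_8);
        System.out.println(utf8.length);   // 6: three bytes per character in UTF-8
        System.out.println(Arrays.equals(transcode(utf8, StandardCharsets.UTF_8, GBK), gbk));
    }
}
```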

VI. Default database connection, that is, without the useUnicode and characterEncoding parameters

1. If the web page uses GBK and you use str = request.getParameter("username") or str = new String(request.getParameter("username").getBytes()), the data stored in the database is GBK. If the web page uses UTF-8 and you use str = request.getParameter("username"), the data stored in the database is UTF-8.

2. If you use str = new String(request.getParameter("username").getBytes("ISO-8859-1")), the data is stored in whatever encoding the web page uses: a UTF-8 page stores UTF-8, and a GBK page stores GBK.

3. If you use str = new String(request.getParameter("username").getBytes("UTF-8"), "UTF-8"), the data is stored correctly; other combinations store garbled or wrong data. With this UTF-8 combination, a GBK page stores GBK and a UTF-8 page stores UTF-8.

4. To store UTF-8 from a GBK web page, two steps are needed: company = new String(request.getParameter("company").getBytes(), "GBK"), then company = new String(company.getBytes("UTF-8")).

5. If the web page uses UTF-8, GBK cannot be stored in the database. In short, with this connection mode a UTF-8 page cannot store GBK.
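One reason the default connection is fragile is that ISO-8859-1 has no Chinese characters at all, so a properly decoded Chinese string cannot be encoded in it. The sketch below shows the loss; the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: encoding Chinese text in ISO-8859-1 loses the characters.
final class LossyEncodeDemo {

    /** True when every character of the text exists in the charset. */
    static boolean canRepresent(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    /** Encodes the text; unmappable characters become the charset's replacement byte. */
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        String chinese = "\u4e2d\u6587";
        System.out.println(canRepresent(chinese, StandardCharsets.ISO_8859_1));
        byte[] lossy = encode(chinese, StandardCharsets.ISO_8859_1);
        // Each unmappable character is replaced by '?' (0x3F).
        System.out.println(lossy.length + " bytes, first = " + (char) lossy[0]);
    }
}
```

This is why the idioms for the default connection keep the data as one-character-per-byte strings instead of real Chinese text.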

All of the above concerns data exchange between JSP pages and the database. The remaining sections cover encoding conversion in plain Java programs.

VII. Database connection using UTF-8

1. If the Chinese in the database is UTF-8, it can be converted to GBK, but GBK cannot be stored back into the database.

2. If the Chinese in the database is GBK, to convert it to UTF-8 use content = new String(rs.getBytes(2), "GBK"); storing content directly then writes UTF-8 to the database.

VIII. Database connection using GBK

1. If the Chinese in the database is UTF-8, to convert it to GBK use content = new String(rs.getString(2).getBytes(), "UTF-8") and then write it with an UPDATE or INSERT statement; GBK is stored. If you use content = new String(rs.getString(2).getBytes(), "GBK") or content = new String(rs.getString(2).getBytes()), the stored data is still UTF-8.

2. If the Chinese in the database is GBK, to convert it to UTF-8 use content = new String(rs.getString(2).getBytes("UTF-8")) or content = new String(rs.getString(2).getBytes("UTF-8"), "GBK"), then write it with an UPDATE or INSERT statement; UTF-8 is stored.

3. Likewise, to convert a GBK string to UTF-8, use content = new String(gbkstr.getBytes("UTF-8")) or content = new String(gbkstr.getBytes("UTF-8"), "GBK"); to convert a string to GBK, use new String(utfstr.getBytes("GBK"), "UTF-8").
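Before choosing one of these conversions it helps to confirm which encoding a column's bytes are actually in. One possible check, sketched below, is a strict decoder that rejects invalid input instead of silently substituting characters. The class name is hypothetical, the GBK charset is assumed to be available, and a passing check only shows the bytes are valid in that charset, not that it is the intended one.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: tests whether bytes form valid text in a given charset.
final class StrictDecodeCheck {

    /** Returns true when the bytes decode without malformed or unmappable input. */
    static boolean isValid(byte[] bytes, Charset charset) {
        try {
            charset.newDecoder()
                   .onMalformedInput(CodingErrorAction.REPORT)
                   .onUnmappableCharacter(CodingErrorAction.REPORT)
                   .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] gbk = {(byte) 0xD6, (byte) 0xD0, (byte) 0xCE, (byte) 0xC4};
        System.out.println(isValid(gbk, StandardCharsets.UTF_8));   // not valid UTF-8
        System.out.println(isValid(gbk, Charset.forName("GBK")));
    }
}
```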

IX. Default database connection, that is, without the parameters

1. str2 = new String(gbkstr.getBytes("UTF-8"), "ISO-8859-1") converts GBK data from the database to UTF-8.

2. To read UTF-8 and store it as UTF-8, use str1 = new String(utfstr.getBytes(), "ISO-8859-1") or str1 = new String(utfstr.getBytes("GBK"), "ISO-8859-1").

3. UTF-8 data in the database cannot be converted to GBK.
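The new String(x.getBytes(...), "ISO-8859-1") pattern can be read as packing the target bytes into a string that an ISO-8859-1 connection will write out unchanged. The sketch below checks that property in isolation; it assumes the driver encodes strings as ISO-8859-1 on the default connection, and the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo: carrying already-encoded bytes through an ISO-8859-1 channel.
final class Latin1CarrierDemo {

    /** Encodes the text in 'target' and wraps each byte as one ISO-8859-1 character. */
    static String wrap(String text, Charset target) {
        return new String(text.getBytes(target), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587";
        String carrier = wrap(text, StandardCharsets.UTF_8);
        // What an ISO-8859-1 connection would send for the carrier string.
        byte[] onTheWire = carrier.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(Arrays.equals(onTheWire, text.getBytes(StandardCharsets.UTF_8)));
    }
}
```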

With a UTF-8 database connection or the default connection, UTF-8 cannot be converted to GBK, whereas a GBK database connection allows conversion in both directions between UTF-8 and GBK. A GBK connection is therefore recommended.
