Fixing garbled Chinese characters when reading and writing from JSP to MySQL

Source: Internet
Author: User
Garbled Chinese characters are the biggest headache. I use a database connection pool, and the URL in DB.properties sets the character set to UTF-8. Chinese characters written to the database from a JSP page came out garbled, and so did Chinese characters written directly into the database and read back. So I used new String(words.getBytes("ISO-8859-1"), "gb2312") to transcode the Chinese text read from the database, and the Chinese characters finally displayed correctly. Writing is more troublesome, because the character set configured for each user's database is not the same. In the end I used words.getBytes("ISO-8859-1") when writing Chinese characters, and set the character set to ISO-8859-1 in the filter in web.xml.

JSP garbled characters: questions and answers

In general, a request flows like this: the client submits a request to the server --> the servlet on the server reads the parameters --> the corresponding processing runs (including database operations) --> the response is returned and displayed to the user. If the character set differs between any of these steps, garbled characters may occur.

Encoding conflicts are mainly caused by differences between Western European character sets such as ISO-8859-1 and Asian encodings such as GBK and Big5. For application developers, however, memorizing the encoding and decoding rules themselves is of little use. It is more helpful to understand where characters and bytes are converted during the HTTP request/response cycle. The following is what I have summarized from problems I actually ran into and from the relevant documentation:

    1. The client submits a request (with parameters) through a form. The encoding of the parameters is determined by the charset directive in the header of the HTML or JSP page; if none is specified, it defaults to ISO-8859-1.
    2. The server receives the request. In a servlet (a JSP is also a servlet), the server reads the request parameters and runs its logic. The most common method is request.getParameter(). If no encoding has been specified, this method in effect does two things: it takes the parameter from the request message as a byte[] b, then decodes it into a String using the default encoding (ISO-8859-1).
    3. The server may then operate on the database (insert, delete, update, query). Make sure the database uses the same character set. Each database has its own way of showing its character set; usually it can be found by querying the data dictionary.
    4. The response is returned to the user. The browser displays the response using the charset specified in the response's Content-Type header.
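Steps 1 and 2 can be imitated without a servlet container. The following is a minimal sketch, assuming a JDK that ships the GB2312 charset; the class and method names are hypothetical, and URLEncoder/URLDecoder stand in for what the browser and the container do with a form parameter.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class: imitates steps 1 and 2 outside a servlet container.
class FormRoundTripDemo {

    // Step 1: the browser percent-encodes the form value using the page charset.
    static String browserEncode(String value, String pageCharset) {
        try {
            return URLEncoder.encode(value, pageCharset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + pageCharset, e);
        }
    }

    // Step 2: the server turns the bytes back into a String with the charset it assumes.
    static String serverDecode(String wireValue, String assumedCharset) {
        try {
            return URLDecoder.decode(wireValue, assumedCharset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + assumedCharset, e);
        }
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587"; // two Chinese characters, written as escapes
        String wire = browserEncode(original, "GB2312");
        System.out.println(wire);                                              // %D6%D0%CE%C4
        System.out.println(serverDecode(wire, "GB2312").equals(original));     // true
        System.out.println(serverDecode(wire, "ISO-8859-1").equals(original)); // false
    }
}
```

The bytes on the wire are the same in both decode calls; only the charset assumed by the receiving side changes the result.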

So why do garbled characters sometimes appear?

For example: a page declared as GB2312 submits a request to the server, and the server side calls request.getParameter() directly to obtain the string s. Writing that string back to the response produces garbled characters, because the original byte[] was encoded with GB2312 and decoding it with ISO-8859-1 is bound to garble it. The same is true for storing strings in the database and reading them back. If the database character set is ASCII, the string returned by rs.getString() has also been decoded as ASCII. If it is written straight to a GB2312 page, garbled characters are displayed.
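The two wrong decodings mentioned above are not equally harmful. The sketch below, which assumes the JDK provides the GB2312 charset and uses a hypothetical class name, decodes GB2312 bytes with the wrong charset and checks whether the original bytes can still be recovered. ISO-8859-1 maps every byte to a character, so nothing is lost; US-ASCII replaces every byte above 0x7F, so the original text is gone.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class: what survives a decode with the wrong charset.
class WrongCharsetDemo {
    static final Charset GB2312 = Charset.forName("GB2312");

    // Encodes text as GB2312, decodes it with the wrong charset, and reports
    // whether re-encoding the garbled string gives back the original bytes.
    static boolean bytesSurvive(String text, Charset wrong) {
        byte[] original = text.getBytes(GB2312);
        String garbled = new String(original, wrong);
        return Arrays.equals(original, garbled.getBytes(wrong));
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587";
        System.out.println(bytesSurvive(text, StandardCharsets.ISO_8859_1)); // true
        System.out.println(bytesSurvive(text, StandardCharsets.US_ASCII));   // false
    }
}
```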

There are usually the following solutions:

    1. Use one character set encoding everywhere. UTF-8 is recommended. Create a global filter.
      Pass every request through the filter:
      request.setCharacterEncoding("UTF-8"); (must be consistent with the character set encoding used by the client browser)
      chain.doFilter(request, response);
      This resolves the encoding conflict when the server side reads parameters. When request.getParameter() is called after the filter has run, there is no encoding conflict in the current environment.
    2. Edit server.xml in the server's configuration directory. Taking Tomcat as an example, set URIEncoding to UTF-8. The principle is the same as in 1.
    3. Use the new String(byte[] bytes, String charset) constructor and the byte[] String.getBytes(String charset) method. Note that the former decodes (a string is built from bytes) and the latter encodes (the string is turned back into bytes).
      For example, if the string s was decoded with ISO-8859-1 and you want to store it in a database whose character set is GB2312, convert it with String newS = new String(s.getBytes("ISO-8859-1"), "gb2312"); so that the string is decoded from its bytes as GB2312 and the database operation does not cause character set problems.
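The conversion in solution 3 can be sketched as a small helper. The class and method names here are hypothetical, and the sketch assumes the JDK provides the GB2312 charset. It only works when the wrong decode was lossless, as it is with ISO-8859-1.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class: undoes a decode that used the wrong charset.
class Redecode {

    // getBytes(wronglyUsed) recovers the original bytes; the String
    // constructor then decodes them with the charset they were written in.
    static String redecode(String garbled, Charset wronglyUsed, Charset actual) {
        return new String(garbled.getBytes(wronglyUsed), actual);
    }

    public static void main(String[] args) {
        Charset gb2312 = Charset.forName("GB2312");
        // Simulate a parameter the server decoded as ISO-8859-1 by mistake.
        String garbled = new String("\u4e2d\u6587".getBytes(gb2312), StandardCharsets.ISO_8859_1);
        String fixed = redecode(garbled, StandardCharsets.ISO_8859_1, gb2312);
        System.out.println(fixed.equals("\u4e2d\u6587")); // true
    }
}
```

Using Charset objects instead of charset names avoids the checked UnsupportedEncodingException that the String-based overloads declare.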
