Inserting Unicode emoji into MySQL fails with "Incorrect string value: '\xf0\x9f\x98\x84\xf0\x9f"

Source: Internet
Author: User
Tags: mysql, version

Problem description: Messages crawled from Sina Weibo are saved to a MySQL database. The target column is a varchar whose character encoding is UTF-8 (MySQL's utf8). Some inserts succeed and others fail with the error shown in the title.

Searching online, some people say this is an encoding problem and suggest changing the encoding or column type (to GBK, UTF-8, BLOB, and so on), but few give a detailed answer. I only found the real cause of the error on an English-language site. Link 1 Link 2

Cause: The bytes 0xF0 0x9F 0x98 0x84 in the error message form a single 4-byte sequence in UTF-8 (see the UTF-8 encoding specification); it decodes to the code point U+1F604, a smiley emoji. Common Chinese characters take at most 3 bytes in UTF-8, so where do 4-byte characters come from? They are the emoji offered by smartphone input methods. Why does the insert fail? Because utf8 in MySQL is not full UTF-8: it can store at most 3 bytes per character. Storing 4-byte characters requires the utf8mb4 character set. Before switching to utf8mb4, make sure the MySQL version is not lower than 5.5.3.
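To make the byte counts concrete, here is a minimal sketch that encodes one Chinese character and the emoji from the error message, then reports how many UTF-8 bytes each needs. The class and method names (Utf8Width, utf8Length, utf8Hex) are illustrative and not from the original post.

```java
import java.nio.charset.StandardCharsets;

// Illustrative helper; the names here are assumptions for this example.
class Utf8Width {
    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Renders the UTF-8 bytes in the \xNN style used by the MySQL error message.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("\\x%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String han = "\u4E2D";          // U+4E2D, a common Chinese character
        String emoji = "\uD83D\uDE04";  // U+1F604, written as a surrogate pair
        System.out.println(utf8Length(han));   // 3
        System.out.println(utf8Length(emoji)); // 4
        System.out.println(utf8Hex(emoji));    // \xF0\x9F\x98\x84
    }
}
```

Any character above U+FFFF is stored in a Java String as a surrogate pair and becomes 4 bytes in UTF-8, which is exactly what a MySQL utf8 column rejects.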

Solution:

1) Use the utf8mb4 character set

To use this approach, first upgrade MySQL if the version is lower than 5.5.3, then change the character set of the affected column to utf8mb4. If you connect to the database through Connector/J, the connection must use utf8mb4 as well: set character_set_server=utf8mb4. Note that character_set_server is a MySQL server system variable, so it is normally set in the server configuration rather than in the JDBC URL.
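The version check and the column change can be sketched as follows. This is a hedged example, not a complete migration: the class Utf8mb4Support, both method names, and the table name weibo_message are hypothetical, and the generated statement should be reviewed before it is run on real data.

```java
// Hypothetical helper for solution 1; all names are assumptions for illustration.
class Utf8mb4Support {
    // True if the MySQL version string is at least 5.5.3, the minimum named in the text.
    static boolean supportsUtf8mb4(String version) {
        // Keep only the leading numeric part, e.g. "5.5.3-m3-log" -> "5.5.3".
        String numeric = version.split("[^0-9.]", 2)[0];
        String[] parts = numeric.split("\\.");
        int[] min = {5, 5, 3};
        for (int i = 0; i < min.length; i++) {
            int v = (i < parts.length && !parts[i].isEmpty())
                    ? Integer.parseInt(parts[i]) : 0;
            if (v != min[i]) {
                return v > min[i];
            }
        }
        return true; // exactly 5.5.3
    }

    // Builds the statement that converts a whole table to utf8mb4.
    static String conversionSql(String table) {
        return "ALTER TABLE `" + table + "` CONVERT TO CHARACTER SET utf8mb4 "
                + "COLLATE utf8mb4_unicode_ci";
    }

    public static void main(String[] args) {
        System.out.println(supportsUtf8mb4("5.1.73-log")); // false
        System.out.println(supportsUtf8mb4("5.5.3"));      // true
        System.out.println(conversionSql("weibo_message"));
    }
}
```

The version string can be obtained from the server with SELECT VERSION().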

2) Define a custom filter rule that removes the 4-byte UTF-8 characters found in the text or converts them to a replacement of your choice.

The following test example overwrites every 4-byte character with 0000 (four ASCII '0' bytes).

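    // b_text holds the UTF-8 bytes of the message,
    // e.g. text.getBytes(StandardCharsets.UTF_8)
    for (int i = 0; i < b_text.length; i++) {
        // 11110xxx marks the lead byte of a 4-byte UTF-8 sequence
        if ((b_text[i] & 0xF8) == 0xF0) {
            // overwrite the lead byte and its continuation bytes with '0' (0x30)
            for (int j = 0; j < 4 && i + j < b_text.length; j++) {
                b_text[i + j] = 0x30;
            }
            i += 3; // skip the bytes that were just overwritten
        }
    }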
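The same filter can also be written on the Java String itself instead of on raw bytes, which avoids handling truncated sequences by hand. The sketch below is an alternative to the loop above, not the original author's method; the class name EmojiFilter and the method replaceFourByteChars are made up for illustration. It relies on the fact that the characters needing 4 bytes in UTF-8 are exactly the supplementary code points (U+10000 and above).

```java
// Hypothetical string-level variant of the filter; names are assumptions.
class EmojiFilter {
    // Replaces every supplementary code point (4 bytes in UTF-8) with the placeholder.
    static String replaceFourByteChars(String text, String placeholder) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.isSupplementaryCodePoint(cp)) {
                out.append(placeholder);
            } else {
                out.appendCodePoint(cp);
            }
            // Advance by 1 char for BMP characters, 2 for a surrogate pair.
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String message = "hi\uD83D\uDE04!"; // "hi", U+1F604, "!"
        System.out.println(replaceFourByteChars(message, "0000")); // hi0000!
    }
}
```

Passing an empty placeholder removes the emoji instead of replacing them.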

