MySQL garbled characters and utf8mb4 character set

Source: Internet
Author: User

Garbled characters

Take a close look at the MySQL character set settings and distinguish between the client-side and server-side encodings. The simplest, most brute-force fix is to explicitly specify the same encoding on every connection.

For example, when Python's MySQLdb connects to MySQL, the default charset is latin1. You must specify charset='utf8' explicitly: even if init-connect='SET NAMES utf8' is set on the server side, MySQLdb overrides that option with latin1.
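A minimal sketch of passing the charset explicitly when connecting with MySQLdb. The helper names, host, user, password and database values are hypothetical; only the charset and use_unicode arguments matter here.

```python
def mysql_connect_kwargs(host, user, password, database, charset="utf8"):
    """Build keyword arguments for MySQLdb.connect with an explicit charset.

    The helper itself is illustrative; the point is that `charset` is always
    passed instead of relying on the driver default (latin1).
    """
    return {
        "host": host,
        "user": user,
        "passwd": password,
        "db": database,
        "charset": charset,      # overrides the latin1 default
        "use_unicode": True,     # return text columns as Python strings
    }


def connect(**kwargs):
    # Imported lazily so that the helper above can be used (and checked)
    # on machines where the MySQLdb driver is not installed.
    import MySQLdb
    return MySQLdb.connect(**kwargs)


if __name__ == "__main__":
    # Placeholder credentials; replace with real ones before connecting.
    kwargs = mysql_connect_kwargs("localhost", "app_user", "secret", "app_db")
    print(kwargs["charset"])
```

Passing charset="utf8mb4" to the same helper selects the four-byte character set discussed in the next section.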

Emoji and utf8mb4

MySQL's utf8 character set does not support emoji; you need utf8mb4 to store them. The relationship between emoji and utf8mb4 is explained below.

Before MySQL 5.5, the utf8 encoding supported only 1 to 3 bytes per character, which covers only the Basic Multilingual Plane (BMP) of Unicode. For the exact range of the BMP see http://en.wikipedia.org/wiki/Mapping_of_Unicode_characters; it is basically U+0000 to U+FFFF. Starting with MySQL 5.5 (utf8mb4 was added in 5.5.3), the utf8mb4 character set supports UTF-8 sequences of up to four bytes per character, so it can store characters outside the BMP as well.
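The byte counts above can be checked directly in Python 3. This is only an illustration of UTF-8 lengths; the function names are hypothetical, and fits_in_mysql_utf8 simply tests whether every code point lies in the BMP.

```python
def utf8_length(ch):
    """Number of bytes the character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def fits_in_mysql_utf8(text):
    """True if every code point is inside the BMP (U+0000 to U+FFFF),
    i.e. needs at most 3 bytes in UTF-8."""
    return all(ord(ch) <= 0xFFFF for ch in text)


# ASCII letter, accented Latin letter, CJK ideograph, emoji
samples = ["A", "\u00e9", "\u4e2d", "\U0001F600"]
for ch in samples:
    print("U+%04X: %d byte(s), BMP only: %s"
          % (ord(ch), utf8_length(ch), fits_in_mysql_utf8(ch)))
```

The emoji U+1F600 is the only sample that needs four bytes, which is why it cannot be stored in a utf8 column.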

utf8mb4 is a superset of utf8

utf8mb4 is compatible with utf8 and can represent more characters than utf8.

Modification method

Server

Modify the database configuration file /etc/my.cnf:

[mysqld]
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci

Restart MySQL. According to the official documentation both options can be set dynamically, but in practice the server had to be restarted for them to take effect.
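After the restart it can be useful to confirm that the server picked up the settings. The sketch below assumes a DB-API cursor (for example one obtained from MySQLdb) that returns rows as (name, value) tuples; the function name and the EXPECTED mapping are hypothetical.

```python
EXPECTED = {
    "character_set_server": "utf8mb4",
    "collation_server": "utf8mb4_unicode_ci",
}


def mismatched_server_settings(cursor, expected=EXPECTED):
    """Return {variable: actual_value} for every setting that differs
    from the expected value; an empty dict means everything matches."""
    cursor.execute(
        "SHOW VARIABLES WHERE Variable_name IN "
        "('character_set_server', 'collation_server')"
    )
    actual = {name: value for name, value in cursor.fetchall()}
    return {
        name: actual.get(name)
        for name, wanted in expected.items()
        if actual.get(name) != wanted
    }
```

With a real connection this would be called as mismatched_server_settings(conn.cursor()); a non-empty result means the configuration file was not applied.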

Convert an existing table to utf8mb4

ALTER TABLE tbl_name CONVERT TO CHARACTER SET charset_name;

The following statement, by contrast, changes only the table's default encoding, which applies to columns added later; existing columns are not converted.

ALTER TABLE etape_prospection CHARSET = utf8;
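When several tables need converting, the CONVERT TO statement can be generated rather than typed by hand. The sketch below only builds the SQL text; the function names are hypothetical, and the default collation is the one used in the server configuration above.

```python
def quote_identifier(name):
    """Quote a table name with backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def convert_table_sql(table, charset="utf8mb4", collation="utf8mb4_unicode_ci"):
    """Build an ALTER TABLE ... CONVERT TO CHARACTER SET statement."""
    # Charset and collation names are plain identifiers; reject anything else
    # so that arbitrary text cannot be spliced into the statement.
    for value in (charset, collation):
        if not value.isidentifier():
            raise ValueError("unexpected charset/collation name: %r" % value)
    return "ALTER TABLE %s CONVERT TO CHARACTER SET %s COLLATE %s;" % (
        quote_identifier(table), charset, collation
    )


for table in ["etape_prospection"]:
    print(convert_table_sql(table))
```

The generated statements rewrite the existing column data, so on large tables they are best reviewed and run during a maintenance window.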

Client

The JDBC connection string does not accept utf8mb4 directly. The workaround is on the server side: if character_set_server = utf8mb4 is set there, Connector/J automatically treats the UTF-8 it passes as utf8mb4. From the Connector/J release notes:

Connector/J did not support utf8mb4 for servers 5.5.2 and newer. Connector/J now auto-detects servers configured with character_set_server=utf8mb4 or treats the Java encoding UTF-8 passed using characterEncoding=... as utf8mb4 in the SET NAMES= calls it makes when establishing the connection. (Bug #54175)

For other clients, such as PHP and Python, check whether the client library supports utf8mb4. If the character set cannot be specified in the connection string, run "SET NAMES utf8mb4" right after obtaining the connection.
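A minimal sketch of the "SET NAMES after connecting" fallback for a Python client, assuming a DB-API style connection object with a cursor() method. The function name use_utf8mb4 is hypothetical.

```python
def use_utf8mb4(connection):
    """Switch an already opened connection to utf8mb4 and return it."""
    cursor = connection.cursor()
    try:
        # Tells the server that the client sends and expects utf8mb4.
        cursor.execute("SET NAMES utf8mb4")
    finally:
        cursor.close()
    return connection
```

Some drivers also keep their own record of the connection encoding, so where a driver-level charset option exists it is the safer choice; this statement is the fallback when no such option is available.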

Because utf8mb4 is a superset of utf8, switching the client character set to utf8mb4 should in theory cause no problems when reading existing utf8-encoded data.
