[Mysql] adds support for emoji and mysql adds emoji

Source: Internet
Author: User

[Mysql] adds support for emoji and mysql adds emoji
1. Introduction

We recommend that you use the utf8mb4 character set in advance for MySQL databases related to wireless connection to avoid problems caused by emoji.

MySQL Server> 5.5.3

2. Configuration + upgrade the current configuration
mysql> \s--------------mysql  Ver 14.14 Distrib 5.1.73, for redhat-linux-gnu (i386) using readline 5.1Connection id:          3Current database:Current user:           root@localhostSSL:                    Not in useCurrent pager:          stdoutUsing outfile:          ''Using delimiter:        ;Server version:         5.6.25-debug-log Source distributionProtocol version:       10Connection:             Localhost via UNIX socketServer characterset:    utf8mb4Db     characterset:    utf8mb4Client characterset:    utf8mb4Conn.  characterset:    utf8mb4UNIX socket:            /data/mysql/mysql.sockUptime:                 46 min 53 sec
Change Database/table
ALTER DATABASE dbname CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;ALTER TABLE tbname CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;ALTER TABLE tbname CHANGE old_column_name new_column_name VARCHAR(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
My. cnf Configuration
[client]  default-character-set = utf8mb4   [mysql]  default-character-set = utf8mb4   [mysqld]  character-set-client-handshake = FALSE  character-set-server = utf8mb4  collation-server = utf8mb4_general_ci  init_connect='SET NAMES utf8mb4'
/Usr/share/mysql/charsets/Index. xml configuration
<charset name="utf8mb4">  <family>Unicode</family>  <description>UTF-8 Unicode</description>  <alias>utf-8</alias>  <collation name="utf8_general_ci"     id="33">   <flag>primary</flag>   <flag>compiled</flag>  </collation>  <collation name="utf8_bin"            id="83">    <flag>binary</flag>    <flag>compiled</flag>  </collation></charset>
View Character Set
mysql> show variables where variable_name like 'character_set%' or variable_name like 'coll%';+--------------------------+----------------------------------+| Variable_name            | Value                            |+--------------------------+----------------------------------+| character_set_client     | utf8                             || character_set_connection | utf8                             || character_set_database   | utf8                             || character_set_filesystem | binary                           || character_set_results    | utf8                             || character_set_server     | utf8                             || character_set_system     | utf8                             || character_sets_dir       | /u01/mysql/share/mysql/charsets/ || collation_connection     | utf8_general_ci                  || collation_database       | utf8_general_ci                  || collation_server         | utf8_general_ci                  |+--------------------------+----------------------------------+11 rows in set (0.00 sec)mysql> show char set;+----------+-----------------------------+---------------------+--------+| Charset  | Description                 | Default collation   | Maxlen |+----------+-----------------------------+---------------------+--------+| big5     | Big5 Traditional Chinese    | big5_chinese_ci     |      2 || dec8     | DEC West European           | dec8_swedish_ci     |      1 || cp850    | DOS West European           | cp850_general_ci    |      1 || hp8      | HP West European            | hp8_english_ci      |      1 || koi8r    | KOI8-R Relcom Russian       | koi8r_general_ci    |      1 || latin1   | cp1252 West European        | latin1_swedish_ci   |      1 || latin2   | ISO 8859-2 Central European | latin2_general_ci   |      1 || swe7     | 7bit Swedish                | swe7_swedish_ci     |      1 || ascii    | US ASCII                    | ascii_general_ci    |      1 || ujis     | EUC-JP Japanese             | ujis_japanese_ci    |      3 || sjis     | Shift-JIS Japanese          | sjis_japanese_ci    |      2 || hebrew   | ISO 8859-8 Hebrew           | hebrew_general_ci   |      1 || tis620   | TIS620 Thai                 | tis620_thai_ci      |      1 || euckr    | EUC-KR Korean               | euckr_korean_ci     |      2 || koi8u    | KOI8-U Ukrainian            | koi8u_general_ci    |      1 || gb2312   | GB2312 Simplified Chinese   | gb2312_chinese_ci   |      2 || greek    | ISO 8859-7 Greek            | greek_general_ci    |      1 || eucjpms  | UJIS for Windows Japanese   | eucjpms_japanese_ci |      3 |+----------+-----------------------------+---------------------+--------+40 rows in set (0.00 sec)mysql> show COLLATION;+--------------------------+----------+-----+---------+----------+---------+| Collation                | Charset  | Id  | Default | Compiled | Sortlen |+--------------------------+----------+-----+---------+----------+---------+| big5_chinese_ci          | big5     |   1 | Yes     | Yes      |       1 || big5_bin                 | big5     |  84 |         | Yes      |       1 || dec8_swedish_ci          | dec8     |   3 | Yes     | Yes      |       1 || dec8_bin                 | dec8     |  69 |         | Yes      |       1 || cp850_general_ci         | cp850    |   4 | Yes     | Yes      |       1 || eucjpms_bin              | eucjpms  |  98 |         | Yes      |       1 |+--------------------------+----------+-----+---------+----------+---------+219 rows in set (0.01 sec)

-Character_set_server: Default internal operation Character Set

-Character_set_client: Character set used by client source data

-Character_set_connection: Connection layer Character Set

-Character_set_results: Query Result Character Set

-Character_set_database: Default Character Set of the currently selected Database

-Character_set_system: System metadata (field name, etc.) Character Set

The name of the collation in MySQL follows naming conventions.

Starts with the character set name corresponding to the collation; starts with _ ci (case Insensitive case ignore), _ cs (case sensitive), or _ bin (compare by encoding value) end.

For example, in the collation "utf8_general_ci", "a" and "A" are equivalent.

Character Set conversion process in MySQL

1. MySQL Server converts the request data from character_set_client to character_set_connection when receiving the request;

2. Before performing internal operations, convert the request data from character_set_connection to the internal operation character set. The method is as follows:

• SET the character set value for each data field;

• If the preceding value does not exist, use the default character set Value of the corresponding data table (MySQL extension, non-SQL standard );

• If the preceding value does not exist, use the default character set Value of the corresponding database;

• If the preceding value does not exist, use character_set_server to set the value.

3. Convert the operation result from the internal character set to character_set_results.

3. emoji and utf8mb4

Emoji: mysql utf8 is not supported. You need to change it to utf8mb4.

Before MYSQL 5.5, UTF8 encoding only supports 1-3 bytes, only support BMP unicode encoding area, about BMP, http://en.wikipedia.org/wiki/Mapping_of_Unicode_characters here, basically is 0000 ~ FFFF.

MySQL and later versions support utf8mb4 UTF-8 encoded 4 bytes. A single character can contain up to 4 bytes. Therefore, more character sets are supported. utf8mb4 is compatible with utf8, more characters can be expressed than utf8

Create table 'ios _ emoji '('id' int (11) not null AUTO_INCREMENT COMMENT 'auto-incrementing id', 'unicode' varchar (100) character set utf8mb4 COLLATE encode default null comment 'unicode encoded ', 'utf8' varchar (100) character set utf8mb4 COLLATE encode default null comment 'utf8encoded', 'utf16' varchar (100) character set utf8mb4 COLLATE encode default null comment 'utf16 Code', 'sbunicode 'varchar (100) character set utf8mb4 COLLATE encode default null comment 'sbunicode Code', 'filename' varchar (100) character set utf8mb4 COLLATE utf8mb4_unicode_ci default null comment 'file name', 'filebyte 'longblob comment' file content byte ', primary key ('id ')) ENGINE = InnoDB default charset = utf8 ROW_FORMAT = compact comment = 'ios emoticons encoding table ';

  

References

Http://drupal.stackexchange.com/questions/166405/why-are-we-using-utf8mb4-general-ci-and-not-utf8mb4-unicode-ci

Http://stackoverflow.com/questions/766809/whats-the-difference-between-utf8-general-ci-and-utf8-unicode-ci

Http://blog.csdn.net/leshami/article/details/42024217

Http://www.laruence.com/2008/01/05/12.html

 

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.