Character Set base character set: A character set in a database contains two meanings
- A collection of various text and symbols, including national text, punctuation, graphic symbols, numbers, etc.
- The encoding of characters, that is, the mapping rules of binary data and characters;
Character Set Classification:
- ASCII: United States Information Interchange standard coding; English and other Western European languages; single-byte encoding, 7 bits (bits) represents a character, a total of 128 characters.
- GBK: Code expansion code of Chinese characters, Chinese, Japanese and Korean characters, English, numerals, double-byte encoding, a total of 21,003 Chinese characters, GB2312 extension.
- Utf-8:unicode standard variable length character encoding; Unicode standard (Uniform code), industry uniform standards, including the world's dozens of kinds of text system; Utf-8 uses one to 4 bytes for each character encoding.
- Other common character sets: Utf-32,utf-16,big5 (Traditional), latin1 ()
MySQL Character set: view Character set:
Mysql> show Character set;
+----------+---------------------------------+---------------------+--------+
| Charset | Description | Default Collation | MaxLen |
+----------+---------------------------------+---------------------+--------+
| Big5 | Big5 Traditional Chinese | Big5_chinese_ci | 2 |
| Dec8 | DEC West European | Dec8_swedish_ci | 1 |
| cp850 | DOS West European | Cp850_general_ci | 1 |
| HP8 | HP West European | Hp8_english_ci | 1 |
| koi8r | Koi8-r relcom Russian | Koi8r_general_ci | 1 |
| Latin1 | cp1252 West European | Latin1_swedish_ci | 1 |
| latin2 | ISO 8859-2 Central European | Latin2_general_ci | 1 |
| Swe7 | 7bit Swedish | Swe7_swedish_ci | 1 |
| ASCII | US ASCII | Ascii_general_ci | 1 |
| Ujis | EUC-JP Japanese | Ujis_japanese_ci | 3 |
| Sjis | Shift-jis Japanese | Sjis_japanese_ci | 2 |
| Hebrew | ISO 8859-8 Hebrew | Hebrew_general_ci | 1 |
| tis620 | TIS620 Thai | Tis620_thai_ci | 1 |
| Euckr | EUC-KR Korean | Euckr_korean_ci | 2 |
| koi8u | Koi8-u Ukrainian | Koi8u_general_ci | 1 |
| gb2312 | GB2312 Simplified Chinese | Gb2312_chinese_ci | 2 |
| Greek | ISO 8859-7 Greek | Greek_general_ci | 1 |
| cp1250 | Windows Central European | Cp1250_general_ci | 1 |
| GBK | GBK Simplified Chinese | Gbk_chinese_ci | 2 |
| Latin5 | ISO 8859-9 Turkish | Latin5_turkish_ci | 1 |
| Armscii8 | ARMSCII-8 Armenian | Armscii8_general_ci | 1 |
| UTF8 | UTF-8 Unicode | Utf8_general_ci | 3 |
| UCS2 | UCS-2 Unicode | Ucs2_general_ci | 2 |
| cp866 | DOS Russian | Cp866_general_ci | 1 |
| KEYBCS2 | DOS Kamenicky Czech-slovak | Keybcs2_general_ci | 1 |
| Macce | Mac Central European | Macce_general_ci | 1 |
| Macroman | Mac West European | Macroman_general_ci | 1 |
| cp852 | DOS Central European | Cp852_general_ci | 1 |
| latin7 | ISO 8859-13 Baltic | Latin7_general_ci | 1 |
| UTF8MB4 | UTF-8 Unicode | Utf8mb4_general_ci | 4 |
| cp1251 | Windows Cyrillic | Cp1251_general_ci | 1 |
| UTF16 | UTF-16 Unicode | Utf16_general_ci | 4 |
| Utf16le | Utf-16le Unicode | Utf16le_general_ci | 4 |
| cp1256 | Windows Arabic | Cp1256_general_ci | 1 |
| cp1257 | Windows Baltic | Cp1257_general_ci | 1 |
| Utf32 | UTF-32 Unicode | Utf32_general_ci | 4 |
| binary | Binary Pseudo CharSet | binary | 1 |
| Geostd8 | GEOSTD8 Georgian | Geostd8_general_ci | 1 |
| cp932 | SJIS for Windows Japanese | Cp932_japanese_ci | 2 |
| Eucjpms | Ujis for Windows Japanese | Eucjpms_japanese_ci | 3 |
| GB18030 | National Standard GB18030 | Gb18030_chinese_ci | 4 |
+----------+---------------------------------+---------------------+--------+
$ rows in Set (0.00 sec)
New Character Set:
In compiling MySQL with--WITH-CHARSET=GBK to add character set character set and character Fu She (the rules of character sorting) charset and collation character set and Word Fu She is a one-to-many relationship, but a character set has at least a word Fu She collation: Word Fu She, Character sorting and comparison rules, each character set has a corresponding multiple set of characters Fu She. The different character Fu She determine the accuracy and performance of the string in the comparison sort. View Word Fu She
Mysql> SHOW COLLATION;
The MySQL word Fu She follows the naming convention:
- _ci (indicates case insensitivity)
- With _cs (case sensitive)
- In _bin (indicates a comparison with coded values)
Set levels for CharSet settings charset and collation: server level >> database level >> table level >> column-level server-level system variables (can be set dynamically):--character_set_server: Default internal operation Character Set--character_set_system: System Metadata (field, table name, etc.) Character set:
- Set directly with commands
- Settings in the configuration file: Mysqld,
Database level (the default character set for storing data in a database)
Create DATABASE db_name Character Set latin1 collate latin1_swedish_ci;
-character_set_database: The default character set for the currently selected database primarily affects the default character set for statements such as load data; The set of the CREATE database is used by default if not set, character_set_ The character set of the server. Table-Level
Mysql>create table TBL (...) default charset=utf-8 default Collate=utf8_bin
Data storage Character Set usage rules:
- Using the character set setpoint for a column set
- If the column-level character set does not exist, the default character set value of the corresponding table-level is used;
- If the table-level character set does not exist, the database-level default character set value is used
- If the database character set does not exist, use the server-level Character_set_server setting value
PracticeView character set show [global] variables like ' character% ' show [global] variables like ' collation% '
mysql> SHOW VARIABLES like '%character% '; +--------------------------+----------------------------+| variable_name | Value |+--------------------------+----------------------------+| character_set_client | UTF8 | | character_set_connection | UTF8 | | Character_set_database | Latin1 | | Character_set_filesystem | binary | | Character_set_results | UTF8 | | Character_set_server | Latin1 | | Character_set_system | UTF8 | | Character_sets_dir | /usr/share/mysql/charsets/|+--------------------------+----------------------------+8 rows in Set (0.00 sec) MySQL > SHOW VARIABLES like '%collation% '; +----------------------+-------------------+| variable_name | Value |+----------------------+-------------------+| collation_connection | Utf8_general_ci | | Collation_database | LaTin1_swedish_ci | | Collation_server | Latin1_swedish_ci |+----------------------+-------------------+
Modify the character set to modify the server-level character set
Set global Character_set_server=utf8; Global
Modify table-level character set ALTER TABLE TBL Convert to Character set XXX; (table)
mysql> ALTER TABLE Stu convert to character set UTF8;
The client connection is connected to the character set-character_set_client: The character set used by the client source data (the SQL that the client program sent over is encoded with what)-character_set_conection: Connection layer Character set (do middle-tier conversion)-character_set_results: The query result character set (the character set returned to the client program) can generally be set uniformly (recommended):
Msyql>set names UTF8;
can also be unified called the connection character set; In the configuration file: Default-character-set = UTF8 Client connection character set
Common garbled reasons:
- Data store character set is not correctly encoded (not supported) client-sent data: Client (UTF8)->storage (latin1)
- The character set used by the program connection is inconsistent or incompatible with the character_set_client,character_set_conection,character_set_results of the notification MySQL. (Tell mysql:set names GBK; program connection uses the same character set)
1. Even the table itself is UTF8, and the set names GBK, the program connection also uses the GBK, this time inserts the Chinese character in the table, can still display but is not garbled, because although the program connection uses the GBK code, the MySQL internal will convert GBK to display UTF8; 2. Table Utf8,set Names UTF8, but the program connection using GBK, at this time to insert Chinese characters in the table, is garbled. Load data is garbled:
[Email protected]:~$ file test.ttest.t:utf-8 Unicode text[email protected]:~$ cat test.t Hello mysql> show variables like '%char% '; +--------------------------+----------------------------+| Variable_name | Value |+--------------------------+----------------------------+| character_set_client | UTF8 | | character_set_connection | UTF8 | | character_set_database | GBK | | character_set_filesystem | binary | | character_set_results | UTF8 | | Character_set_server | UTF8 | | character_set_system | UTF8 | | character_sets_dir |/usr/ share/mysql/charsets/|+--------------------------+----------------------------+8 rows in Set (0.00 sec) mysql> Load data infile '/home/aiapple/test.t ' into table t;
This will be garbled, that is, inform the database Character_set_database character set and the program used by the file test.t is not the same;
Change set character_set_database = UTF8 to be displayed normally, here with the normal tell MySQL character set set names xxx, after all, is the import table to a
Database, you should make the database's character set the same as the character set of the file client connection and character set usage recommendations:
- The specified character set that is displayed when the database/table is created, without using the default
- The connection character set is consistent with the data store character set, and it is recommended to use UTF8
- Displays the specified character set (set names XXX) when the driver is connected
Client's settings:
Three elements: 1. Program-driven or client-side character set (in client Settings) 2. Notify MySQL of the character set (set names XXX) 3. Data store character set (table structure of character set ALTER TABLE TBL convert to character set XXX) first two Must be consistent to not appear garbled, recommended three are set to a consistent demand: in the system for a period of time, with a certain amount of data, and then found that the character set can not meet the requirements need to re-modify, and do not want to discard the time data. ALTER DATABASE character set XXX or ALTER TABLE tablename character set xxx; These two commands are only valid for the table or record that you want to create: Export the data first, and then re-import it with appropriate adjustments to complete. The following simulates the process of modifying the database of the latin1 character set to a database of the GBK character set 1) to export the table structure
Mysqldump-uroot-p--default-character-set=gbk-d Wy_yun >createtab.sq
--default-character-set that sets the character set to connect,-D means to export only the table structure, do not export data 2) manually modify the character set in the table structure definition in creatatab.sql to the new character Set 3) Ensure that records are not updated, export all records
Mysqldump-uroot-p--quick--no-create-info--extended-insert--default-character-set=latin1 WY_yun > Data.sql
-quick: This option is used to store large tables. It forces mysqldump to retrieve rows from the table one row at a time, rather than retrieving all rows and caching it in memory before output. --no-create-info: Do not write re-create TABLE statement--extended-insert: Use a multiline insert syntax that includes several values lists. This makes the dump file smaller, and when you reload the file, you can accelerate the insertion of--default-character-set=latin1: All data is exported according to the original character set, so that all Chinese is visible in the exported file and will not be stored as garbled 4) Modify the data character set for the new character set---Open data.sql, change set names latin1 to set names GBK 5) Create a new database with the new character set
mysql> CREATE database na default CharSet GBK;
6) Import the table structure of the new character set into the new library; Create a table to execute Createtab.sql
Mysql-uroot-p na <createtab.sql
7) Import data file of new character set into new library; Import data, execute Data.sql
Mysql-uroot-p na < Data.sql
It seems that the experiment has not been successful to ask questions?
Summarize
- Character set and Word Fu She are 1:n
- Character Fu She is the sorting and comparison rules of characters and the precision and performance of characters;
- Table-level character Set modification: ALTER TABLE Stu convert to character set UTF8;
- Three connection related character set unified set: Set names UTF8;
- Common garbled reason
- Load data, because the database-level character set is the same as the file;
- Three elements:
- Program-driven or client-side character set (in client settings)
- Tells the MySQL character set (set names XXX)
- Character set of the data store (table structure of the character set ALTER TABLE TBL convert to character set XXX)
- After running a period of time to find data garbled, how to save data;
MySQL character set or word Fu She