Mysql Character Set and mysql Character Set
Mysql Character Set
Character Set Basic Character Set: the character set in the database contains two meanings character set classification:
- ASCII: Standard American information interchange code; English and other Western European languages; single-byte encoding, 7-bit (bits) represents a character, a total of 128 characters.
- GBK: Expansion specification for Chinese characters, including Chinese characters, Chinese characters, English letters, and numbers, and dubyte encoding. A total of 21003 Chinese characters and GB2312 extensions are included.
- UTF-8: Unicode-standard variable-length character encoding, Unicode-Standard (unified code), industry-standard, including the world's dozens of text systems; UTF-8 uses one to four bytes to encode each character.
- Other common character sets: utf-32, UTF-16, big5 (traditional), latin1 ()
Mysql Character Set: View character set:
mysql> SHOW CHARACTER SET ;+----------+-----------------------------+---------------------+--------+| Charset | Description | Default collation | Maxlen |+----------+-----------------------------+---------------------+--------+| big5 | Big5 Traditional Chinese | big5_chinese_ci | 2 || dec8 | DEC West European | dec8_swedish_ci | 1 || cp850 | DOS West European | cp850_general_ci | 1 || hp8 | HP West European | hp8_english_ci | 1 || koi8r | KOI8-R Relcom Russian | koi8r_general_ci | 1 || latin1 | cp1252 West European | latin1_swedish_ci | 1 || latin2 | ISO 8859-2 Central European | latin2_general_ci | 1 || swe7 | 7bit Swedish | swe7_swedish_ci | 1 |
New Character Set:
When mysql is compiled, -- with-charset = gbk is used to add a one-to-many relationship between character sets and character collation (character ordering rules) charset and collation character sets and character collation, however, a character set must have at least one character Order (collation): collation, collation, and comparison rules. Each character set has multiple collation sets. Different collation determines the accuracy and performance of strings in comparative sorting. View collation
mysql> SHOW COLLATION ;
Mysql follows naming conventions for collation:
- Use _ ci (Case Insensitive)
- Use _ cs (case sensitive)
- Compare with _ bin (encoding value)
Character Set setting level charset and collation setting level: Server-level> database-level> table-level> column-level server-level system variables (can be dynamically set): -- character_set_server: default Internal Operation character set -- character_set_system: System metadata (field, table name, etc.) Character Set settings:
- Use commands to directly set
- Set in the configuration file: mysqld,
Database-level (default character set for storing data in the database)
create database db_name character set latin1 collate latin1_swedish_ci;
-Character_set_database: the default Character Set of the currently selected database mainly affects the default character set of statements such as load data. If the character set of create database is not set, the character_set_server character set is used by default. Table-level
mysql>create table tbl(...) default charset=utf-8 default collate=utf8_bin
Rules for using the data storage character set:
- Use the character set value of the column set
- If the column-level character set does not exist, the default character set value for the corresponding table is used;
- If the table-level character set does not exist, the database-level default character set value is used.
- If the database character set does not exist, use server-level character_set_server to set the value.
View the character set show [global] variables like 'character % 'show [global] variables like 'colation %'
mysql> SHOW VARIABLES LIKE '%CHARACTER%';+--------------------------+----------------------------+| Variable_name | Value |+--------------------------+----------------------------+| character_set_client | utf8 || character_set_connection | utf8 || character_set_database | latin1 || character_set_filesystem | binary || character_set_results | utf8 || character_set_server | latin1 || character_set_system | utf8 || character_sets_dir | /usr/share/mysql/charsets/ |+--------------------------+----------------------------+8 rows in set (0.00 sec) mysql> SHOW VARIABLES LIKE '%collation%';+----------------------+-------------------+| Variable_name | Value |+----------------------+-------------------+| collation_connection | utf8_general_ci || collation_database | latin1_swedish_ci || collation_server | latin1_swedish_ci |+----------------------+-------------------+
Modify Character Set modify server-level Character Set
Set global character_set_server = utf8; (global)
Alter table tbl convert to character set XXX; (table)
mysql> alter table stu convert to character set utf8;
Client Connection and Character Set connection and Character Set-character_set_client: character set used by the client source data (what is the encoding of the SQL sent by the client program)-character_set_conection: Connection layer character set (intermediate layer conversion) -character_set_results: the character set of the query result (the character set returned to the client program) can be set in a unified manner (recommended ):
msyql>set names utf8;
It can also be called the connection character set. In the configuration file: default-character-set = utf8 client connection character set
Common Causes of garbled characters:
- The data sent from the client (utf8)-> storage (latin1) cannot be properly encoded by the data storage character set (not supported)
- The character set used by the program connection is inconsistent or incompatible with character_set_client, character_set_conection, and character_set_results that notify mysql. (Tell mysql: set names gbk; the character set used for program connection is consistent)
1. even the table itself utf8, and set names gbk, the program connection also uses gbk; at this time, inserting Chinese characters into the table can still be displayed without garbled characters, because although the program connection uses gbk encoding, mysql converts gbk to utf8 internally for display. 2. table utf8, set names utf8, but the program connection uses gbk. At this time, Chinese characters are inserted into the table, which is garbled. Garbled characters appear in load data:
Aiapple @ ubuntu :~ $ File test. ttest. t: UTF-8 Unicode textaiapple @ ubuntu :~ $ Cat test. t Hello mysql> show variables like '% char % '; + bytes + | Variable_name | Value | + bytes + | character_set_client | utf8 | character_set_connection | utf8 | character_set_database | gbk | bytes | binary | bytes | utf8 | | character_set_server | utf8 | character_set_system | utf8 | character_sets_dir |/usr/share/mysql/charsets/| + ---------------------- + rows + 8 rows in set (0.00 sec) mysql> load data infile '/home/aiapple/test. t' into table t;
In this case, garbled characters are displayed, indicating that the character_set_database Character Set of the database is different from the test. t file used by the program;
You can change set character_set_database = utf8 to display it normally. This is different from the set names xxx, which is the mysql character set. After all, it is used to import the table to
The character set of the database should be the same as the character set of the file. Client Connection and character set usage suggestions:
- The specified character set displayed when you create a database or table. The default character set is not used.
- The connection character set is consistent with the data storage character set. utf8 is recommended.
- Set names xxx)
Three elements: 1. character Set of the program driver or client (set on the client) 2. inform mysql of the character set (set names xxx) 3. the first two character sets (alter table tbl convert to character set xxx) of the data storage must be consistent to avoid garbled characters. We recommend that you set the three character sets to consistent: after the system has been running for a period of time and has some data, it is found that the character set cannot meet the requirements and needs to be modified again. Alter database character set xxx or alter table tablename character set xxx; these two commands only apply to the table or record you want to create: export data first, after appropriate adjustments, re-import can be completed. The following describes how to modify the database of the latin1 character set to the database of the GBK character set. 1) Export the table structure.
mysqldump -uroot -p --default-character-set=gbk -d WY_yun >createtab.sq
-- Default-character-set indicates the character set used to connect to.-d indicates that only the table structure is exported and no data is exported. 2) manually modify creatatab. in SQL, the character set defined in the table structure is the new character set. 3) ensure that records are not updated and all records are exported.
mysqldump -uroot -p --quick --no-create-info --extended-insert --default-character-set=latin1 WY_yun > data.sql
-Quick: This option is used to store large tables. It forces mysqldump to retrieve the rows in the table from the server one row at a time, instead of retrieving all rows, and caches it To the memory before the output. -- No-create-info: Do not write the create table statement -- extended-insert: Use the multiline insert syntax that includes several VALUES lists. In this way, the files to be stored are smaller. When files are reloaded, The -- default-character-set = latin1 can be accelerated. Export all data according to the original character set. In this way, in the exported file, all Chinese characters are visible and will not be saved as garbled characters. 4) modify the data character set to the New Character Set-open data. SQL, change set names latin1 to set names gbk 5) use the new character set to create a new database
mysql> create database na default charset gbk;
6) import the table structure of the new character set to the new database. Create the table and run createtab. SQL.
mysql -uroot -p na <createtab.sql
7) import the data file of the new character set to the new database. Import the data and execute data. SQL.
mysql -uroot -p na < data.sql
It seems that the experiment is not successful. Can I ask a question?
- Character Set and collation are 1: N
- Collation is the ordering and comparison rules of characters and the precision and performance of characters;
- Alter table stu convert to character set utf8;
- Set names utf8;
- Common Causes of garbled characters
- When loading data, the database-level character set should be the same as the file;
- Three elements:
- Character Set of the program driver or client (set on the client)
- Set names xxx)
- Character set for data storage (alter table tbl convert to character set xxx)
- How to save the garbled data after running for a period of time;