1. Concepts related to character sets
(1) Character: the smallest meaning-bearing symbol in a human writing system.
(2) Character repertoire: a collection of characters, usually covering the characters used by one country or language.
Examples: the ASCII character set, extended ASCII character sets (Latin), GB2312, BIG5, and the Unicode character set.
(3) Character encoding: assigns a number to each character in a repertoire so that the character can be identified.
(4) Character set (CharSet): character repertoire + encoding = character set.
(5) Collation: defines how the characters in a character set are compared and sorted (for example, whether comparison is case-sensitive). With a collation, there is a sorting standard. A character set can have several collations; some are case-sensitive and some are not.
A character repertoire can be encoded in more than one way, so it can give rise to several character sets (for example, Unicode encoded as UTF-8 or as UTF-16).
A character set can have multiple collations.
Collation names ending in _ci are case-insensitive, names ending in _cs are case-sensitive, and names ending in _bin compare characters by their encoded (binary) values.
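The difference between a case-insensitive and a binary collation can be illustrated outside MySQL. The following is a minimal Python analogy, not MySQL's own comparison logic: plain string comparison stands in for a _bin collation, and case folding stands in for a _ci collation. The sample words are made up for the illustration.

```python
# Python analogy for collation suffixes (this is not MySQL's implementation).
words = ["banana", "Apple", "Cherry"]

# Like a _bin collation: compare by code value, so every uppercase
# letter (65-90) sorts before every lowercase letter (97-122).
binary_order = sorted(words)

# Like a _ci collation: fold case before comparing.
case_insensitive_order = sorted(words, key=str.casefold)

print(binary_order)            # ['Apple', 'Cherry', 'banana']
print(case_insensitive_order)  # ['Apple', 'banana', 'Cherry']

# Equality is affected in the same way.
print("abc" == "ABC")                        # False (binary comparison)
print("abc".casefold() == "ABC".casefold())  # True (case-insensitive)
```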
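2. Common character sets
ASCII character set
Extended ASCII character sets (latin1): 8-bit codes that include every character in the ASCII character set
GB2312, BIG5, GBK: Chinese character sets that use 16-bit (two-byte) codes for Chinese characters
Unicode character set: covers the world's languages. It was originally designed around 16-bit codes; code points now range up to U+10FFFF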
Extended ASCII character sets, for example: latin1, latin2
Unicode character set with the "Unicode" encoding (UTF-16, as Windows uses the term): most characters take two bytes
Unicode character set with the UTF-8 encoding: an English character takes one byte, and a Chinese character usually takes 3 bytes
UTF-8 is a variable-length encoding. If a character is encoded in a single byte, the highest bit of that byte is 0. If it is encoded in several bytes, the number of consecutive 1 bits at the high end of the first byte gives the number of bytes in the sequence, and every remaining byte begins with 10. The original UTF-8 design allowed up to 6 bytes; the current standard (RFC 3629) limits a sequence to 4 bytes, so the 5-byte and 6-byte forms are no longer valid.
As follows:
1 byte 0xxxxxxx
2 bytes 110xxxxx 10xxxxxx
3 bytes 1110xxxx 10xxxxxx 10xxxxxx
4 bytes 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
5 bytes 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx (obsolete)
6 bytes 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx (obsolete)
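The leading-bit rule can be sketched in Python as follows. The function name utf8_sequence_length is made up for this illustration. It handles only the 1-byte to 4-byte forms that current UTF-8 allows, and it inspects the bit pattern only, without performing the other validity checks a real decoder needs.

```python
def utf8_sequence_length(lead_byte: int) -> int:
    """Return the length of a UTF-8 sequence, judged from its first byte.

    Only the bit pattern is inspected; overlong forms and out-of-range
    values are not rejected here.
    """
    if lead_byte & 0b1000_0000 == 0b0000_0000:   # 0xxxxxxx
        return 1
    if lead_byte & 0b1110_0000 == 0b1100_0000:   # 110xxxxx
        return 2
    if lead_byte & 0b1111_0000 == 0b1110_0000:   # 1110xxxx
        return 3
    if lead_byte & 0b1111_1000 == 0b1111_0000:   # 11110xxx
        return 4
    # 10xxxxxx is a continuation byte; 111110xx and above are obsolete forms.
    raise ValueError(f"0x{lead_byte:02X} cannot start a UTF-8 sequence")


print(utf8_sequence_length(0x41))  # 1 (the letter "A")
print(utf8_sequence_length(0xE5))  # 3 (first byte of a typical Chinese character)
```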
The essence of encoding is to use a numeric value to represent a character.
How do you get the code value (the Unicode code point) of a Chinese character from its UTF-8 encoding?
For example, the character "忍" ("endure") has the UTF-8 encoding e5bf8d.
Converting e5bf8d to binary gives 11100101 10111111 10001101.
As we know, a Chinese character occupies 3 bytes in UTF-8, so its template is
1110xxxx 10xxxxxx 10xxxxxx
Taking the bits in the x positions of the template gives 0101 1111 1100 1101 as the code value of "忍" ("endure"): 16 binary digits, exactly two bytes.
Converting these 16 binary digits to hexadecimal gives 5FCD.
Checking an encoding lookup tool confirms that the Unicode big-endian (BigEndUni) value is exactly "5FCD".
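The same extraction can be done with bit masks. The following is a minimal Python sketch of the steps above; the helper name decode_utf8_3byte is made up for this illustration, and it handles only the 3-byte template.

```python
def decode_utf8_3byte(data: bytes) -> int:
    """Extract the code point from a 3-byte UTF-8 sequence (1110xxxx 10xxxxxx 10xxxxxx)."""
    b0, b1, b2 = data  # raises ValueError unless exactly 3 bytes are given
    if (b0 & 0xF0) != 0xE0 or (b1 & 0xC0) != 0x80 or (b2 & 0xC0) != 0x80:
        raise ValueError("not a 3-byte UTF-8 sequence")
    # Keep the x bits: 4 from the first byte, 6 from each continuation byte.
    return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)


encoded = bytes.fromhex("e5bf8d")
code_point = decode_utf8_3byte(encoded)

print(f"{code_point:016b}")  # 0101111111001101
print(f"{code_point:04X}")   # 5FCD
# Cross-check against Python's built-in codecs.
print(chr(code_point).encode("utf-16-be").hex())  # 5fcd
```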
The same characters occupy different numbers of bytes under different encodings.
(1) Storing English text in the Unicode encoding
In Notepad, "ABCD" saved as Unicode occupies 10 bytes: 4 characters at 2 bytes each make 8 bytes, plus a 2-byte byte order mark (BOM) that Notepad writes at the start of the file (it is not an end-of-file marker), for a total of 10 bytes.
(2) Storing Chinese text in the Unicode encoding
In Notepad, enter "ABCD" followed by two Chinese characters (such as 你好, "hello"): 6 characters at 2 bytes each make 12 bytes, plus the 2-byte BOM at the start of the file, for a total of 14 bytes.
(3) Storing English text in the UTF-8 encoding
In Notepad, enter "ABCD": 4 characters at 1 byte each make 4 bytes, plus a 3-byte BOM at the start of the file (not an end-of-file marker), for a total of 7 bytes.
(4) Storing Chinese text in the UTF-8 encoding
In Notepad, enter "ABCD" followed by two Chinese characters (such as 你好, "hello"): the 4 English characters take 1 byte each (4 bytes), the 2 Chinese characters take 3 bytes each (6 bytes), and the BOM at the start of the file takes 3 bytes, for a total of 13 bytes.
To view or change the character set (code page) used by the console, use
chcp [nnn]
chcp displays the number of the active console code page, or changes the active console code page. Used without parameters, chcp displays the number of the active console code page.
Syntax: chcp [nnn]
Parameter: nnn specifies the code page. The following table lists some of the supported code pages and their country/region or language:
Code page   Country/region or language
437         United States
932         Japanese (Shift-JIS)
936         Simplified Chinese (GB2312)
949         Korean
950         Traditional Chinese (Big5)
Note: on a Simplified Chinese Windows system, the default console character set is GBK (code page 936).
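To see how the same character fares under different code pages, the following Python sketch maps each code page number above to the Python codec of the same name (cp437, cp932, and so on). The helper name encode_in_code_page is made up for this illustration, and Python's codecs are only an approximation of what the Windows console does.

```python
from typing import Optional

# Code page numbers from the table above, mapped to Python codec names.
CODE_PAGES = {437: "cp437", 932: "cp932", 936: "cp936", 949: "cp949", 950: "cp950"}


def encode_in_code_page(text: str, page: int) -> Optional[bytes]:
    """Encode text in the given code page; return None if it cannot be represented."""
    try:
        return text.encode(CODE_PAGES[page])
    except UnicodeEncodeError:
        return None


for page in CODE_PAGES:
    encoded = encode_in_code_page("忍", page)
    print(page, encoded.hex() if encoded is not None else "not representable")
```

Code page 437 has no Chinese characters, so the helper returns None for it, while the East Asian code pages each produce a two-byte code.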
3. Character sets in MySQL
3.1. View all the character sets supported by MySQL
SHOW CHARACTER SET;
3.2. How MySQL applies character sets: database level, table level, column level
When creating a database, if you do not specify a character set for it, the server's default character set is used.
When creating a table, if you do not specify a character set for it, the database's character set is used.
When creating a column, if you do not specify a character set for it, the table's character set is used.
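The fallback chain above can be modelled in a few lines. This is a hypothetical Python sketch of the rule as stated, not MySQL code; the function name effective_charset and its parameters are invented for the illustration.

```python
from typing import Optional


def effective_charset(
    server: str,
    database: Optional[str] = None,
    table: Optional[str] = None,
    column: Optional[str] = None,
) -> str:
    """Return the character set a column ends up with.

    The most specific level that was explicitly set wins:
    column, then table, then database, then the server default.
    """
    for explicit in (column, table, database):
        if explicit is not None:
            return explicit
    return server


print(effective_charset("latin1"))                                 # latin1
print(effective_charset("latin1", database="utf8"))                # utf8
print(effective_charset("latin1", database="utf8", column="gbk"))  # gbk
```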
3.3. View the character sets currently in use
SHOW VARIABLES LIKE 'character_set_%';
character_set_system: the system-level character set
character_set_server: the character set used at the server level
character_set_database: the database-level character set (each database may have its own character set)
character_set_system is the character set the server uses to store identifiers (for example, database names, table names, and column names); it is always utf8.
character_set_server is the default character encoding of the server. (Conjecture: character_set_system is used to store identifiers such as database, table, and column names, which may be in Chinese, while character_set_server indicates the character encoding used to store content.)
character_set_database: if no database is currently selected, it takes the value of character_set_server; if a database is selected, it is the character set that database uses to store content.
I have not yet worked out how to organize the content that follows, so it may not be very clear ...
character_set_client
character_set_connection
character_set_results
Changing the MySQL default character set
CREATE DATABASE Schooldb;
SET NAMES latin1;
SET NAMES utf8;
Character Set compatibility
MySQL's character set