MySQL's character set

Source: Internet
Author: User



1. Character set concepts

(1) Character: the smallest meaningful symbol in a human language.

(2) Character collection: a group of characters, usually covering the characters used by a country or people, for example:

ASCII, extended ASCII, Latin, GB2312, BIG5, Unicode.

(3) Character encoding: assigns a number to each character in a collection so that the character can be identified.

(4) Character set (charset): character collection + encoding = character set.

(5) Collation: defines the rules for comparing and sorting the characters in a character set (for example, whether comparison is case-sensitive). With a collation there is a standard for sorting. A character set can have multiple collations; some are case-sensitive and some are not.


One character collection can have multiple encodings (character sets).

One character set can have multiple collations.


Collations whose names end in _ci are case-insensitive, those ending in _cs are case-sensitive, and those ending in _bin compare characters by their encoded (binary) value.
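As a rough illustration of the suffixes, the following Python sketch compares two strings the way a _ci collation and a _bin collation would treat them. The function names are hypothetical, and casefold() only approximates case-insensitive matching; it is not MySQL's actual collation algorithm.

```python
def equal_ci(a: str, b: str) -> bool:
    """Approximate a _ci comparison: ignore case differences."""
    return a.casefold() == b.casefold()


def equal_bin(a: str, b: str, encoding: str = "utf-8") -> bool:
    """Approximate a _bin comparison: compare the encoded bytes."""
    return a.encode(encoding) == b.encode(encoding)


print(equal_ci("MySQL", "mysql"))   # True
print(equal_bin("MySQL", "mysql"))  # False
```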





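2. Common character sets

ASCII character set: 7-bit codes

Extended ASCII character sets (for example latin1): 8-bit codes, including all characters of the ASCII character set

GB2312, BIG5, GBK: Chinese characters take two bytes (16 bits)

Unicode character set: covers the languages of the whole world. It is often described as 16-bit, but code points run up to U+10FFFF; the stored size depends on the encoding (UTF-8, UTF-16, and so on)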
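The sizes in this list can be checked with a short Python sketch. The sample characters are chosen here only for illustration, and the codec names are the ones Python uses, which differ slightly from MySQL's names.

```python
def encoded_length(ch: str, codec: str) -> int:
    """Number of bytes one character occupies in the given codec."""
    return len(ch.encode(codec))


samples = [
    ("A", "ascii"),     # plain ASCII letter
    ("é", "latin-1"),   # extended ASCII (8-bit) character
    ("忍", "gb2312"),    # Chinese character in three double-byte sets
    ("忍", "gbk"),
    ("忍", "big5"),
]

for ch, codec in samples:
    print(codec, encoded_length(ch, codec))

# A Unicode code point that does not fit in 16 bits.
print(hex(ord("😀")))
```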


[Figure: ASCII table]

Extended ASCII character sets, for example: latin1, latin2

Unicode character set, "Unicode" encoding (the name Windows Notepad uses for UTF-16): two bytes per character for the characters used in this article

Unicode character set, UTF-8 encoding: an English character takes one byte, a Chinese character typically takes 3 bytes


UTF-8 is a variable-length encoding. If a character is encoded in a single byte, the highest bit of that byte is 0. If it is encoded in multiple bytes, the number of consecutive 1 bits at the high end of the first byte gives the number of bytes in the encoding, and every remaining byte begins with 10. The original UTF-8 design allowed up to 6 bytes; the current standard (RFC 3629) limits UTF-8 to 4 bytes, so the 5- and 6-byte forms listed below are of historical interest only.

As follows:

1 byte 0xxxxxxx

2 bytes 110xxxxx 10xxxxxx

3 bytes 1110xxxx 10xxxxxx 10xxxxxx

4 bytes 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

5 bytes 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx

6 bytes 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
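The table above can be read mechanically: count the leading 1 bits of the first byte. A minimal Python sketch of that rule follows; the function name is hypothetical, and it accepts the historical 5- and 6-byte lead bytes only because the table lists them.

```python
def utf8_sequence_length(lead: int) -> int:
    """Return the length of a UTF-8 sequence, judged from its first byte."""
    if (lead & 0b1000_0000) == 0:
        return 1  # 0xxxxxxx: single-byte (ASCII) character
    if (lead & 0b1100_0000) == 0b1000_0000:
        raise ValueError("10xxxxxx is a continuation byte, not a first byte")

    # Count the consecutive 1 bits starting from the highest bit.
    length = 0
    mask = 0b1000_0000
    while lead & mask:
        length += 1
        mask >>= 1
    if length > 6:
        raise ValueError("not a valid first byte in the table")
    return length


print(utf8_sequence_length(0x41))  # 1
print(utf8_sequence_length(0xE5))  # 3
```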


The essence of encoding is to use a numeric value to represent a character.

How do we get the code value (Unicode code point) of a Chinese character from its UTF-8 encoding?

For example, take the character 忍 ("endure"), whose UTF-8 encoding is e5bf8d.


Converting e5bf8d to binary gives 11100101 10111111 10001101.



We know that a Chinese character occupies 3 bytes in UTF-8, so its template is

1110xxxx 10xxxxxx 10xxxxxx


Removing the template bits (1110, 10, 10) leaves the code value of the character: 0101 1111 1100 1101, which is 16 binary digits, exactly two bytes.

Converting these 16 binary digits to hexadecimal gives 5FCD.


Checking the encoding lookup tool again, the big-endian Unicode (UTF-16BE) value of the character is exactly 5FCD.
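The manual steps for the e5bf8d example can be sketched in Python: strip the template bits from each byte and join what remains. The function name is hypothetical, and the sketch handles only the 3-byte form discussed here.

```python
def decode_three_byte_utf8(data: bytes) -> int:
    """Recover the code point from a 3-byte UTF-8 sequence."""
    b0, b1, b2 = data
    # Check the template 1110xxxx 10xxxxxx 10xxxxxx.
    if (b0 >> 4) != 0b1110 or (b1 >> 6) != 0b10 or (b2 >> 6) != 0b10:
        raise ValueError("not a 3-byte UTF-8 sequence")
    # Keep only the x bits: 4 from the first byte, 6 from each of the others.
    return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)


code_point = decode_three_byte_utf8(bytes.fromhex("e5bf8d"))
print(f"{code_point:04X}")                        # 5FCD
print(chr(code_point).encode("utf-16-be").hex())  # 5fcd
```

The second line of output is the same character encoded as big-endian UTF-16, which matches the value shown by the lookup tool.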





The same characters occupy a different number of bytes under different encodings.

(1) Storing English text in Unicode encoding

In Notepad, a file containing "ABCD" occupies 10 bytes: the 4 characters take 2 bytes each in Unicode (UTF-16) encoding, 8 bytes in total, plus the 2-byte byte order mark (BOM, FF FE) that Notepad writes at the start of the file.


(2) Unicode encoding store Chinese

Notepad input "ABCD Hello", altogether 6 characters, a total of 12 bytes, plus the end character (EOF) 2 bytes, a total of 14 bytes.



(3) Storing English text in UTF-8 encoding

In Notepad, enter "ABCD": 4 English characters at 1 byte each make 4 bytes, plus the 3-byte byte order mark (BOM, EF BB BF) that Notepad writes at the start of the file, for a total of 7 bytes.


(4) UTF-8 encoding store Chinese

Notepad input "ABCD Hello", 4 English characters, each English character occupies 1 bytes, a total of 4 bytes, 2 Chinese characters, each Chinese character occupies 3 bytes, a total of 6 bytes, plus the Terminator (EOF) 3 bytes, a total of 13 bytes.




To view the code page (character set) used by the Windows console

Use chcp [nnn]


chcp displays the number of the active console code page, or changes the active code page of the console. Used without parameters, chcp displays the number of the active console code page.

Syntax: chcp [nnn]

Parameter: nnn specifies the code page.

The following table lists supported code pages and their country/region or language:

Code page    Country/region or language
437          United States
932          Japanese (Shift-JIS)
936          Simplified Chinese (GB2312)
949          Korean
950          Traditional Chinese (Big5)


Note: on Simplified Chinese Windows, the default console code page is 936 (GBK).




3. Character sets in MySQL


3.1. View all character sets supported by MySQL

SHOW CHARACTER SET;



3.2. How MySQL applies character sets: database level, table level, column level

When creating a database, if you do not specify a character set for the database, the server's default character set is used.

When creating a table, if you do not specify a character set for the table, the database's character set is used.

When creating a column, if you do not specify a character set for the column, the table's character set is used.
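The fallback chain described above (column, then table, then database, then server) can be modeled with a short Python sketch. This only mirrors the rule as stated in the text; the function name and the sample character set names are hypothetical, and it does not query a real MySQL server.

```python
from typing import Optional


def effective_charset(
    server: str,
    database: Optional[str] = None,
    table: Optional[str] = None,
    column: Optional[str] = None,
) -> str:
    """Return the character set a column ends up with.

    None means "not specified at this level", so the next wider level applies.
    """
    for candidate in (column, table, database):
        if candidate is not None:
            return candidate
    return server


print(effective_charset("latin1"))                                  # latin1
print(effective_charset("latin1", database="utf8"))                 # utf8
print(effective_charset("latin1", database="utf8", column="gbk"))   # gbk
```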


3.3. View the character sets currently in use

SHOW VARIABLES LIKE 'character_set_%';



character_set_system: the system-level character set

character_set_server: the character set used at the server level

character_set_database: the database-level character set (each database may have its own character set)


character_set_system is the character set the server uses to store identifiers (for example database names, table names, and column names). It is always utf8.


character_set_server is the server's default character set. (Conjecture: character_set_system is used to store identifiers such as database names, table names, and column names, which may be in Chinese, while character_set_server indicates the character set used for stored content.)


character_set_database: if no database is currently selected, it takes the value of character_set_server; if a database is selected, it is the character set that database uses to store content.



I have not yet worked out how to organize the rest of this material, so what follows may not be very clear ...

character_set_client

character_set_connection

character_set_results


Change the MySQL default character set


CREATE DATABASE Schooldb;


SET NAMES latin1;

SET NAMES utf8;


Character set compatibility



