When using MySQL to create data tables, it is unavoidable to involve the concept of character set and collation, which is not well understood before.
The two days are not very busy, and I tidied up a bit.
Let's take a look at character set and collation.
&. Character set, which is the character set.
We often see the utf-8, GB2312, GB18030 are independent of each other character set. That is, a set of encodings for Unicode.
So how to understand the difference between Unicode and Utf-8, GB2312?
For example, there is an apple in front of you, called Apple in English, and Apple in Chinese.
The concept of the Apple entity is Unicode, and utf-8, GB2312 can be thought of as different languages for Apple, in essence, is to describe Apple this thing.
&. Collation, that is, the method of comparison.
Used to specify how the dataset is sorted, and the alignment rules for strings. (This may be more abstract and will be explained in more detail later.) )
The relationship between character set and collation
Software internationalization is the trend, so Unicode is the best choice for internationalization. Of course to improve performance, in some cases it is better to use latin1.
MySQL has two Unicode-enabled character set:
1. UCS2: Use the bits to represent a Unicode character.
2. UTF8: Use the bytes to represent a Unicode character.
Select which character set depends on the case, such as UTF8 means that the Latin character requires only one byte, so when the user data is mostly English and other Latin characters, use UTF8 to save the database storage space. It is said that SQL Server is using UCS2, which I doubt.
Each character set corresponds to a certain number of collation. The view method is to enter it under the console of MySQL:
Java code
- Mysql> Show collation;
We will see the result:
The rules of collation names can be summed up in these two categories:
1. <character set>_<language/other>_<ci/cs>
2. <character Set>_bin
For example:
Utf8_danish_ci
CI is the abbreviation for case insensitive, and CS is the abbreviation for case sensitive. That is, specifies whether the case is sensitive.
It is strange that the UTF8 character set corresponds to the collation actually no one is CS.
So what's the difference between Utf8_general_ci, Utf8_unicode_ci, utf8_danish_ci? What is the meaning of their respective existence?
The difference between the different collation of the same character set is the accuracy of the sort, word Fu Chun contrast (the same two characters may be different in different countries ' languages) and performance.
For example:
Utf8_general_ci in the accuracy of the ranking is inferior to utf8_unicode_ci, of course, for English users should be no different. But the performance (sorting and alignment speed) is slightly better than utf8_unicode_ci. For example, the former does not speak to German
? = SS
The support.
In contrast to Utf8_unicode_ci, Utf8_danish_ci added special sort support for Danish.
Add:
1. When the character set of a table is latin1, the character set of the field automatically changes to UTF8 if the field type is nvarchar.
The database character set, table character set, field character set can be covered progressively.
2. Under CI collation, how to match the case when compared:
Wrote mysql> select * from pet;
+----------+-------+---------+------+------------+-------+
| name | Owner | Species | sex | Birth | Death |
+----------+-------+---------+------+------------+-------+
| Whistler | Gwen | Bird | NULL | 1997-12-09 | NULL |
| Whistler | Gwen | Bird | NULL | 1988-09-25 | NULL |
+----------+-------+---------+------+------------+-------+
2 rows in Set (0.00 sec)
Mysql> SELECT * from pet where name = ' Whistler ';
+----------+-------+---------+------+------------+-------+
| name | Owner | Species | sex | Birth | Death |
+----------+-------+---------+------+------------+-------+
| Whistler | Gwen | Bird | NULL | 1997-12-09 | NULL |
| Whistler | Gwen | Bird | NULL | 1988-09-25 | NULL |
+----------+-------+---------+------+------------+-------+
2 rows in Set (0.00 sec)
Mysql> SELECT * from pet where binary name = ' Whistler ';
+----------+-------+---------+------+------------+-------+
| name | Owner | Species | sex | Birth | Death |
+----------+-------+---------+------+------------+-------+
| Whistler | Gwen | Bird | NULL | 1988-09-25 | NULL |
+----------+-------+---------+------+------------+-------+
1 row in Set (0.00 sec)
Mysql> SELECT * from pet where name = binary ' Whistler ';
+----------+-------+---------+------+------------+-------+
| name | Owner | Species | sex | Birth | Death |
+----------+-------+---------+------+------------+-------+
| Whistler | Gwen | Bird | NULL | 1988-09-25 | NULL |
+----------+-------+---------+------+------------+-------+
1 row in Set (0.00 sec)
Recommended Use
Mysql> SELECT * from pet where name = binary ' Whistler ';
This ensures that the current field's index is still valid and
Mysql> SELECT * from pet where binary name = ' Whistler ';
Will invalidate the index.
-----------------------------------------------------------------------------------------------------------
CharSet and collation have multiple levels of settings: server-level, database-level, table-level, column-level, and connection-level www.2cto.com 1. Server-level view settings: Show global Variables like ' character_set_server '; And show global variables like ' collation_server '; Modify settings: Set: in Option FILE (/ETC/MYSQL/MY.CNF) [mysqld] character_set_server=utf8 COLLATION_SERVER=UTF8_GENERAL_CI 2. Database level View settings: SELECT * from information_schema.schemata where schema_name = ' Cookbook '; Set: 1. If not explicitly set, the server-level configuration 2 is used automatically. Explicit settings: Specify when creating a library create database PlayUtf8 default CHARACTER SET latin1 COLLATE latin1_swedish_ci; ww w.2cto.com 3. Table level View settings: Show create table course; Settings: 1. If not explicitly set, automatically use database-level configuration 2. Explicit settings: Specify create table UTF when creating tables (ID int ) Default Charset=utF8 default collate=utf8_bin; 4. Column level View settings: Show create TABLE course; settings: 1. Automatically use table-level configuration &NBSP;2 if not explicitly set. Explicit Settings: CREATE TABLE Table1 (Column1 VARCHAR (5) CHARACTER SET latin1 COLLATE latin1_german1_ci); 5. Connection Level View Settings: show variables like ' character_set_client '; The # server uses this code to understand the statements show variables like ' character_set_connection ' sent by the client; # I don't know what it means, wait to see the MySQL source show variables like ' character_set_results '; # The server uses this encoding to echo the result set and error messages settings: The client can specify these parameters when connecting, and the server also provides a global range of values when the client does not specify these parameters. This global value is used by the server. How is this global value set? I've looked through a lot of documents and it seems like I haven't seen the setup yet (some say it's wrong to specify command-line arguments by MY.CNF, or when starting mysqld) : What code is used to connector/j transfer SQL? Answer: "The character encoding between client and server is automatically detected upon connection. The encoding used byThe driver is specified on the server using the CHARACTER_SET_SERVER system variable for server versions 4.1.0 and newer. " That is, the Character_set_server value is queried on the server side when connecting, and then the encoding used by the connection is determined. However, the official document also says, "to override the auto-detect encoding feature on the client, you can use the" characterencoding "attribute in the URL used to connect to the server. "
Character set and collation in MySQL