3.1.2 String Value
3.1.2.1 String type and Character set support
3.1.2.2 Character set-related system variables
3.1.2.1 String type and Character set support
String values can generally be divided into two categories, binary and non-binary strings
二进制串:一组字节序列,没有特殊的比较或者排序属性. 比较操作是基于各字节的数值逐个字节实现的. 所有字节都有意义,**甚至包括结尾的空格**. 非二进制串:一个字符序列,每个二进制串都与字符集相关. 字符集决定了:MySQL如何解释字符串内容,哪些字符可以用. 每个字符集都有一种或者多种排序规则. **尾部空格不会参与比较(TEXT类型除外-具有唯一性索引)**
The collation used by the string determines the order in which the characters are in the character set, which affects the comparison operation.
The default character set and collation are Latin1 and Latin_swedish_ci, respectively.
To view the character set and collation provided on the server:
Show character set;+----------+-----------------------------+---------------------+--------+|Charset|Description|DefaultCollation |MaxLen|+----------+-----------------------------+---------------------+--------+| big5 | Big5 Traditional Chinese | big5_chinese_ci | 2|| dec8 | DEC West European | dec8_swedish_ci | 1|| cp850 | DOS West European | cp850_general_ci | 1|| hp8 | HP West European | hp8_english_ci | 1|| koi8r | KOI8-R relcom Russian | koi8r_general_ci | 1|..............show collation,--------------------------+----------+-----+---------+----------+---------+|Collation|Charset|Id|Default|Compiled|Sortlen|+--------------------------+----------+-----+---------+----------+---------+| big5_chinese_ci | Big5 | 1|Yes|Yes|1|| big5_bin | Big5 | -| |Yes|1|| dec8_swedish_ci | Dec8 | 3|Yes|Yes|1|| dec8_bin | Dec8 | the| |Yes|1|| cp850_general_ci | cp850 | 4|Yes|Yes|1|| cp850_bin | cp850 | the| |Yes|1|................
Each collation is bundled on a specific character set, and each given character set can have multiple collations.
Format: Character set name language name append suffix
The suffix rules are as follows:
_ci indicates that collations are case-insensitive
_cs indicates that collations are case sensitive
_bin indicates that this is a binary sort rule. (comparison operations are based on numeric character encoding values, language independent)
such as: Utf8_bin
Ordering characteristics of binary and non-binary strings:
Two: Byte-by-bit comparisons, the result depends only on the numeric size of each byte, case-sensitive (case-insensitive, corresponding
Byte values are different, the binary string is actually not case-sensitive concept case sensitivity is actually a function of collation )
Non-two: by character, the relative value of each character depends on the collation of the character set currently in use. Case is set to the same sort value, so case insensitive (not for case-insensitive non-binary ordering )
Determine the character set and collation of a string:
(by default, MySQL treats hexadecimal constants as binary strings)
select charset(x‘0123‘),collation(x‘0123‘);+------------------+--------------------+| charset(x‘0123‘) | collation(x‘0123‘) |+------------------+--------------------+| binary | binary |+------------------+--------------------+
There are two notation conventions that you can use to force a string to be interpreted as a specified character set.
1._charset Str
‘abc‘_latin2 x‘616263‘0‘def‘_utf8 X‘646566‘0x646566
对于引号里的字符串: 字符集引导符与字符串之间空白可选 十六进制不能留有任何空白
2.N ' str ' (equivalent to _utf8 ' str ')
n must be followed by a string in quotation marks, and cannot have any whitespace
(3. Guide notation for string expressions or column values)
using charset);
The bootstrapper and convert are not the same, the guide will only change the interpretation of the string, will not change the value, and convert is a function, enter the input parameters, generate a new string to return.
_ucs2 ‘ABCD‘;set @s2 = convert (‘ABCD‘ using ucs2);select char_length(@s1), length(@s1), char_length(@s2), length(@s2);+------------------+-------------+------------------+-------------+| char_length(@s1) | length(@s1) | char_length(@s2) | length(@s2) |+------------------+-------------+------------------+-------------+| 2 | 4 | 4 | 8 |+------------------+-------------+------------------+-------------+
First statement: To interpret each pair of ABCD characters as a double-byte ucs2 character
The second statement: converts each character to the corresponding UCS2 character.
This section concludes with a description of the difference between binary strings and non-binary strings that use binary string collations .
1. The binary string does not have the concept of a character set. It will be interpreted as a byte, and the comparison is a single-byte numeric code
2. Non-binary strings that use binary collations are interpreted as characters and are compared to their numeric character values, which are usually calculated based on multiple bytes per character.
set ‘abcd‘;set _latin1‘abcd‘ collate latin1_bin; select upper(@s1),upper(@s2);+------------+------------+| upper(@s1) | upper(@s2) |+------------+------------+| abcd | ABCD |+------------+------------+
Binary strings do not have the concept of a character set at all, so it is not possible to know which byte values correspond to uppercase or lowercase characters. If you want to do this, you can choose convert before using the upper lower function.
3.1.2.2 Character set-related system variables
这一小节pass 暂时并没有什么用 一切使用默认就好了
Study with me MySQL Technical Insider (fifth Edition): (chapter III Study diary 2)