Thoughts on MySQL character set architecture

Source: Internet
Author: User

For the past few months, almost every time I use MySQL I find myself wondering: does MySQL's layered character set architecture really do much good?

MySQL Character Set Processing

Sending a request

Client (character_set_client) => database connection (character_set_connection) => storage (table, column)

Returning a response

Storage (table, column) => database connection (character_set_connection) => client (character_set_results)

At every node except the first, the data is converted from the previous node's character set to the current node's. For example, take the following environment:

◆ character_set_connection: utf8

◆ character_set_results: gbk

◆ character_set_client: gb2312

◆ Table A exists, and its column character set is big5

When a request is sent, the data is first converted from gb2312 (character_set_client) to UTF-8, then converted to BIG5, and then stored.

When a response is returned, the data is first converted from BIG5 to UTF-8, then to gbk (character_set_results), and then sent to the client.
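The chain of conversions can be imitated outside MySQL. The following is only a sketch, using Python's codecs as stand-ins for the server's conversions. The `convert` helper, the constant names, and the sample text are assumptions made for illustration, not anything MySQL exposes. It starts from character_set_client on the way in and ends in character_set_results on the way out.

```python
def convert(data: bytes, src: str, dst: str) -> bytes:
    """Re-encode bytes from one character set to another (one hop in the chain)."""
    return data.decode(src).encode(dst)

# Hypothetical settings mirroring the example environment.
CLIENT = "gb2312"      # character_set_client
CONNECTION = "utf-8"   # character_set_connection
COLUMN = "big5"        # character set of the column in table A
RESULTS = "gbk"        # character_set_results

text = "中文"  # sample text; both characters exist in all four character sets

# Sending a request: client -> connection -> storage
sent = text.encode(CLIENT)
on_connection = convert(sent, CLIENT, CONNECTION)
stored = convert(on_connection, CONNECTION, COLUMN)

# Returning a response: storage -> connection -> client
back_on_connection = convert(stored, COLUMN, CONNECTION)
returned = convert(back_on_connection, CONNECTION, RESULTS)

# Print the bytes at each node to show that every hop re-encodes the data.
for label, data in [
    ("client", sent),
    ("connection", on_connection),
    ("storage", stored),
    ("connection", back_on_connection),
    ("results", returned),
]:
    print(f"{label:<12}{data.hex()}")
```

The text survives the round trip only because every character exists in every character set along the way. A character missing from any one of them would be lost or replaced at that hop.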

Role of the Architecture

1. It allows different clients to use different character sets. A typical example: I have a UTF-8 site, which is a client whose character_set_client is utf8. At the same time, I may need to read and write the database from a GBK terminal. That is another client, and its character set is gbk.

2. When the database operates on the file system, the file path has to be converted to the file system's character set. For example, my client is GBK and the server's file system is UTF-8, and I operate on "/A/Rina. rmvb". If the path contains a Chinese character, the bytes sent for that character differ from the bytes the server expects, so we need a way to convert the GBK-encoded character to UTF-8. MySQL introduces a variable named character_set_filesystem for this task.

Beyond these, I cannot think of any other uses for the moment. But do we really need to handle things this way? Many websites simply want their data to come out exactly as it went in. There are two more cases to consider.

1. Sorting data or running LIKE queries. First, sorting. For columns that contain Chinese characters, sorting by character set is of little practical value. Simplified Chinese is generally sorted by pinyin. I have never really studied collations in MySQL, but in the programs I have worked on, this kind of sorting required a separate column that stores the pinyin. There is also the problem of polyphonic characters, which have more than one pinyin reading. And in UTF-8, part of the Chinese range is shared among Chinese, Japanese, and Korean. None of this is easy to implement, so I suspect that neither the GBK nor the UTF-8 collations in MySQL implement pinyin ordering. I would venture that the collation used by most MySQL-backed websites in China amounts to plain byte sorting, and byte sorting does not need any character set. So for Chinese sites, MySQL collations have no significance for sorting.

For LIKE, however, the collation does make a little sense. For example, with LIKE '%a%', a byte-wise comparison could match the byte for "a" inside a multi-byte Chinese character. This cannot happen in UTF-8, because its encoding guarantees that the byte for "a" is only ever "a" and never part of a multi-byte character. It can happen in other character sets, though. So with UTF-8, LIKE ends up in the same position as ORDER BY, and the collation is meaningless again. Sigh.

2. If you do not need sorting, LIKE, or full-text search, stop using CHAR, VARCHAR, TEXT, and so on. BINARY, VARBINARY, and BLOB are the right choices. Binary columns are stored and retrieved without any character set conversion, and they are sorted purely by their binary content, so they are much more efficient than CHAR, VARCHAR, and TEXT.

In this case no character set is needed at all. However, under the current MySQL architecture, the character set handling between client and connection ignores the column type, so conversion is still performed between those two nodes.
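Going back to case 1, both points can be checked with a small sketch outside MySQL. Everything here is an assumption for illustration: the four sample surnames, the hand-written pinyin values standing in for a separate pinyin column, and the choice of the GBK byte pair 0x81 0x61 as a character whose second byte equals the ASCII code of "a".

```python
# Hypothetical rows: (name, pinyin), where pinyin plays the role of the
# extra column maintained only for sorting.
rows = [("张", "zhang"), ("李", "li"), ("王", "wang"), ("陈", "chen")]

# Sorting by raw UTF-8 bytes, which is what a byte-wise collation amounts to.
by_bytes = [name for name, _ in sorted(rows, key=lambda r: r[0].encode("utf-8"))]

# Sorting by the stored pinyin column.
by_pinyin = [name for name, _ in sorted(rows, key=lambda r: r[1])]

print("byte order:  ", by_bytes)
print("pinyin order:", by_pinyin)

# LIKE '%a%': a GBK character whose trailing byte is 0x61, the code of 'a'.
ch = b"\x81\x61".decode("gbk")

gbk_byte_hit = b"a" in ch.encode("gbk")     # byte-wise search over GBK bytes
utf8_byte_hit = b"a" in ch.encode("utf-8")  # byte-wise search over UTF-8 bytes
char_hit = "a" in ch                        # character-aware search

print("GBK byte match:  ", gbk_byte_hit)
print("UTF-8 byte match:", utf8_byte_hit)
print("character match: ", char_hit)
```

A byte-wise search reports a false match only for the GBK bytes. The UTF-8 bytes and the character-aware search both report no match.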

In addition, I would like to mention setting character sets in PHP. Please stop using statements like mysql_query("set names utf8"). mysql_set_charset() is the most complete way to set the character set. It does one thing more than the former: it sets the charset member of the MYSQL struct. This member plays a very important role in escaping, especially for character sets such as GBK, where "\" can be part of a character's multi-byte encoding. If you only use mysql_query("set names XXX"), some character sets are left with a major security vulnerability, and mysql_real_escape_string becomes as insecure as addslashes.
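The problem with escaping that is unaware of the character set can be sketched in a few lines. This is an illustration only: `naive_escape` is a hypothetical byte-wise escaper written for this example (it stands in for the behaviour of addslashes and is not PHP's or MySQL's actual code), and the payload is an assumed attacker input.

```python
def naive_escape(raw: bytes) -> bytes:
    """Hypothetical byte-wise escaper: put a backslash before ', " and \\."""
    out = bytearray()
    for b in raw:
        if b in (0x27, 0x22, 0x5C):  # single quote, double quote, backslash
            out.append(0x5C)
        out.append(b)
    return bytes(out)

# Assumed attacker input: the byte 0xBF followed by a single quote.
payload = b"\xbf' OR 1=1 -- "

escaped = naive_escape(payload)

# A server that interprets the connection as GBK reads 0xBF 0x5C as ONE
# character, so the backslash is swallowed and the quote is left unescaped.
decoded = escaped.decode("gbk")

print(escaped)
print(decoded)
print("unescaped quote survives:", decoded[1] == "'" and "\\" not in decoded)
```

An escaper that knows the connection character set does not treat the bytes this way, which is why that setting has to reach the client library and not only the server.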
