CHAR versus VARCHAR in MySQL databases

Source: Internet
Author: User
Tags md5 hash

In a database, character data is the most common kind of data and can account for more than 80% of the total. Handling character data correctly can therefore greatly improve database performance. Among character types, CHAR and VARCHAR are the most common: the former is fixed-length, while the latter is variable-length. The question is under what circumstances CHAR should be used and under what circumstances VARCHAR should be used. This article discusses that question.

 

I. Differences between VARCHAR and CHAR character data

In MySQL databases, the most frequently used character types are VARCHAR and CHAR. Although both are used to store character data, they differ greatly in structure and in how the data is stored. The specific implementation also depends on the storage engine. Here I will take the MyISAM storage engine as the example to discuss the differences between the two types. It was long the most commonly used engine, although InnoDB has been the default since MySQL 5.5 and handles some of these details differently.

The first thing to understand is that, whichever storage engine is used, the system stores the two data types in different ways. That is why the differences between them are worth studying, so that each type can be used where it fits. With that in mind, let's look at the details.

VARCHAR is used to store variable-length strings. Simply put, we only set a maximum length for the column, and the system allocates storage according to the amount of data actually stored. Compared with CHAR, it therefore usually occupies less storage space. There are exceptions, however. For example, the administrator can specify the ROW_FORMAT=FIXED option. If a MyISAM table is created with this option, the system uses a fixed amount of space for each row, and the space saving is lost.

Because VARCHAR generally saves disk space, it is often considered to improve database performance. It is a double-edged sword, though: it brings side effects along with the gain. Because the length is variable, updating a value may require extra work. Suppose a VARCHAR column has a maximum length of 50 characters and a value is 10 characters long before the change. The system has allocated only 10 storage positions for it (ignoring system overhead). After the change, the value is 20 characters long. This is within the 50-character limit, so the database accepts it, but the original storage location can no longer hold it, and the system has to perform additional operations. What those operations are depends on the storage engine: some split the row into fragments (as MyISAM may do), while others split the page (as InnoDB may do).
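The update scenario above can be sketched in a few lines of Python. This is a simplified model rather than MyISAM's actual row format: the helper name `varchar_stored_bytes` is hypothetical, a single-byte character set is assumed, and the one-byte length prefix is the overhead that the prose example leaves out.

```python
def varchar_stored_bytes(value: str, prefix_bytes: int = 1) -> int:
    """Bytes a VARCHAR value occupies in this simplified model:
    a length prefix plus one byte per character (single-byte charset assumed)."""
    return prefix_bytes + len(value)


# A VARCHAR(50) column first holds a 10-character value.
old_value = "a" * 10
slot_size = varchar_stored_bytes(old_value)  # space occupied by the original value

# The value is then updated to 20 characters, still within the 50-character limit.
new_value = "b" * 20
needed = varchar_stored_bytes(new_value)

# The new value no longer fits where the old one was stored,
# so the storage engine has to do extra work to place it.
fits_in_place = needed <= slot_size
print(slot_size, needed, fits_in_place)
```

A fixed-length CHAR(50) column would not have this problem, because every value already occupies the full declared length.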

CHAR differs from VARCHAR in that it uses fixed-length storage. Simply put, the system always allocates the full declared length for it, even when the stored value is shorter. Obviously, this wastes disk space when values are short. Note how spaces are handled here. When a value is shorter than the declared length, MySQL right-pads it with spaces when storing it. When the value is read back, trailing spaces are removed, so any trailing spaces the value had when it was saved are lost. During comparison (with the usual PAD SPACE collations), the system treats the shorter string as if it were padded with spaces at the end.
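The handling of spaces in CHAR values can be modeled with a short Python sketch. It follows the behavior described in the MySQL documentation for the default SQL mode (right-pad on storage, strip trailing spaces on retrieval, ignore trailing spaces in PAD SPACE comparisons). The function names are hypothetical, and rejecting an over-long value with an exception is a choice made for this model only.

```python
def char_store(value: str, length: int) -> str:
    """Model of storing into CHAR(length): right-pad with spaces to the declared length."""
    if len(value) > length:
        raise ValueError("value is longer than the declared CHAR length")
    return value.ljust(length)


def char_retrieve(stored: str) -> str:
    """Model of reading a CHAR value back: trailing spaces are removed."""
    return stored.rstrip(" ")


def pad_space_equal(a: str, b: str) -> bool:
    """Model of a PAD SPACE comparison: trailing spaces are not significant."""
    width = max(len(a), len(b))
    return a.ljust(width) == b.ljust(width)


stored = char_store("ab  ", 10)        # the value was saved with two trailing spaces
print(len(stored))                     # the stored value fills the declared length
print(repr(char_retrieve(stored)))     # the trailing spaces do not come back
print(pad_space_equal("ab", "ab   "))  # trailing spaces do not affect equality
```

The practical consequence is that a CHAR column is a poor fit for values whose trailing spaces carry meaning.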

Obviously, the biggest difference between VARCHAR and CHAR is that the former is variable-length, while the latter is fixed-length. On storage, the former allocates space according to the data actually stored. The latter allocates space according to the declared length, regardless of the actual length of the stored data. Does this mean that CHAR is inferior to VARCHAR? It does not. Otherwise there would be no need for the CHAR type. Although VARCHAR can save storage space and improve data processing efficiency, the negative effects of its variable length can sometimes offset these advantages. In some cases, the CHAR type is still the better choice.

II. Project suggestions

From the analysis above, we know that VARCHAR is a double-edged sword: it can improve performance, but it can also cause additional overhead. We need to weigh the two when deciding between VARCHAR and CHAR. In actual projects, the following considerations apply.

First, decide based on the length of the values. Some fields, such as a person's name, have a limited length. For example, we can assign a name field a length of 18 characters. Although names differ in length, the space wasted by a fixed-length type of 18 characters is not very large. If VARCHAR is used instead and a name is changed later, the original storage space may be too small to hold the new value, which leads to additional work. In this case, the fixed-length CHAR type is the better choice. In actual projects, if the values of a field are relatively short, a fixed-length type is generally used.

Second, consider whether the values have similar lengths. Even if a field is long, CHAR is more suitable when its values are always about the same length, for example generally between 90 and 100 characters, or exactly the same length. A typical application is the MD5 hash value. When MD5 hashes are used to store user passwords, CHAR is a very good fit, because every hash has the same length. CHAR is also recommended for storing ID card numbers.
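The fixed length of MD5 hashes is easy to check. The following Python sketch uses the standard `hashlib` module; the sample inputs are arbitrary. An MD5 digest is 128 bits, which is 32 characters in hexadecimal form, so a column such as CHAR(32) (an illustrative definition, not one given in this article) holds every value with no wasted space.

```python
import hashlib

# Inputs of very different lengths, including the empty string.
passwords = ["a", "correct horse battery staple", ""]

# Hex-encoded MD5 digests have the same length whatever the input length.
digests = [hashlib.md5(p.encode("utf-8")).hexdigest() for p in passwords]
lengths = {len(d) for d in digests}
print(lengths)
```

Because the set of lengths contains a single value, VARCHAR would only add a length prefix to each row without saving anything.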

What is the difference between CHAR(1) and VARCHAR(1)? Although both can hold only a single character, VARCHAR occupies one more byte of storage than CHAR. This is because VARCHAR uses one byte to store the length of the value. CHAR has no such management overhead.
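The one-byte difference can be worked out with a small Python sketch. It assumes a single-byte character set and uses the rule from the MySQL documentation that the length prefix is one byte when values need at most 255 bytes and two bytes otherwise. The function names are hypothetical.

```python
def varchar_prefix_bytes(max_bytes: int) -> int:
    """Length-prefix size: 1 byte if values need at most 255 bytes, otherwise 2."""
    return 1 if max_bytes <= 255 else 2


def char_bytes(declared_len: int) -> int:
    """CHAR(n) in a single-byte character set always occupies n bytes."""
    return declared_len


def varchar_bytes(value: str, declared_len: int) -> int:
    """VARCHAR(n) occupies the length prefix plus the bytes actually stored."""
    if len(value) > declared_len:
        raise ValueError("value is longer than the declared VARCHAR length")
    return varchar_prefix_bytes(declared_len) + len(value)


print(char_bytes(1), varchar_bytes("x", 1))  # CHAR(1) versus VARCHAR(1)
print(varchar_bytes("x", 300))               # a longer column needs a 2-byte prefix
```

For very short columns the length prefix is a large share of the total, which is one more reason to prefer CHAR for short values.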

Third, consider fragmentation. With CHAR, the storage space is allocated once, so the content of a field is stored together. From this point of view, there is no fragmentation problem. With variable-length types, the stored length varies. When the length of a value differs before and after an update, fragmentation is likely to occur. Therefore, when using variable-length character data, the database administrator needs to defragment the tables from time to time, for example by running a database export and import job.

Fourth, even when VARCHAR is used, the declared length should not be too generous. What does this mean? Suppose you need to store an address and estimate that 100 characters are enough. Some database administrators reason that, since VARCHAR allocates space according to actual need, a larger limit does no harm, so they declare the field with 200 characters.

Is VARCHAR(100) the same as VARCHAR(200)? The answer is no. When both are used to store 90 characters of data, they take the same storage space on disk. The memory consumption, however, is different. For VARCHAR, storage on disk is allocated according to the actual length, but memory is not. In memory, a fixed-size block is used to hold the value. Simply put, the declared length is used, that is, space for 200 characters. Obviously, this has a significant negative impact on sorting and on temporary-table operations, which are carried out in memory. Therefore, if a field is involved in filesorts or disk-based temporary tables, its VARCHAR length must not be declared too generously.

Evaluate the actual lengths, and use the longest expected value to set the declared length. For redundancy, about 10% extra can be added. Do not assign the length arbitrarily, or simply use the maximum possible length, on the grounds that storage is allocated according to the actual length anyway.
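The VARCHAR(100) versus VARCHAR(200) comparison above, and the 10% sizing rule, can be put into numbers with a Python sketch. It follows the article's simplified description that memory holds each value in a fixed block of the declared length. A single-byte character set is assumed, and the function names and the row count of one million are illustrative.

```python
def disk_bytes(value_len: int, declared_len: int) -> int:
    """On-disk size of one VARCHAR value: length prefix plus the actual data."""
    prefix = 1 if declared_len <= 255 else 2
    return prefix + value_len


def memory_bytes(declared_len: int, rows: int) -> int:
    """In-memory size when a fixed block of the declared length is reserved per value."""
    return declared_len * rows


def suggested_length(longest_observed: int, headroom_percent: int = 10) -> int:
    """Longest observed value plus headroom, rounded up, using integer arithmetic."""
    extra = (longest_observed * headroom_percent + 99) // 100
    return longest_observed + extra


rows = 1_000_000
# Both columns store 90-character values, so the disk footprint per value is identical.
print(disk_bytes(90, 100), disk_bytes(90, 200))
# In memory, the declared length decides the cost.
print(memory_bytes(100, rows), memory_bytes(200, rows))
# Sizing rule: the longest expected value plus about 10%.
print(suggested_length(90))
```

Under these assumptions, the larger declaration doubles the memory needed for the same data, while the sizing rule suggests a length of 99 for values of up to 90 characters.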
