The difference between CHAR and VARCHAR in MySQL and suggestions for their use

Source: Internet
Author: User

In a database, character data is the most common kind of data and can account for more than 80% of everything stored. Handling character data correctly is therefore very important for database performance. The two most widely used character types are CHAR and VARCHAR. The former is fixed-length and the latter is variable-length. The question is when to use CHAR and when to use VARCHAR.

I. The difference between the VARCHAR and CHAR data types

In MySQL, the most commonly used character data types are VARCHAR and CHAR. Both store character data, but they differ considerably in structure and in how the data is kept. The exact implementation also depends on the storage engine. Here I use the widely used MyISAM storage engine as the example when describing the difference between the two types, and the recommendations later in this article also assume this engine.

The first thing to understand is that, whichever storage engine is used, the system stores these two data types in different ways. That is why we need to study the difference between them and then choose the appropriate type for each situation. With this in mind, let's look at the details.

VARCHAR is typically used to hold variable-length strings. Put simply, we only fix a maximum length, and the system allocates storage according to the amount of data actually stored. For this reason it can take up less storage space than the fixed-length CHAR type. There is one exception in practice: the administrator can specify the ROW_FORMAT=FIXED option when creating a MyISAM table, in which case the system uses a fixed amount of space for every row, and storage space may be wasted.

Because VARCHAR usually saves disk space, it is often considered to improve database performance. It is a double-edged sword, however, and the gain can come with side effects. Because the length is variable, updating the data can cause extra work. Suppose a value is 10 characters long before a change, in a column declared as VARCHAR(50). The system allocates only 10 characters of storage (ignoring the system's own overhead). After the change the value is 20 characters long. This is still below the 50-character maximum, so the database accepts it, but the original storage location can no longer hold the value. The system must then perform additional operations. Depending on the storage engine, some use a row-splitting mechanism, while others use a paging mechanism.
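The update scenario above can be sketched with a toy model. This is only an illustration of the idea, not MySQL's on-disk format: the class `ToyVarcharStore` and its fields are hypothetical, and, like the text, it ignores per-row overhead.

```python
class ToyVarcharStore:
    """Toy model of a VARCHAR(max_chars) column; not a real storage engine."""

    def __init__(self, max_chars):
        self.max_chars = max_chars   # declared maximum length
        self.allocated = {}          # row id -> characters reserved for the row
        self.values = {}             # row id -> current value
        self.relocations = 0         # times a row outgrew its original slot

    def _check(self, value):
        if len(value) > self.max_chars:
            raise ValueError("value exceeds the declared maximum length")

    def insert(self, row_id, value):
        self._check(value)
        self.allocated[row_id] = len(value)  # reserve only what is needed
        self.values[row_id] = value

    def update(self, row_id, value):
        self._check(value)
        if len(value) > self.allocated[row_id]:
            # The old slot is too small, so a real engine would have to
            # split or move the row. The toy model only counts the event.
            self.relocations += 1
            self.allocated[row_id] = len(value)
        self.values[row_id] = value


store = ToyVarcharStore(max_chars=50)
store.insert(1, "a" * 10)   # 10 characters reserved
store.update(1, "b" * 20)   # allowed (20 <= 50) but no longer fits in place
print(store.allocated[1], store.relocations)  # 20 1
```

Shrinking a value or replacing it with one of the same length never increments `relocations` in this model, which is the case where variable-length storage costs nothing extra.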

The CHAR data type differs from VARCHAR in that it uses fixed-length storage. Simply put, the system always allocates the full declared length for it. When a value is saved, that much space is allocated even if the value is shorter than the maximum length. Obviously, this kind of storage can waste disk space. One point to keep in mind is that when a value is shorter than the declared length, the system pads it on the right with spaces when storing it. Conversely, when a CHAR value is retrieved, its trailing spaces are removed automatically, so a value saved with trailing spaces comes back without them. When values are compared, the shorter string is treated as if it were padded with spaces at the end.
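The padding and stripping behavior can be modeled in a few lines. This is a sketch of the semantics only, assuming MySQL's default SQL mode (PAD_CHAR_TO_FULL_LENGTH not enabled) and a PAD SPACE collation; the function names are hypothetical.

```python
def char_store(value, n):
    """Model storing a value in CHAR(n): right-pad with spaces to n characters."""
    if len(value) > n:
        raise ValueError("value too long for CHAR(%d)" % n)
    return value.ljust(n)


def char_retrieve(stored):
    """Model retrieval from a CHAR column: trailing spaces are removed."""
    return stored.rstrip(" ")


def pad_space_equal(a, b):
    """Model a PAD SPACE comparison: pad the shorter string before comparing."""
    width = max(len(a), len(b))
    return a.ljust(width) == b.ljust(width)


stored = char_store("ab  ", 6)
print(repr(stored))                    # 'ab    '
print(repr(char_retrieve(stored)))     # 'ab'
print(pad_space_equal("ab", "ab   "))  # True
```

A practical consequence is that trailing spaces in the original input cannot be recovered from a CHAR column under these assumptions.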

Clearly, the biggest difference between VARCHAR and CHAR is that the former is variable-length and the latter is fixed-length. When storing data, the former allocates space according to the data actually stored, while the latter allocates space according to the declared CHAR length, regardless of the actual data length. Does this mean that CHAR is inferior to VARCHAR? It does not. If it were, there would be no need for the CHAR type at all. VARCHAR can save storage space and improve the efficiency of data processing, but the negative effects of its variable length sometimes cancel out those advantages. For this reason, in some cases you still need to use CHAR.

II. Recommendations for projects

From the analysis above, we know that VARCHAR is a double-edged sword: it can improve performance, but it can also introduce additional cost. We need to weigh both sides when deciding between VARCHAR and CHAR. In real projects, the following considerations apply.

First, judge by the length of the data. A field such as a person's name has a limited maximum length. Suppose we give it a length of 18 characters. Individual names differ in length, but even with a fixed-length type of 18 characters, the space wasted is not very large. With VARCHAR, on the other hand, if a name is changed later and the original storage space cannot hold the new value, extra work is needed. On balance, a fixed-length CHAR is the better choice in this case. In real projects, a field whose values are short is generally given a fixed character length.

Second, consider whether the values are of similar length. If a field is long but its values are always of roughly the same length, for example between 90 and 100 characters, or even exactly the same length, CHAR is the more appropriate type. A typical case is the MD5 hash value. When a user's password is stored as an MD5 hash, CHAR is a good fit because every hash has the same length. CHAR is also generally recommended for similar data, such as a user's ID number.
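The fixed length of MD5 hashes is easy to check. The sketch below uses Python's standard `hashlib` module to hash inputs of different lengths. The helper name `md5_hex` and the sample strings are chosen only for illustration.

```python
import hashlib


def md5_hex(text):
    """Return the MD5 digest of text as a hexadecimal string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


samples = ["a", "a much longer input string", ""]
digests = [md5_hex(s) for s in samples]

# Every hexadecimal MD5 digest has the same length, whatever the input.
lengths = {len(d) for d in digests}
print(lengths)  # {32}
```

Since the hexadecimal form is always 32 characters, a column holding it could be declared as CHAR(32) without wasting any space.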

Consider another question: what is the difference between CHAR(1) and VARCHAR(1)? Both can hold only a single character, but VARCHAR takes up one more byte than CHAR. This is because VARCHAR uses 1 byte to store the length information. CHAR has no such bookkeeping overhead.
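The arithmetic behind this comparison can be written out as a rough sketch. It assumes a single-byte character set such as latin1 and ignores all other per-row overhead. The function names are hypothetical, and the prefix rule (1 length byte when a value needs at most 255 bytes, otherwise 2) follows the MySQL documentation on storage requirements.

```python
def char_bytes(n, bytes_per_char=1):
    """Bytes used by a CHAR(n) column: always the full declared length."""
    return n * bytes_per_char


def varchar_bytes(value, n, bytes_per_char=1):
    """Bytes used by a value in a VARCHAR(n) column: length prefix plus data."""
    if len(value) > n:
        raise ValueError("value too long for VARCHAR(%d)" % n)
    # 1 length byte if the column can never need more than 255 bytes, else 2.
    prefix = 1 if n * bytes_per_char <= 255 else 2
    return prefix + len(value) * bytes_per_char


print(char_bytes(1))             # 1
print(varchar_bytes("x", 1))     # 2
print(char_bytes(18))            # 18
print(varchar_bytes("Li", 18))   # 3
```

For very short columns the length prefix is a noticeable fraction of the total, which is why CHAR tends to win there, while for longer columns with short values VARCHAR saves far more than the prefix costs.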

Third, consider fragmentation. With CHAR, the storage space is allocated once, and the contents of the field are stored together. From this perspective alone, there is no fragmentation. A variable-length type has variable-length storage. When the length of a value differs before and after a change, fragmentation inevitably occurs. Therefore, when variable-length character data is used, the database administrator has to defragment the tables from time to time, for example by exporting and re-importing the database.

Fourth, even with VARCHAR, the declared length should not be too generous. What does that mean? Suppose an address needs to be stored, and the estimate is that 100 characters are enough. Some database administrators reason that, since VARCHAR allocates space according to actual need, they might as well declare a larger length, and so they declare 200 characters for this field. Is VARCHAR(100) really the same as VARCHAR(200)? The answer is no. When both are used to store 90-character data, the disk space they use is the same, but the memory they consume is different.

For VARCHAR, storage on disk is allocated according to the actual character length, but memory is not. In memory, a fixed-size block is used to hold the value. Put simply, the length defined in the column type is used, that is, space for 200 characters. Obviously, this can have a large negative effect on sorting and on temporary tables, both of which are carried out in memory. So if a field is involved in file sorts or disk-based temporary tables, the VARCHAR length still must not be too generous. Evaluate the length actually needed and set the column length according to the longest expected value. If you want some margin, allow around 10% extra. Never assume that, because storage is allocated according to actual length, the declared length can be chosen arbitrarily or simply set to the maximum.
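The effect described above can be estimated with simple arithmetic. The sketch below is a simplified model under stated assumptions: a single-byte character set, one fixed-width slot of the declared length per row in memory, and no other overhead. Actual behavior depends on the MySQL version and storage engine, and the function names are hypothetical.

```python
def disk_bytes(values, declared_len, bytes_per_char=1):
    """Estimated on-disk size: each value costs its own length plus a prefix."""
    prefix = 1 if declared_len * bytes_per_char <= 255 else 2
    return sum(prefix + len(v) * bytes_per_char for v in values)


def fixed_width_buffer_bytes(row_count, declared_len, bytes_per_char=1):
    """Estimated in-memory size when every row gets the full declared width."""
    return row_count * declared_len * bytes_per_char


addresses = ["x" * 90] * 1000   # 1000 rows of 90-character data

for declared in (100, 200):
    print(declared,
          disk_bytes(addresses, declared),
          fixed_width_buffer_bytes(len(addresses), declared))
# 100 91000 100000
# 200 91000 200000
```

Under these assumptions the disk footprint is identical for both declarations, while the memory needed for a sort or temporary table doubles.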
