The difference between char and varchar

Source: Internet
Author: User


Today I looked up the differences between the string types a database offers. The following write-up is good, so I am excerpting it here:






Character data is the most common kind of data in a database and can account for more than 80% of its contents, so handling it correctly matters a great deal for database performance. The two most frequently used character types are char and varchar: the former is fixed length, the latter is variable length. The question is when to use char and when to use varchar. That is the topic I would like to discuss here.






First, the difference between varchar and char character data



In MySQL, the most frequently used character data types are varchar and char. Both hold character data, but they differ greatly in structure and in the way the data is stored, and the specific implementation also depends on the storage engine. I will use the widely used MyISAM storage engine as the example when describing the differences between the two types. The recommendations later in this article assume the same storage engine.



The first thing to understand is that, whichever storage engine is used, the system stores these two data types differently. That is exactly why the difference is worth studying, so that the right type can be used in the right circumstances. With that in mind, let's look at the details.



varchar is used to hold variable-length strings. Simply put, we only set a maximum length for it, and the system allocates storage according to the amount of data actually stored. Compared with char, a fixed-length type, it therefore consumes less storage space. There are exceptions in practice, however. For example, an administrator can specify the row_format=fixed option when creating a MyISAM table, in which case the system uses a fixed amount of space for every row, and storage space may be wasted. Because varchar usually saves disk space, it is often assumed to improve database performance. It is a double-edged sword, though: along with the performance benefit come some side effects. Because the length is variable, updates can cause extra work. Suppose a value is 10 characters long before a change (with a declared maximum of 50 characters), so the system has allocated only 10 storage positions (ignoring the system's own overhead). After the change, the value is 20 characters long. That is within the 50-character limit, so the database accepts it, but the original storage location is no longer large enough. The system now has to do extra work, and what it does depends on the storage engine: MyISAM may split the row into fragments, for example, while InnoDB may need to split the page.
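The 10-to-20-character update described above can be put into numbers with a small sketch. This is a toy model, not MySQL's actual on-disk format: the function name varchar_stored_bytes is hypothetical, a single-byte character set is assumed, and row overhead is ignored. The only MySQL rule it relies on is that a varchar value carries a length prefix of 1 byte when the column can need at most 255 bytes, and 2 bytes otherwise.

```python
# Toy model of VARCHAR storage, NOT MySQL's actual on-disk format.
# Assumptions: single-byte character set, row overhead ignored.

def varchar_stored_bytes(value: str, max_chars: int, bytes_per_char: int = 1) -> int:
    """Bytes needed for a VARCHAR(max_chars) value: length prefix plus data."""
    if len(value) > max_chars:
        raise ValueError("value exceeds the declared maximum length")
    # 1-byte length prefix when the column can need at most 255 bytes,
    # 2-byte prefix otherwise.
    prefix = 1 if max_chars * bytes_per_char <= 255 else 2
    return prefix + len(value) * bytes_per_char


before = varchar_stored_bytes("a" * 10, 50)  # the 10-character value
after = varchar_stored_bytes("a" * 20, 50)   # the same row after the update
print(before, after, after - before)
```

Under these assumptions the updated value needs 10 more bytes than the space the row originally occupied, which is why the engine has to move or split something.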



The char data type differs from varchar in that it uses fixed-length storage. Simply put, the system always allocates the full declared length. When a value is saved, that much space is allocated even if the value is shorter than the maximum. Obviously, this can waste disk space. One thing to note is that when a value has fewer characters than the declared length, the system pads it on the right with spaces. Conversely, when a char value is retrieved, the system automatically strips any trailing spaces, so spaces at the end of the original value are lost. When values are compared, the system pads the shorter string with spaces at the end, so trailing spaces do not affect the result.
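The padding, stripping, and comparison behavior of char can be mimicked in a few lines. This is only a sketch of the semantics, assuming MySQL's default SQL mode and a collation that pads with spaces; the helper names char_store, char_retrieve, and char_equal are hypothetical and do not correspond to any MySQL API.

```python
# Toy model of CHAR semantics (default SQL mode, PAD SPACE collation assumed).
# The helper names are hypothetical.

def char_store(value: str, length: int) -> str:
    """Right-pad the value with spaces to the declared length, as on write."""
    if len(value) > length:
        raise ValueError("value exceeds the declared length")
    return value.ljust(length)


def char_retrieve(stored: str) -> str:
    """Trailing spaces are stripped when the value is read back."""
    return stored.rstrip(" ")


def char_equal(a: str, b: str) -> bool:
    """Comparison ignores trailing spaces."""
    return a.rstrip(" ") == b.rstrip(" ")


stored = char_store("ab  ", 10)      # the value's own trailing spaces blend into the padding
print(len(stored))                   # always the declared length
print(repr(char_retrieve(stored)))   # the two trailing spaces are gone
print(char_equal("ab", "ab   "))     # trailing spaces do not matter
```

The practical consequence is that char is a poor fit for values whose trailing spaces are significant.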



Clearly, the biggest difference between varchar and char is that the former is variable length while the latter is fixed length. When storing a value, varchar allocates space according to the data actually stored, whereas char allocates the declared length no matter how long the actual data is. Does this mean char is inferior to varchar? Not at all; otherwise there would be no need for char to exist. varchar can save storage space and improve the efficiency of data processing, but the side effects of variable length can sometimes cancel out those advantages. For that reason, char is still the right choice in some cases.



Second, recommendations for real projects



From the analysis above, we know that varchar is a double-edged sword: it can improve performance, but it can also add overhead. We need to weigh both when deciding between varchar and char. In real projects, the following considerations apply.



First, judge by the length of the values. Some fields, such as a person's name, have a limited maximum length. Suppose we give the field a length of 18 characters. Although names differ in length, even a fixed-length type of 18 characters does not waste much space. With varchar, on the other hand, if a name is changed later and the original storage space cannot hold the new value, extra work results. On balance, the fixed-length char type is the better choice here. In real projects, a field whose values are short is usually given a fixed-length type.



Second, consider whether the values are of similar length. If a field's values are long but always about the same length, for example between 90 and 100 characters, or even exactly the same length, char is the more appropriate type. A typical case is the MD5 hash value. When MD5 hashes are used to store user passwords, char works very well, because every hash has the same length. char is likewise generally recommended for data such as users' ID card numbers.
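The fixed length of MD5 values is easy to check. The sketch below uses Python's standard hashlib module; the helper md5_hex and the sample inputs are made up for illustration. An MD5 digest written as hexadecimal text is always 32 characters, whatever the input length, so a char(32) column fits it exactly.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 digest of the text as a hexadecimal string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Inputs of very different lengths (illustrative values only).
samples = ["a", "password", "a much longer input string " * 10]
lengths = {len(md5_hex(s)) for s in samples}
print(lengths)
```

Every sample produces a digest of the same length, so no space is wasted on padding and no length prefix is needed.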



Also consider this question: what is the difference between char(1) and varchar(1)? Although both can hold only a single character, varchar takes one more byte of storage than char. This is because varchar uses an extra 1 byte (2 bytes for longer columns) to store the length information. The char type has no such administrative overhead.
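The char(1) versus varchar(1) comparison can be worked through as simple arithmetic. The sketch below assumes a single-byte character set and ignores row overhead; the function names char_bytes and varchar_bytes are hypothetical.

```python
# Rough per-value storage, assuming a single-byte character set.
# Function names are hypothetical; row overhead is ignored.

def char_bytes(declared_chars: int) -> int:
    """CHAR always occupies the declared length."""
    return declared_chars


def varchar_bytes(actual_chars: int, declared_chars: int) -> int:
    """VARCHAR occupies the actual length plus a 1- or 2-byte length prefix."""
    prefix = 1 if declared_chars <= 255 else 2
    return prefix + actual_chars


char_1 = char_bytes(1)
varchar_1 = varchar_bytes(1, 1)
print(char_1, varchar_1)
```

For a one-character column the length prefix doubles the storage, which is why very short fields are better declared as char.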



Third, consider fragmentation. With char, the storage space is allocated once, so the contents of a field are stored together, and from this point of view there is no fragmentation problem. Variable-length types have a variable storage length, and when the length of a value differs before and after a change, fragmentation can occur. When variable-length character data is used, the database administrator should therefore defragment periodically, for example by exporting and re-importing the database.
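To see why growing values leave gaps, here is a deliberately simplified model. It assumes records are packed back to back in a data file, and that a record which outgrows its slot is rewritten at the end of the file, leaving the old slot as a hole. This is an illustrative assumption, not a description of how any particular storage engine lays out rows, and the function holes_after_updates is hypothetical.

```python
# Toy fragmentation model; an illustrative assumption, not a real engine layout.

def holes_after_updates(initial_sizes, updates):
    """Return the total bytes left behind as holes.

    initial_sizes: size in bytes of the slot each record starts in.
    updates: (record_index, new_size) pairs, applied in order.
    """
    slot_sizes = list(initial_sizes)
    hole_bytes = 0
    for index, new_size in updates:
        if new_size > slot_sizes[index]:
            hole_bytes += slot_sizes[index]  # the old slot is abandoned
            slot_sizes[index] = new_size     # the record is rewritten at the end
        # otherwise the record fits and is rewritten in place
    return hole_bytes


updates = [(0, 20), (1, 50), (0, 20)]
# Variable-length rows start in slots sized to their data (10 bytes each).
variable_holes = holes_after_updates([10, 10, 10], updates)
# Fixed-length rows always occupy the full declared size (50 bytes each).
fixed_holes = holes_after_updates([50, 50, 50], updates)
print(variable_holes, fixed_holes)
```

In this model the fixed-length rows never leave a hole, because every new value fits in the slot that was reserved up front.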



Fourth, do not be too generous even with varchar. What does that mean? Suppose a user needs to store address information, and the evaluation shows that 100 characters are enough. Some database administrators reason that because varchar allocates space according to actual need, it is better to declare a larger length, so they declare 200 characters for the field straight away. Is varchar(100) really the same as varchar(200)? The answer is no. When both are used to store 90 characters of data, they take the same storage space on disk. Their memory consumption, however, differs. For varchar, space on the hard disk is allocated according to the actual character length, but memory is not: a fixed-size block of memory is used to hold the value, and its size is the length declared for the column, here 200 characters. This can have a significant adverse effect on sorting and on temporary tables, both of which are carried out in memory. So if a field is involved in file sorts or disk-based temporary tables, do not be too generous with the varchar length. Instead, evaluate the length actually needed and set the column length to that of the longest expected value. If you want some headroom, leave about 10% extra. Do not assume that because storage is allocated by actual length you can pick a length arbitrarily, or simply use the maximum character length.
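A back-of-the-envelope calculation makes the varchar(100) versus varchar(200) point concrete. The sketch assumes a single-byte character set, in-memory buffers sized by the declared column length as described above, and no other overhead. The row count of one million and all function names are hypothetical.

```python
# Rough estimate only, not a measurement of any MySQL version.
# Assumptions: single-byte character set, memory sized by the declared length.

def disk_bytes(rows: int, actual_chars: int, declared_chars: int) -> int:
    """On-disk size: actual data plus a 1- or 2-byte length prefix per value."""
    prefix = 1 if declared_chars <= 255 else 2
    return rows * (prefix + actual_chars)


def memory_bytes(rows: int, declared_chars: int) -> int:
    """In-memory size when each value gets a block of the declared length."""
    return rows * declared_chars


def length_with_headroom(longest_observed: int, headroom_percent: int = 10) -> int:
    """Longest observed length plus headroom, rounded up with integer math."""
    return (longest_observed * (100 + headroom_percent) + 99) // 100


rows = 1_000_000  # hypothetical table size
disk_100 = disk_bytes(rows, 90, 100)
disk_200 = disk_bytes(rows, 90, 200)
mem_100 = memory_bytes(rows, 100)
mem_200 = memory_bytes(rows, 200)
print(disk_100, disk_200)        # identical on disk
print(mem_100, mem_200)          # twice the memory for varchar(200)
print(length_with_headroom(90))  # suggested declared length
```

Under these assumptions the two declarations cost the same on disk, but the larger one doubles the memory needed for a sort or temporary table over the same data.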

