Comparison of data types
CHAR(n) is a fixed-length type, while VARCHAR(n) is a variable-length type whose maximum length is n. If the value entered into a CHAR column is shorter than the declared length, it is padded with spaces; a VARCHAR column stores only the characters you actually entered. (n is an integer, and the maximum allowed n differs from one database to another.)
CHAR: Storing fixed-length data in CHAR is very convenient, and indexes on CHAR fields are more efficient. For example, a field defined as CHAR(10) takes up 10 bytes of space regardless of whether the stored data reaches 10 bytes.
VARCHAR: Stores variable-length data, at some cost in processing efficiency. If the possible values of a field are not of fixed length and we only know that they cannot exceed 10 characters, defining the field as VARCHAR(10) is the most economical choice. The space a VARCHAR value actually occupies is the length of the value plus 1. Why "+1"? The extra byte records the length actually used. From the standpoint of space VARCHAR is the better fit, and from the standpoint of efficiency CHAR is; the key is to find the tradeoff that suits the actual situation.
Differences between CHAR and VARCHAR: VARCHAR saves more space than CHAR, but is slightly less efficient.
VARCHAR is said to save more space than CHAR because it is a variable-length string type. For example, storing the string "ABC" in a VARCHAR(5) column consumes only 3 bytes of data (plus the length byte), whereas storing it in a CHAR(5) column occupies 5 bytes ("ABC  "). VARCHAR is less efficient than CHAR because modifying a VARCHAR value can change its length, which may force the data to be moved (that is, extra I/O). Oracle's term for this extra I/O is "row migration".
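The "ABC" example can be sketched in a few lines of Python. This is only a model of the byte layout described above, not any database's real on-disk format: it assumes a single-byte (ASCII) character set and a one-byte length prefix, and the function names are made up for illustration.

```python
def char_stored(value: str, n: int) -> bytes:
    """Model CHAR(n): right-pad the value with spaces to exactly n bytes."""
    data = value.encode("ascii")
    if len(data) > n:
        raise ValueError("value is longer than the declared length")
    return data.ljust(n, b" ")


def varchar_stored(value: str, n: int) -> bytes:
    """Model VARCHAR(n): one length byte followed by the actual data.

    Assumes n <= 255 so that the length fits in a single byte.
    """
    data = value.encode("ascii")
    if len(data) > n or n > 255:
        raise ValueError("value or declared length out of range for this model")
    return bytes([len(data)]) + data


if __name__ == "__main__":
    print(char_stored("ABC", 5), len(char_stored("ABC", 5)))
    print(varchar_stored("ABC", 5), len(varchar_stored("ABC", 5)))
```

In this model CHAR(5) always occupies 5 bytes, while VARCHAR(5) occupies the 3 data bytes plus the 1 length byte.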
Row migration:
"A row fits in its block when it is first inserted; row migration occurs when an update makes the row grow and the block's free space is already used up. In this case, Oracle migrates the entire row to a new block and keeps a pointer at the row's original location that points to the new block holding the row data, which means that the row ID of a migrated row does not change."
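The following toy model in Python illustrates the idea of row migration. It is a deliberately simplified sketch, not Oracle's actual storage layout: the Block class, the home mapping (standing in for the pointer left in the original block), and the count of "blocks written" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    capacity: int
    rows: dict = field(default_factory=dict)  # row_id -> row size in bytes

    def free(self) -> int:
        return self.capacity - sum(self.rows.values())


def update_row(blocks: list, home: dict, row_id: str, new_size: int) -> int:
    """Resize a row and return how many blocks had to be written.

    `home` maps row_id -> index of the block that currently holds the data.
    Assumes new_size never exceeds the capacity of a single block.
    """
    block = blocks[home[row_id]]
    growth = new_size - block.rows[row_id]
    if growth <= block.free():
        block.rows[row_id] = new_size       # fits: update in place
        return 1
    del block.rows[row_id]                  # does not fit: migrate the whole row
    blocks.append(Block(block.capacity))
    blocks[-1].rows[row_id] = new_size
    home[row_id] = len(blocks) - 1          # the row_id itself is unchanged
    return 2


if __name__ == "__main__":
    blocks = [Block(100, {"r1": 40, "r2": 50})]
    home = {"r1": 0, "r2": 0}
    print(update_row(blocks, home, "r1", 45))  # still fits in the block
    print(update_row(blocks, home, "r1", 60))  # block is full, row migrates
```

In this model an update costs extra only when the row no longer fits in its block.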
To explain: a block is the smallest unit of data organization and management in Oracle. It is the unit of storage in data files on disk and the smallest unit of database I/O (that is, every read or write is one block in size). So if the block is not full, updating a VARCHAR field in a way that changes its length costs the same number of I/Os as updating one whose length stays the same, with no additional cost. Extra I/O occurs only when the block is full. The performance difference between CHAR and VARCHAR is therefore quite subtle and negligible in most cases, which is why it is described above as only "slightly" less efficient.
NVARCHAR
(Note: because MySQL uses UTF-8 and counts a Chinese character and a letter each as one character of length, there is no need to distinguish NVARCHAR from VARCHAR in MySQL; everything can be treated as VARCHAR.) The characteristics of NVARCHAR are best understood by comparing it with VARCHAR.
The difference between NVARCHAR and VARCHAR lies mainly in how data is stored:
1) VARCHAR: stores data by byte. VARCHAR(6) can store up to 6 bytes of data, for example "hahaha" or "abcdef". Note: how many bytes a Chinese character occupies in the database depends on the encoding. For example, UTF-8 on MySQL uses 3 bytes, and Chinese_PRC_CI_AS on SQL Server uses 2 bytes.
2) NVARCHAR: stores data by character. NVARCHAR(6) can store up to 6 characters, whether letters or Chinese characters, for example six Chinese characters or "abcdef".
The maximum number of bytes NVARCHAR(m) actually stores = n*m (n depends on the encoding). If NVARCHAR stores English characters, each one still takes n bytes under that encoding. In other words, if you use NVARCHAR to store English characters, you waste half or more of the storage space.
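The byte counts quoted above can be checked with Python's built-in codecs. This is only an approximation of what a database does: it assumes that "gbk" is a reasonable stand-in for the code page behind Chinese_PRC_CI_AS and that "utf-16-le" models the two-bytes-per-character storage of NVARCHAR. The helper name byte_length is made up for illustration.

```python
def byte_length(text: str, encoding: str) -> int:
    """Return the number of bytes `text` occupies in the given encoding."""
    return len(text.encode(encoding))


if __name__ == "__main__":
    for text in ("abcdef", "中文"):
        for encoding in ("utf-8", "gbk", "utf-16-le"):
            print(repr(text), encoding, byte_length(text, encoding))
```

Under these assumptions, each Chinese character takes 3 bytes in UTF-8 and 2 bytes in GBK, while six English letters take 6 bytes in a byte-oriented encoding but 12 bytes in UTF-16.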
The difference between the VARCHAR and CHAR character data types
In MySQL, the most commonly used character data types are VARCHAR and CHAR.
Both types are used to hold character data, but they differ greatly in structure and in how the data is stored, and the specific implementation also depends on the storage engine.
The discussion here takes the commonly used MyISAM storage engine as the example for the differences between the two types. The recommendations that follow also assume this storage engine.
The first thing to understand is that, whichever storage engine is used, the system stores the two types differently. That is why the difference is worth studying, so that each type can be used in the right circumstances. With that understood, read on.
VARCHAR is typically used to hold variable-length strings.
Simply put, we only set a maximum length for it, and the system allocates storage according to the amount of data actually stored. Compared with CHAR, this consumes less storage space than a fixed-length type. In practice, however, there are exceptions. For example, the administrator can specify the ROW_FORMAT=FIXED option when needed. A MyISAM table created with this option uses a fixed-length space for every row, which can waste storage space. Normally, though, the VARCHAR data type saves disk space, which is often considered to improve database performance.
However, it is important to note that this is a double-edged sword: while improving performance, it also tends to produce side effects. Because the length is variable, updating the data can cause extra work. Suppose a value is 10 characters long before the change (with the VARCHAR maximum assumed to be 50 characters); the system then allocates only 10 storage positions (ignoring the system's own overhead). After the change, the value is 20 characters long. Because this does not exceed the 50-character maximum, the database still accepts it, but the original storage location can no longer hold it. At this point the system has to do extra work. Depending on the storage engine, some use a splitting mechanism, while others use a paging mechanism.
The CHAR data type differs from VARCHAR in that it uses fixed-length storage.
Simply put, the system always allocates the maximum storage space for it. When data is saved, the system allocates that much space even if the value does not reach the maximum length. Obviously, this kind of storage can waste disk space. One thing to note here: when a value has fewer characters than the declared length, MySQL pads it on the right with spaces in storage, and when a CHAR value is retrieved, its trailing spaces are removed automatically. When values are compared, the system pads the end of the shorter string with spaces.
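The handling of trailing spaces can be modelled with three small Python functions. This is a sketch of the behaviour, not MySQL code: it assumes the stored form of a CHAR value is padded to the declared width (as described for CHAR elsewhere in this article), that trailing spaces are dropped when the value is read back, and that comparison pads the shorter operand. All function names are hypothetical.

```python
def store_char(value: str, n: int) -> str:
    """Model the stored form of CHAR(n): padded with spaces to width n."""
    if len(value) > n:
        raise ValueError("value is longer than the declared length")
    return value.ljust(n)


def retrieve_char(stored: str) -> str:
    """Model retrieval: trailing spaces are filtered out."""
    return stored.rstrip(" ")


def compare_padded(a: str, b: str) -> bool:
    """Model comparison: the shorter string is padded with spaces first."""
    width = max(len(a), len(b))
    return a.ljust(width) == b.ljust(width)


if __name__ == "__main__":
    stored = store_char("ab ", 5)
    print(repr(stored), repr(retrieve_char(stored)))
    print(compare_padded("ab", "ab   "))
```

One consequence of this model is that a value saved with trailing spaces comes back without them.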
Obviously, the biggest difference between the VARCHAR and CHAR character data types is that the former is variable length, while the latter is fixed length. When storing, the former allocates space according to the data actually stored. The latter allocates space according to the length declared for CHAR, regardless of the actual length of the data. Does this mean that CHAR is inferior to VARCHAR? Not so; otherwise there would be no need for the CHAR type at all. Although VARCHAR can save storage space and improve the efficiency of data processing, the negative effects of variable length can sometimes offset those advantages. For this reason, CHAR is still the right choice in some cases.
Second, project recommendations
From the analysis above, we know that the VARCHAR data type is a double-edged sword: it may bring some additional cost while improving performance. We need to weigh both sides when deciding whether to use VARCHAR or CHAR. In actual projects, we consider the following situations.
First, judge by the length of the field.
The maximum length of some fields, such as a person's name, is limited. Suppose we allow 18 characters for it. Although each person's name may differ in length, even if the field is given a fixed-length type of 18 characters, the space wasted in the end is not very large. With a variable-length type such as NVARCHAR, if the name needs to be changed later and the original storage space cannot accommodate the new value, some extra work results. On balance, the fixed-length CHAR type is the better choice in this case. In real projects, a field with a short character length is usually given a fixed length.
Second, consider whether the lengths are similar.
If a field is fairly long but its values are always about the same length, for example between 90 and 100 characters, or even exactly the same length, CHAR is the more appropriate choice. A typical application is the MD5 hash value. When MD5 hashes are used to store user passwords, CHAR fits very well because every hash has the same length. CHAR is also generally recommended for data such as a user's ID number.
Also consider this question: what is the difference between CHAR(1) and VARCHAR(1)? Although both can hold only a single character, VARCHAR takes one more byte of storage than CHAR. This is mainly because VARCHAR needs an extra byte to store the length information. CHAR has no such administrative overhead.
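The overhead can be put into numbers with a short Python sketch. It assumes a single-byte character set and a length prefix of 1 byte for declared lengths up to 255 bytes and 2 bytes beyond that; the function names are made up for illustration, and real engines may add other per-row overhead.

```python
def length_prefix_bytes(declared_len: int) -> int:
    """Bytes used to record the length: 1 up to 255, otherwise 2 (assumed rule)."""
    return 1 if declared_len <= 255 else 2


def char_bytes(declared_len: int) -> int:
    """CHAR(n) always occupies n bytes and needs no length prefix."""
    return declared_len


def varchar_bytes(actual_len: int, declared_len: int) -> int:
    """VARCHAR(n) occupies the actual data plus the length prefix."""
    if actual_len > declared_len:
        raise ValueError("value is longer than the declared length")
    return actual_len + length_prefix_bytes(declared_len)


if __name__ == "__main__":
    print("CHAR(1):   ", char_bytes(1))
    print("VARCHAR(1):", varchar_bytes(1, 1))
```

For a one-character field the prefix doubles the space used, which is why very short fields are usually better declared as CHAR.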
Third, consider fragmentation.
With the CHAR type, the storage space is allocated once, so the contents of a field are stored together. From this point of view, there is no fragmentation problem. A variable-length character type, by contrast, has a variable storage length. When the length of a value differs before and after a change, fragmentation inevitably occurs. Therefore, when variable-length character data is used, the database administrator needs to defragment periodically, for example by running a database export and import job.
Fourth, do not be too generous even with the VARCHAR data type.
What does that mean? Suppose the user needs to store address information, and the evaluation shows that 100 characters are enough. Some database administrators reason that, since VARCHAR allocates space according to actual need, it does no harm to make the column larger, so they declare 200 characters for the field from the start. Are VARCHAR(100) and VARCHAR(200) really the same? The answer is no. When both are used to store 90 characters of data, their storage space on disk is the same, but their memory consumption differs. For a VARCHAR column, space on disk is allocated according to the actual character length, but memory is not: a fixed-size block of memory is used to hold the value, sized by the length declared in the type, which here is 200 characters. Obviously, this can have a significant adverse effect on sorting and on temporary tables (both of which are carried out in memory). So if a field is involved in file sorts or disk-based temporary tables, do not be too generous when declaring the VARCHAR length. Instead, evaluate the length actually needed and set the declared length from the longest expected value. If you want a margin, leave about 10% extra. Do not assume that, because storage is allocated by actual length, you can pick a length arbitrarily or simply use the maximum character length.
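A rough calculation in Python makes the contrast concrete. It is only a back-of-the-envelope model of the claim above, assuming one byte per character, a length prefix of 1 byte up to 255 and 2 bytes beyond, and an in-memory buffer sized by the declared length. The function names and the row count are hypothetical.

```python
def disk_bytes(actual_len: int, declared_len: int) -> int:
    """On disk: the actual data plus the length prefix."""
    prefix = 1 if declared_len <= 255 else 2
    return actual_len + prefix


def memory_bytes(declared_len: int) -> int:
    """In memory (as described above): a fixed block sized by the declared length."""
    return declared_len


if __name__ == "__main__":
    rows = 1_000_000  # hypothetical number of rows being sorted
    for declared in (100, 200):
        print(
            f"VARCHAR({declared}): disk per row = {disk_bytes(90, declared)} bytes, "
            f"memory for {rows} rows = {memory_bytes(declared) * rows} bytes"
        )
```

Under these assumptions the two declarations cost the same on disk for a 90-character value, but the VARCHAR(200) column needs twice the memory during a sort.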
From the point of view of the database in practice
Character fields are the most common fields in database tables, and they come in two kinds: fixed length and variable length. In general, VARCHAR is used to store data whose length varies widely, and CHAR is used to store data whose length varies little or not at all.
In terms of internal storage, a VARCHAR field generally uses 1 or 2 bytes to describe the length of the data, with no meaningless spaces after the data, whereas a CHAR field pads the end of the data with spaces until the declared length is filled.
In terms of operational efficiency, it takes more time to update records that contain variable-length fields, while there is no significant difference for INSERT, DELETE, and SELECT operations.
In indexing and matching, a VARCHAR field treats 'Zhang San' and 'Zhang San ' (with a trailing space) as two different values, while CHAR treats them as equal.
When a CHAR field is joined to a VARCHAR field, most databases use the variable-length field as the baseline and do not trim the trailing spaces in the CHAR field. Therefore 'Zhang San' in a VARCHAR(12) column and 'Zhang San' in a CHAR(12) column are not equal and cannot be joined directly.
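The mismatch can be reproduced with plain Python strings. This is a sketch of the comparison rules described above, not the behaviour of any particular database, and the function names are invented for illustration.

```python
def char_equal(a: str, b: str) -> bool:
    """CHAR-style matching: trailing spaces are ignored."""
    return a.rstrip(" ") == b.rstrip(" ")


def varchar_equal(a: str, b: str) -> bool:
    """VARCHAR-style matching: trailing spaces are significant."""
    return a == b


def join_matches(char_value: str, varchar_value: str, trim_char: bool) -> bool:
    """Join a padded CHAR value to a VARCHAR value, optionally trimming first."""
    left = char_value.rstrip(" ") if trim_char else char_value
    return varchar_equal(left, varchar_value)


if __name__ == "__main__":
    in_char12 = "Zhang San".ljust(12)   # stored form in a CHAR(12) column
    in_varchar12 = "Zhang San"          # stored form in a VARCHAR(12) column
    print(join_matches(in_char12, in_varchar12, trim_char=False))
    print(join_matches(in_char12, in_varchar12, trim_char=True))
```

Trimming the CHAR side before the comparison is what makes the two values match.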
Typically, the processing sequence for a VARCHAR field is:
Get data, trim data, store data, read data, display data
The processing sequence for a CHAR field is:
Get data, store data, read data, trim data, display data
From the comparison we can see that the difference between the two is where the trim step takes place. In fact, the real difference is that trim is required for a VARCHAR field, whereas for a CHAR field it is optional and done only as needed. That is, in general, processing a VARCHAR field is more complex than processing a CHAR field, so it is not as efficient.
On the other hand, when testing two field values for equality, a VARCHAR field first compares the lengths of the two values and compares their contents only when the lengths are the same, whereas a CHAR field compares the contents directly. So for equality tests, VARCHAR is in most cases more efficient than CHAR.
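The effect of checking the length first can be illustrated by counting how many characters each strategy has to examine. This Python sketch only models the two strategies described above; the function names and the character counter are assumptions for illustration, and real databases compare values in their own internal ways.

```python
def _scan(a: str, b: str) -> tuple:
    """Compare character by character; return (equal, characters examined)."""
    examined = 0
    for x, y in zip(a, b):
        examined += 1
        if x != y:
            return False, examined
    return True, examined


def varchar_compare(a: str, b: str) -> tuple:
    """VARCHAR style: different lengths settle the answer without a scan."""
    if len(a) != len(b):
        return False, 0
    return _scan(a, b)


def char_compare(a: str, b: str, n: int) -> tuple:
    """CHAR(n) style: both values are padded to n and the contents are scanned."""
    return _scan(a.ljust(n), b.ljust(n))


if __name__ == "__main__":
    print(varchar_compare("Zhang San", "Zhang Sanfeng"))
    print(char_compare("Zhang San", "Zhang Sanfeng", 16))
```

When the lengths differ, the VARCHAR strategy answers without looking at any characters, while the CHAR strategy has to scan until the first difference.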
So how do you choose between CHAR and VARCHAR for a field when designing a database table?
For a field whose length varies little, use CHAR; for a field whose length varies widely, consider VARCHAR. Some fields vary widely in length but belong to records that are modified often; to improve efficiency, these should also use CHAR. In other words, CHAR should be preferred.
For a 9-digit organization code field, every code is exactly 9 digits, so CHAR should be used. Using VARCHAR would not only waste storage space but also reduce the efficiency of all processing.
ID numbers come in 15-digit and 18-digit forms, so VARCHAR is an option here, but since the difference between 15 and 18 is small, CHAR can be used as well. In particular, the 15-digit ID number is a legacy case and all subsequent data will be 18 digits, so the future efficiency advantage of a CHAR field is clear.
For a home address, a VARCHAR field is very appropriate, because the lengths vary widely and the records are not modified frequently.
Typically, we should use char if one of the following conditions is true:
1. The length of the field value is constant or does not change much;
2. The records containing the field are modified frequently.
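The two conditions can be turned into a small decision helper. The Python sketch below is only one way to read the rule: the 20% threshold for "does not change much" is an arbitrary assumption chosen so that the 15-digit versus 18-digit ID case counts as a small change, and the function name choose_type is hypothetical.

```python
def choose_type(min_len: int, max_len: int, frequently_modified: bool,
                max_spread: float = 0.2) -> str:
    """Suggest CHAR or VARCHAR for a field from its expected value lengths.

    max_spread is an assumed cut-off for "the length does not change much",
    measured as (max_len - min_len) / max_len.
    """
    if max_len <= 0 or min_len < 0 or min_len > max_len:
        raise ValueError("expected 0 <= min_len <= max_len and max_len > 0")
    spread = (max_len - min_len) / max_len
    if spread <= max_spread or frequently_modified:
        return f"CHAR({max_len})"
    return f"VARCHAR({max_len})"


if __name__ == "__main__":
    print("organization code:", choose_type(9, 9, False))
    print("ID number:        ", choose_type(15, 18, False))
    print("home address:     ", choose_type(5, 100, False))
```

The threshold should be tuned to the data in question; it is there only to make the rule of thumb explicit.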
With these principles in mind, we can determine the type of each character field when designing a database table by weighing the various factors. Under the "business key" design pattern recommended in the previous article, the field that serves as the primary key is usually one whose length does not change or changes little, such as an account number, card number, institution number, or social security number. For other, non-indexed fields, CHAR or VARCHAR can be chosen according to the content and other factors.
When our table is joined to a table in an external system and the other party has not designed its tables according to our rules, so that the join field is CHAR on one side and VARCHAR on the other, carefully consider how the data should be converted.
If neither side has an index on the join field, it is better to convert CHAR to VARCHAR. If one side has an index and the other does not, convert the join field to the type of the indexed side. If both sides are indexed and both indexes are used in the operation, convert the side with fewer records to the type of the side with more records. Converting CHAR to VARCHAR uses TRIM, and converting VARCHAR to CHAR uses CAST.
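These rules can be written down as a small Python function. It is a sketch of the decision logic only; the function name and parameters are made up for illustration, and the tie-break when both sides are indexed and have the same number of records is an arbitrary assumption.

```python
def join_target_type(char_indexed: bool, varchar_indexed: bool,
                     char_rows: int, varchar_rows: int) -> str:
    """Return the type both sides of the join should be converted to."""
    if not char_indexed and not varchar_indexed:
        return "VARCHAR"   # no index on either side: convert CHAR to VARCHAR
    if char_indexed and not varchar_indexed:
        return "CHAR"      # keep the indexed side as it is
    if varchar_indexed and not char_indexed:
        return "VARCHAR"
    # Both indexed: convert the smaller side to the type of the larger side.
    # A tie goes to CHAR here, which is an arbitrary choice.
    return "CHAR" if char_rows >= varchar_rows else "VARCHAR"


if __name__ == "__main__":
    print(join_target_type(False, False, 1_000, 1_000_000))
    print(join_target_type(True, False, 1_000, 1_000_000))
    print(join_target_type(True, True, 1_000, 1_000_000))
```

Converting to VARCHAR corresponds to applying TRIM to the CHAR side, and converting to CHAR corresponds to applying CAST to the VARCHAR side.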
The abuse of VARCHAR is now another very serious problem in database design. Some designers do not investigate or analyze the data at all, and simply set every character field to VARCHAR(30), which is very harmful to the operation and maintenance of the application system. Without a precise description of each field's characteristics, it is hard to understand the business model accurately. As the application's functions keep expanding and being upgraded, the system model as a whole becomes more and more blurred until it can no longer be maintained and has to be rebuilt from scratch, at very great cost.
The difference between Sql-char and Varchar,nvarchar