Unicode data types

Source: Internet
Author: User
In SQL Server databases, data types are divided into two categories, Unicode data types and non-Unicode data types. In general, if the information stored in the database has multiple languages, I recommend that you use Unicode data types instead of non-Unicode data types.

First, the reasons for using Unicode data types.

The need to use Unicode data types in databases that are stored in multiple languages is primarily for the compatibility of some special characters. such as a domestic enterprise, in Mexico, the United States, Canada, Quebec and other countries have customers. Then in the system you need to record the addresses of these customers to facilitate the staff to send courier and so on. However, the address information often contains special characters in a language. So if you are using a non-Unicode data type, it is difficult to ensure that the database system is able to store these special characters correctly and make the correct conversion the next time you read the data. For this reason the party uses character columns and code pages, it must be careful that the database administrator ensures that the code pages installed with the database can handle special characters in multiple languages. Also, when the client reads characters from another language while running a code page in a language, make sure that the characters are converted correctly.

Many applications, such as today, support multiple languages. By choosing from the client side, you can support French, Russian, Japanese, Chinese, and so on. That is, a customer's address in a database may be stored in the same language as above in order to implement the system's multilanguage environment. And sometimes, the client may be Chinese, but the user needs to keep the French address in the database, at which point the database administrator needs to ensure that the French address (and possibly some special characters) can be saved correctly in the database, while another user needs to be able to read the French address correctly when using the English client environment. To implement this requirement, you need to select a Unicode data type when the database is installed.

Ii. characteristics of Unicode data types.

A Unicode data type is a standard that prints code bits to characters. This data type is designed to cover all the characters of almost all languages in the world, so there is no need for different code pages to handle different character sets when saving or reading data. For this reason, in a multi-language environment, you can also successfully save and read some special characters under various languages. Because the Unicode data type is used, it is saved with the original data, and the original data is read at the time of the reading. No additional conversions are required to save or read, and clients in different languages see the same characters in the database.

Unicode data types are supported in SQL Server databases. The database stores all text system catalog data in a column that contains a Unicode data type; the names of database objects such as table view stored procedures are also stored in Unicode columns. In this way, when developing an application, only Unicode data types are required to avoid character recognition problems encountered by code pages in all different languages during the conversion process.

Third, the use of Unicode data types on the performance impact.

Unicode data types and non-Unicode data types are different in their internal implementation mechanisms and have different effects on the performance of the database. In general, databases with non-Unicode data types are superior to database performance with Uncode data types. However, if the design of the database, indexes and other aspects of the better design, then this performance gap will be very small. What does that mean? If two databases, the Unicode data type and the non-Unicode data type are used respectively. If the indexes of these two databases are not very reasonable, the performance difference of the two databases will be quite large, and the database system with non-Unicode data type can be higher than the Unicode data type. However, if the index settings for the two databases are more reasonable, they use different data types, but the performance difference will be very small. This is one of the things that a database administrator must know to choose a data type when deploying a database system.

So why do they have these differences in performance? Mainly because of the differences in collation rules. Because the Unicode data type differs from the collation of a non-Unicode data type. By default, all Unicode characters are stored in a database with 2 bytes. For non-Unicode data types, some are in single-byte storage, and some are in double-byte storage. For example, all non-east Asian languages and Thai use single byte to store non-Unicode characters. In terms of sorting, single-byte characters are much more ordered than double-byte characters. So the same table, using different data types, their sorting performance will have a big difference. The more records you have in a table, the greater the performance difference. However, this difference can be minimized by reasonably setting the index.

The other two data types use different collation rules. In a SQL Server database, use a Unicode collation to perform a string comparison of non-Unicode data that is defined by Microsoft collation. Because these rules are much more complex than non-Unicode collations, they are more resource intensive. So although the cost of Unicode collations is usually greater, the performance difference between Unicode data types and non-Unicode data defined by Microsoft collations is generally not significant. The only case in which a database uses a non-Unicode collation is for non-Unicode data that is defined with a database collation. In this case, sorting and scanning are usually faster than when Unicode collations are applied. Unicode collations apply to all Unicode data that is defined using Microsoft Collation or SQL collation. In general, this performance gap can be narrowed by optimizing the index.

Therefore, there is a significant difference in performance between databases that use Unicode data types and databases that take non-Unicode data types only if the design is unreasonable. These differences can be avoided by optimizing database performance. However, despite the full optimization of the database, this performance difference is still there, it is impossible to completely eliminate. It's just that the difference may be small and small to the extent that it can be overlooked.

When should the Unicode data type be used?

In general, you should determine whether a Unicode data type should be used based on the application system. If the foreground application software is designed with a multilingual application environment, it is best to choose a Unicode data type when the database is installed. Otherwise, even in the application system level to achieve multi-language support, but in the display, there may still be garbled and other abnormal phenomena. In addition, even if the application does not support multiple languages, when the user saves the data, if you need to use a word characters that holds other languages (such as where the user might be able to copy address information directly from the customer's web site), then you need to have the database adopt a Unicode data type. Otherwise, the characters in these other languages will not be saved properly or displayed normally.

It is also important to note that when Unicode data types are required, they are primarily based on the needs of the user, and that their adverse impact on performance is placed in a secondary position. Because if the user is having this aspect demand, this is the rigid demand. So far, only Unicode data types can be used to resolve character conversions and display problems in the database, and there is no second way to go. Conversely, the impact of Unicode data types on database performance can be avoided by optimizing the database. You can optimize the indexes in the database tables, optimize the SQL statements, optimize the character conversion methods on the client and server side, and so on. These optimizations allow Unicode data types to minimize the negative impact on database performance, and can be negligible.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.