Collation of character types

Source: Internet
Author: User

In SQL Server, character types fall mainly into the non-Unicode types (char, varchar), usually described as single-byte, and the Unicode types (nchar, nvarchar). Sorting and comparison of character data are determined by the collation. For the non-Unicode types, the collation also determines the code page, and therefore which characters can be represented.

Collations that are used with character data types such as char and varchar dictate the code page and corresponding characters that can be represented for that data type.

One, the three attributes associated with character data: locale, code page, and sort order.

1, Locale: the associated region or language culture. Different languages have different character sets.

A locale is a set of information that is associated with a location or a culture. This can include the name and identifier of the spoken language, the script that is used to write the language, and cultural conventions. Collations can be associated with one or more locales.

2, Code page: the character sets supported by the OS

A code page is an ordered set of characters of a given script in which a numeric index, or code point value, is associated with each character. A Windows code page is typically referred to as a character set or charset. Code pages are used to provide support for the character sets and keyboard layouts that are used by different Windows system locales.
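As a minimal sketch of the "numeric index per character" idea, the Python snippet below decodes the same byte value under two Windows code pages. Python's built-in `cp1252` and `cp1251` codecs stand in for the Windows code pages here, and the helper `code_point_in` is a hypothetical name introduced only for this illustration.

```python
import unicodedata

# One byte value, looked up in two different Windows code pages.
raw = bytes([0xE9])

for codec in ("cp1252", "cp1251"):
    ch = raw.decode(codec)
    # Print the Unicode name instead of the character itself so the
    # output stays ASCII on any console.
    print(f"0xE9 in {codec}: U+{ord(ch):04X} {unicodedata.name(ch)}")


def code_point_in(ch, codec):
    """Return the byte value(s) of ch in the given code page, or None
    when that code page has no slot for the character."""
    try:
        return ch.encode(codec)
    except UnicodeEncodeError:
        return None


# The Latin letter has an index in cp1252 but none in the Cyrillic cp1251.
print(code_point_in("\u00e9", "cp1252"))
print(code_point_in("\u00e9", "cp1251"))
```

The same index means a different character in each code page, which is why non-Unicode data is only meaningful together with the code page it was written in.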

The code page is determined by the OS settings: on Windows, the regional settings determine which code page the OS uses.

The code pages that a client uses are determined by the operating system settings. To set client code pages on the Windows operating system, use Regional Settings in Control Panel.

3, Sort order: how character data is sorted

Sort order specifies how data values are sorted. This affects the results of data comparison. Data is sorted by using collations, and sorting can be optimized by using indexes.
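To make the effect of sort order concrete, here is a small Python sketch that sorts the same values two ways: by raw code point value (roughly what a binary collation does) and with case folded first (roughly what a case-insensitive collation does). It is an analogy rather than SQL Server's actual algorithm, and the sample words are made up for the example.

```python
words = ["apple", "Banana", "cherry", "Apple"]

# Binary-style ordering: strings are compared by code point value, so
# every uppercase ASCII letter sorts before every lowercase one.
binary_order = sorted(words)

# Case-insensitive ordering: fold case before comparing. Python's sort is
# stable, so "apple" and "Apple" keep their original relative order.
ci_order = sorted(words, key=str.casefold)

print(binary_order)
print(ci_order)

# The same rules drive equality comparison, not just ordering.
print("apple" == "Apple")
print("apple".casefold() == "Apple".casefold())
```

Under the first ordering "Banana" comes before "apple"; under the second it comes after, and the two spellings of "apple" compare as equal.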

Two, Collation

In SQL Server, a collation consists of two parts: the sorting rules (which, for non-Unicode characters, also include the code page) and the comparison style. Together they govern how characters are represented, sorted, and compared. For non-Unicode characters, a code page must be specified, and to pass character data between different code pages, the data must be converted from the source code page to the destination code page. For Unicode characters, no code page is required: character data can be passed between machines without code page conversion, which can improve data processing performance.

The comparison style mainly covers case sensitivity, accent sensitivity, kana sensitivity, and width sensitivity.
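The comparison style options can be pictured as switches that decide which differences between two strings are ignored. The Python sketch below imitates three of them (case, accent, and width) with Unicode normalization. It is an approximation under stated assumptions, not SQL Server's implementation: `comparison_key` and `equal` are hypothetical helpers, NFKC folds more than just width differences, and kana sensitivity is left out.

```python
import unicodedata


def comparison_key(text, case_sensitive=True, accent_sensitive=True,
                   width_sensitive=True):
    """Build a key; two strings are treated as equal when their keys match."""
    if not width_sensitive:
        # NFKC maps full-width forms (e.g. U+FF33) to their ASCII counterparts.
        text = unicodedata.normalize("NFKC", text)
    if not accent_sensitive:
        # Decompose accented letters, then drop the combining marks.
        decomposed = unicodedata.normalize("NFD", text)
        text = "".join(c for c in decomposed if not unicodedata.combining(c))
    if not case_sensitive:
        text = text.casefold()
    return text


def equal(a, b, **style):
    """Compare two strings under the given comparison style switches."""
    return comparison_key(a, **style) == comparison_key(b, **style)


full_width_sql = "\uff33\uff31\uff2c"  # "SQL" written with full-width letters

print(equal("SQL", "sql"))                                   # case matters
print(equal("SQL", "sql", case_sensitive=False))             # case ignored
print(equal("resume", "r\u00e9sum\u00e9", accent_sensitive=False))
print(equal("SQL", full_width_sql, width_sensitive=False))
```

Each switch that is turned off widens the set of strings that compare as equal, which is the effect the CI/CS, AI/AS, and WS parts of a collation name describe.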

A collation specifies the bit patterns that represent each character in a data set. Collations also determine the rules that sort and compare data. SQL Server supports storing objects that have different collations in a single database. For non-Unicode columns, the collation setting specifies the code page for the data and which characters can be represented. Data that is moved between non-Unicode columns must be converted from the source code page to the destination code page.
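The conversion step described above can be sketched in Python as "decode with the source code page, encode with the destination code page". The function `convert_code_page` is a hypothetical helper written for this illustration, and replacing unrepresentable characters with `?` is one possible policy chosen for the sketch, not a statement about how every SQL Server conversion behaves.

```python
def convert_code_page(data, source, destination):
    """Re-encode bytes from one code page to another.

    Characters that the destination code page cannot represent are
    replaced with '?', so the conversion can lose information.
    """
    text = data.decode(source)                         # bytes -> characters
    return text.encode(destination, errors="replace")  # characters -> bytes


latin = "caf\u00e9".encode("cp1252")      # "café" stored with code page 1252
chinese = "\u4e2d\u6587".encode("cp936")  # two Chinese characters, code page 936

# Both code pages contain the accented letter, but at different byte values.
print(convert_code_page(latin, "cp1252", "cp437"))

# Code page 1252 has no Chinese characters, so both are replaced.
print(convert_code_page(chinese, "cp936", "cp1252"))
```

The first conversion changes the byte values but keeps the text; the second cannot keep the text at all, which is the risk of moving data between columns whose code pages do not cover the same characters.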

Three, Unicode support

Non-Unicode types typically use one byte to represent a character. Because a single byte can represent only a limited number of characters, different languages and regions use different code pages to tell their character sets apart, and each character set has its own code page. A computer that works with non-Unicode data can have only one code page set per machine, so when character data is passed between machines that use different code pages, code page conversion is required. The Unicode types (nchar, nvarchar) are stored as UTF-16, which uses two bytes for most characters (four for supplementary characters) and can represent the character sets of all the world's languages, so no code page is needed.

Unicode is a standard for mapping code points to characters. Because it is designed to cover all the characters of all the languages of the world, there is no need for different code pages to handle different sets of characters. If you store character data that reflects multiple languages, always use Unicode data types (nchar, nvarchar, and ntext) instead of the non-Unicode data types (char, varchar, and text).
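The storage difference can be checked with a short Python sketch. It assumes that Python's `utf-16-le` codec is a fair stand-in for how the Unicode types lay out characters, and `utf16_size` is a hypothetical helper named for this example only.

```python
def utf16_size(text):
    """Number of bytes needed to store text as UTF-16 (no byte order mark)."""
    return len(text.encode("utf-16-le"))


# Latin letter, accented letter, Chinese character, and one character
# outside the Basic Multilingual Plane (it needs a surrogate pair).
samples = ["A", "\u00e9", "\u4e2d", "\U0001F600"]
for ch in samples:
    print(f"U+{ord(ch):04X}: {utf16_size(ch)} bytes in UTF-16")

# Text that mixes languages does not fit in a single code page,
# but it is no problem for a Unicode encoding.
mixed = "caf\u00e9 \u4e2d\u6587"
try:
    mixed.encode("cp1252")
    fits_cp1252 = True
except UnicodeEncodeError:
    fits_cp1252 = False

print("fits in code page 1252:", fits_cp1252)
print("UTF-16 size of mixed text:", utf16_size(mixed), "bytes")
```

Most characters take two bytes, the supplementary character takes four, and the mixed-language string that code page 1252 rejects is stored without any conversion.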

Significant limitations are associated with non-Unicode data types. This is because a non-Unicode computer will be limited to use of a single code page. You might experience performance gain by using Unicode because fewer code-page conversions are required. Unicode collations must be selected individually at the database, column, or expression level because they are not supported at the server level.

Four, viewing the collations that SQL Server supports and their associated code pages

SELECT *, COLLATIONPROPERTY(name, 'CodePage') AS CodePage FROM sys.fn_helpcollations();

Three common code pages:

936: Simplified Chinese

1252: Latin1, ASCII-compatible, single-byte encoding

65001: UTF-8 (Unicode)

Quoting "What is the difference between Unicode and UTF-8?":

Computers can store only binary data (sequences of 1s and 0s), so how do the characters shown on a page get displayed?

One: character set (charset). Charset = char + set: char is a character, set is a collection, so a charset is a collection of characters. The character set defines which characters are covered and assigns each one a numeric ordinal.

Two: encoding. An encoding defines how a character's ordinal is turned into a sequence of binary bytes and, in reverse, how those bytes are parsed: how many bytes to use, in which byte order, and any other special rules.

Three: font. A font stores glyphs, which are looked up by the character's number and drawn on the page. What a character looks like when displayed therefore depends on the font file.

In summary, Unicode is a character set, not an encoding. UTF-8 is one encoding of the Unicode character set; UTF-16 and UTF-32 are others. Even with a character set and an encoding in place, a character cannot be displayed if the system font has no glyph for it.
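The split between character set and encoding is easy to see in a few lines of Python. In this sketch the code point is the character's ordinal in the Unicode character set, and each codec name is one encoding of that same ordinal; the sample character is chosen arbitrarily.

```python
ch = "\u4e2d"  # a common Chinese character

# Character set: the character's ordinal (code point) in Unicode.
code_point = ord(ch)

# Encodings: three different byte sequences for the same code point.
encodings = {
    name: ch.encode(name)
    for name in ("utf-8", "utf-16-le", "utf-32-le")
}

for name, data in encodings.items():
    print(f"U+{code_point:04X} as {name}: {data.hex()} ({len(data)} bytes)")

# Decoding with the matching encoding recovers the same character.
round_trips = all(data.decode(name) == ch for name, data in encodings.items())
print("all encodings round-trip:", round_trips)
```

The code point never changes; only the byte sequence does, which is why "Unicode" and "UTF-8" answer different questions.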

Reference doc:

Collation and Unicode Support

COLLATE (Transact-SQL)

COLLATIONPROPERTY (Transact-SQL)

sys.fn_helpcollations (Transact-SQL)
