Basic concepts of SQL Server collations (sorting rules)

Source: Internet
Author: User

Preface

A special scenario came up on the forum yesterday. The poster's company is working on an overseas project that must support three languages: Chinese, English, and a local language.
Is there a character set or collation that is compatible with all of them? This is a real pain point for overseas projects. Oracle has AL32UTF8 and MySQL has UTF8. Does SQL Server have an equivalent?

Basic Concepts

ASCII code

In the early days, computers were used almost exclusively in the United States. ASCII was created there to represent the space, punctuation marks, digits, uppercase and lowercase letters, and control characters. It can express all English text, but only English is supported.

GBK Encoding

Later, as computers became popular in China, ASCII was extended to produce the GB2312 encoding, which can represent more than 6,000 common Chinese characters. That was still not enough: there are far more Chinese characters than that, including traditional characters and many rarer ones.

GBK came next. It contains everything in GB2312 and adds many more characters. China is a multi-ethnic country, and many ethnic groups have their own writing systems. To represent those characters as well, GBK was extended again into the GB18030 encoding.
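The widening coverage from GB2312 to GBK to GB18030 can be checked with a small Python sketch. It relies on the `gb2312`, `gbk`, and `gb18030` codecs that ship with the Python standard library; the helper name `can_encode` and the three sample characters are chosen here purely for illustration.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Illustrative samples (not from the original discussion).
samples = {
    "simplified": "汉",   # common simplified Chinese character
    "traditional": "漢",  # traditional form of the same character
    "tibetan": "ཀ",       # Tibetan letter KA, a minority script
}

for label, ch in samples.items():
    coverage = {
        enc: can_encode(ch, enc)
        for enc in ("ascii", "gb2312", "gbk", "gb18030")
    }
    print(label, coverage)
```

Each later encoding accepts everything the earlier one does and more, which is the progression described above.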

Other countries did the same for their own languages, so many different encodings appeared. If a computer does not support the encoding a file was written in, it cannot correctly interpret the file's content.
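What "cannot interpret the content" looks like in practice can be sketched in Python: the same bytes are read back with the right encoding and with two wrong ones. The helper name `read_with` and the sample text are illustrative assumptions.

```python
def read_with(raw: bytes, encoding: str) -> str:
    """Decode `raw` with `encoding`, or report that it cannot be decoded."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return "<cannot decode>"


original = "中文"
raw = original.encode("gbk")  # bytes as a GBK system would write them

for enc in ("gbk", "latin-1", "utf-8"):
    # Only the reader that uses the writer's encoding recovers the text.
    print(enc, read_with(raw, enc))
```

Reading with a Western code page produces unrelated accented letters, and a strict UTF-8 reader rejects the bytes outright.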

Finally, this situation became untenable, and a single character set, Unicode, was created (it is maintained by the Unicode Consortium and kept in step with the ISO/IEC 10646 standard). It is very large and is designed to accommodate the text and symbols of every language in the world. As long as a computer supports Unicode, a file saved in a Unicode encoding can be interpreted correctly by other computers, whatever language it contains.

UTF-8 AND UTF-16

Unicode text is stored and transmitted using an encoding, and the two most common are UTF-8 and UTF-16. UTF-8 works in 8-bit units and uses 1 to 4 bytes per character; UTF-16 works in 16-bit units and uses 2 or 4 bytes per character. A natural question follows: since UTF-8 can represent so many characters and symbols, why do so many people in China still use GBK? Because UTF-8 is larger for Chinese text: a common Chinese character takes 3 bytes in UTF-8 but only 2 in GBK. If most users are Chinese, GBK can still be used.
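The size difference can be measured directly. The sketch below counts the bytes each encoding needs for a few sample characters; the helper name `encoded_size` and the samples are illustrative. `utf-16-le` is used so that no byte order mark is added to the count.

```python
def encoded_size(text: str, encoding: str) -> int:
    """Number of bytes `text` occupies in `encoding`."""
    return len(text.encode(encoding))


# One ASCII letter, one common Chinese character, one emoji.
for ch in ("A", "中", "😀"):
    print(
        ch,
        "utf-8:", encoded_size(ch, "utf-8"),
        "utf-16:", encoded_size(ch, "utf-16-le"),
    )

# For Chinese text, GBK needs 2 bytes per character and UTF-8 needs 3.
sentence = "数据库排序规则"
print("gbk:", encoded_size(sentence, "gbk"))
print("utf-8:", encoded_size(sentence, "utf-8"))
```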

In general:

Unicode is a "character set".

UTF-8 is an "encoding rule".

Where:

Character set: assigns a unique ID to each character (technical name: code point).
Encoding rule: a rule for converting code points to byte sequences (encoding and decoding can be loosely compared to an encryption/decryption process).
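The distinction can be seen in a few lines of Python: one character has exactly one code point, but each encoding rule turns that code point into different bytes. The function name `describe` is an illustrative assumption.

```python
def describe(ch: str) -> dict:
    """Show one character's code point and its bytes under several encodings."""
    return {
        "code_point": f"U+{ord(ch):04X}",         # the character set's ID
        "utf-8": ch.encode("utf-8").hex(),         # encoding rule 1
        "utf-16-be": ch.encode("utf-16-be").hex(),  # encoding rule 2
        "gbk": ch.encode("gbk").hex(),             # a non-Unicode encoding
    }


info = describe("中")
for name, value in info.items():
    print(name, value)

# Decoding reverses the rule and yields the same character again.
assert bytes.fromhex(info["utf-8"]).decode("utf-8") == "中"
```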

Collations (sorting rules)

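Take the commonly used collation Chinese_PRC_CI_AS as an example. The first part, Chinese_PRC, identifies the language and region whose sorting rules are applied. Its description, however, deserves special attention:

Unicode sorting rules for Simplified Chinese

This description is easily misread. It does not mean that all character data under this collation is stored as Unicode. It describes how text is compared and sorted; char and varchar columns under Chinese_PRC collations are still stored in a Chinese code page (936, GBK), and only nchar and nvarchar columns hold Unicode.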

The second half of the collation name is made up of suffixes with the following meanings:

_BIN: binary sort order
_CI (_CS): case sensitivity. CI is case-insensitive, CS is case-sensitive
_AI (_AS): accent sensitivity. AI is accent-insensitive, AS is accent-sensitive
_KI (_KS): kana sensitivity. KI is kanatype-insensitive, KS is kanatype-sensitive
_WI (_WS): width sensitivity. WI is width-insensitive, WS is width-sensitive
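To make the naming scheme concrete, here is a minimal Python sketch that splits a collation name into its locale part and the suffixes listed above. The function `parse_collation` is hypothetical and is not a SQL Server API. It recognizes only the suffixes named in this article, and it assumes that a missing _KS or _WS suffix means kana-insensitive or width-insensitive.

```python
# Suffixes described in this article; anything else stays in the locale part.
KNOWN_SUFFIXES = {"BIN", "CI", "CS", "AI", "AS", "KI", "KS", "WI", "WS"}


def parse_collation(name: str) -> dict:
    """Split a collation name into locale and sensitivity flags (sketch)."""
    parts = name.split("_")
    flags = set()
    # Peel recognized suffixes off the end of the name.
    while len(parts) > 1 and parts[-1].upper() in KNOWN_SUFFIXES:
        flags.add(parts.pop().upper())
    binary = "BIN" in flags
    return {
        "locale": "_".join(parts),
        "binary": binary,
        # A binary sort compares raw values, so it distinguishes everything.
        "case_sensitive": binary or "CS" in flags,
        "accent_sensitive": binary or "AS" in flags,
        "kana_sensitive": binary or "KS" in flags,
        "width_sensitive": binary or "WS" in flags,
    }


print(parse_collation("Chinese_PRC_CI_AS"))
print(parse_collation("Chinese_PRC_BIN"))
```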

Case-sensitive: select this option if you want comparisons to treat uppercase and lowercase letters as different.

Accent-sensitive: select this option if you want comparisons to treat accented and unaccented letters as different. If this option is selected, comparisons also treat letters with different accents as unequal.

Kana-sensitive: select this option if you want comparisons to treat hiragana and katakana as different Japanese syllables.

Width-sensitive: select this option if you want comparisons to treat half-width and full-width forms of the same character as different.
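The effect of these four options on equality comparisons can be approximated in Python with the standard `unicodedata` module. This is only a rough sketch of the idea, not SQL Server's actual comparison algorithm. The names `comparison_key` and `equal_under` are hypothetical, and the defaults mimic a _CI_AS collation with no _KS or _WS suffix.

```python
import unicodedata


def comparison_key(text, *, case_sensitive=False, accent_sensitive=True,
                   kana_sensitive=False, width_sensitive=False):
    """Fold away the differences that an insensitive option ignores."""
    if not width_sensitive:
        # NFKC maps full-width and half-width forms to their standard form.
        text = unicodedata.normalize("NFKC", text)
    if not accent_sensitive:
        # Decompose, drop combining marks, recompose.
        # Caveat: this also strips Japanese voicing marks.
        decomposed = unicodedata.normalize("NFD", text)
        stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
        text = unicodedata.normalize("NFC", stripped)
    if not kana_sensitive:
        # Katakana U+30A1..U+30F6 sit 0x60 above the matching hiragana.
        text = "".join(
            chr(ord(c) - 0x60) if "\u30a1" <= c <= "\u30f6" else c
            for c in text
        )
    if not case_sensitive:
        text = text.casefold()
    return text


def equal_under(a, b, **options):
    """Compare two strings under the given sensitivity options."""
    return comparison_key(a, **options) == comparison_key(b, **options)


print(equal_under("SQL", "sql"))                              # CI: equal
print(equal_under("SQL", "sql", case_sensitive=True))         # CS: different
print(equal_under("resume", "résumé"))                        # AS: different
print(equal_under("resume", "résumé", accent_sensitive=False))  # AI: equal
```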

Unicode

SQL Server supports Unicode through the nchar and nvarchar data types, which store text as UTF-16 regardless of the collation's code page.

Summary

So SQL Server does not have a single collation equivalent to Oracle's AL32UTF8 (at least before SQL Server 2019, which added collations with a _UTF8 suffix). If you need to store three kinds of text, we recommend defining all character columns as nchar and nvarchar.
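The reason behind this recommendation can be illustrated with a Python simulation. It assumes that a varchar column under a Chinese_PRC collation is stored in code page 936 and that characters outside that code page are replaced with "?", while nvarchar is stored as UTF-16. The function names are hypothetical, and Thai stands in for the unspecified local language.

```python
def store_as_varchar_cp936(text: str) -> str:
    """Simulate a round trip through a code page 936 varchar column."""
    return text.encode("cp936", errors="replace").decode("cp936")


def store_as_nvarchar(text: str) -> str:
    """Simulate a round trip through an nvarchar (UTF-16) column."""
    return text.encode("utf-16-le").decode("utf-16-le")


# English, Chinese, and a third language (Thai is an assumption here).
row = "Hello 你好 สวัสดี"

print("varchar :", store_as_varchar_cp936(row))  # third language is lost
print("nvarchar:", store_as_nvarchar(row))       # all three survive
```

English and Chinese survive the code page 936 round trip, but the third language does not, which is why Unicode column types are the safer choice for mixed text.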
