I. Universal Character Set (UCS)
ISO/IEC 10646-1 [ISO-10646] defines a multi-octet character set called the Universal Character Set (UCS), which covers most of the world's writing systems. Two multi-octet encodings have been defined: UCS-4, which uses four 8-bit bytes for each character, and one that uses two 8-bit bytes for each char
plane character with code point U+1D306; its conversion to UTF-16 is calculated as follows.
The code is as follows:
H = Math.floor((0x1D306 - 0x10000) / 0x400) + 0xD800 = 0xD834
L = (0x1D306 - 0x10000) % 0x400 + 0xDC00 = 0xDF06
Therefore, the character's UTF-16 encoding is 0xD834 0xDF06, which is four bytes long.
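The calculation above can be sketched as a small function. This is a minimal illustration, not code from the original article; the name toSurrogatePair is a hypothetical helper, and the range check is an added assumption about what inputs are valid.

```javascript
// Split a supplementary-plane code point (0x10000..0x10FFFF) into a
// UTF-16 surrogate pair. toSurrogatePair is a hypothetical helper name.
function toSurrogatePair(c) {
  if (!Number.isInteger(c) || c < 0x10000 || c > 0x10FFFF) {
    throw new RangeError("expected a code point in 0x10000..0x10FFFF");
  }
  const offset = c - 0x10000;
  const high = Math.floor(offset / 0x400) + 0xD800; // H in the formula
  const low = (offset % 0x400) + 0xDC00;            // L in the formula
  return [high, low];
}

const [high, low] = toSurrogatePair(0x1D306);
console.log(high.toString(16), low.toString(16)); // d834 df06
```

The result matches what the built-in String.fromCodePoint produces for the same code point.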
5. What encoding does JavaScript use?
JavaScript uses the Unicode character set, but supports only one encoding method.
This encoding is neither UTF-16 nor UTF-8, nor a
GBK. GB18030 builds on GBK and adds the scripts of major ethnic minorities, such as Tibetan, Mongolian, and Uyghur. A code page is a mapping table between the text encoding of a country or region and Unicode. For example, the mapping table between GBK and Unicode is CP936, so CP936 is also commonly used to refer to GBK.
3. Unicode
ANSI has many code pages, and text encoded under one code page cannot be displayed correctly under another. Due to the inconvenience of communication and transmission caused by diffe
H = Math.floor((0x1D306 - 0x10000) / 0x400) + 0xD800 = 0xD834
L = (0x1D306 - 0x10000) % 0x400 + 0xDC00 = 0xDF06
So the character's UTF-16 encoding is 0xD834 0xDF06, which is four bytes long.
What encoding does JavaScript use?
The JavaScript language uses the Unicode character set, but supports only one encoding method.
This encoding is neither UTF-16, nor UTF-8, nor UTF-32. JavaScript uses none of these.
JavaScript uses a u
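Whatever name is given to the encoding, its practical effect can be observed directly. The sketch below assumes an ES2015 or later runtime (for the "\u{...}" escape, codePointAt, and Array.from). It shows that a JavaScript string is indexed in 16-bit units, so the supplementary-plane character U+1D306 counts as two units.

```javascript
// U+1D306 written with a code point escape (ES2015 syntax).
const s = "\u{1D306}";

console.log(s.length);                      // 2 (two 16-bit units)
console.log(s.charCodeAt(0).toString(16));  // d834
console.log(s.charCodeAt(1).toString(16));  // df06
console.log(s.codePointAt(0).toString(16)); // 1d306
console.log(Array.from(s).length);          // 1 (iteration goes by code point)
```

Length and index-based methods count 16-bit units, while codePointAt and string iteration reassemble the pair into one code point.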
of the controller, and how to use C statements to operate on registers and internal memory. For example, in 8051 assembly we write MOV A, #20H, and the assembler recognizes A as the accumulator; in an 8051 C program we write ACC = 32;, and the compiler recognizes that ACC refers to the accumulator rather than an ordinary variable. That is, each register has a dedicated name for developers to use. These names are defined in the header file reg51.h; the programmer only needs to use #include "reg51.h"
of string manipulation, which is an important reason why Java uses UTF-16 as its in-memory character storage format.
UTF-8
UTF-16 uses 2 bytes (or 4 bytes) to represent each character, which makes it incompatible with the earlier, widely used ASCII encoding. Moreover, some characters have special meanings on UNIX systems, such as '\0' or '/', which are treated specially in file names and in the parameters of other C library functions. In addition, some of the most commonly used characters (Wester
(Hebrew) - Hebrew (visual order)
* ISO 8859-8-I - Hebrew (logical order)
* ISO 8859-9 (Latin-5 or Turkish) - replaces the Icelandic letters of Latin-1 with Turkish letters.
* ISO 8859-10 (Latin-6 or Nordic) - North Germanic languages; intended to replace Latin-4.
* ISO 8859-11 (Thai) - Thai; derived from Thailand's TIS-620 standard character set.
* ISO 8859-13 (Latin-7 or Baltic Rim) - Baltic languages
* ISO 8859-14 (Latin-8 or Celtic) - Celtic languages
* ISO 8859-15 (Latin-9) - Western European languages,
But with this feature I wanted to investigate the underlying principle, because I care about understanding how things work. So I posted the question in several QQ groups in turn, and no one responded. Alas, depressing. I had to Google it and teach myself. The following is a detailed description.
With no one to ask for help, I have some personal thoughts. Nowadays few people delve into theory; the common attitude is to muddle along, and people usually know the what but not the why. For programming, I personally think this is a sad thin
H = Math.floor((c - 0x10000) / 0x400) + 0xD800
L = (c - 0x10000) % 0x400 + 0xDC00
Take a character as an example: it is a supplementary-plane character with code point U+1D306, and its conversion to UTF-16 is calculated as follows.
The code is as follows:
H = Math.floor((0x1D306 - 0x10000) / 0x400) + 0xD800 = 0xD834
L = (0x1D306 - 0x10000) % 0x400 + 0xDC00 = 0xDF06
Therefore, the character's UTF-16 encoding is 0xD834 0xDF06, which is four bytes long.
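Reading the formulas in the other direction recovers the code point from the two 16-bit values: c = (H - 0xD800) * 0x400 + (L - 0xDC00) + 0x10000. The sketch below is an illustration derived from the formulas above, not code from the original article; fromSurrogatePair is a hypothetical helper name, and the range checks are an added assumption.

```javascript
// Recombine a UTF-16 surrogate pair into a code point by inverting
// the H and L formulas. fromSurrogatePair is a hypothetical helper name.
function fromSurrogatePair(h, l) {
  if (h < 0xD800 || h > 0xDBFF) {
    throw new RangeError("high surrogate must be in 0xD800..0xDBFF");
  }
  if (l < 0xDC00 || l > 0xDFFF) {
    throw new RangeError("low surrogate must be in 0xDC00..0xDFFF");
  }
  return (h - 0xD800) * 0x400 + (l - 0xDC00) + 0x10000;
}

console.log(fromSurrogatePair(0xD834, 0xDF06).toString(16)); // 1d306
```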
medium version. UCS
As with Chinese, almost every language faces the problem of designing a character set of its own.
Recognizing this problem, ISO designed the Universal Character Set (UCS) to represent all the characters of the world (even those of aliens) in a single character set.
In the end UCS succeeded, because the internet has develop
.NET UCS2 encoding: the simplest method
Recently, I developed an SMS gateway application. Although it is not as troublesome as PDUs, sending a Chinese text message still requires encoding (BTW, it turned out not to be needed in the end).
According to the programming documentation, the exact name is UCS2 encoding. OK, everyone is familiar with UTF-8 and UTF-16, but what is UCS2? Here I will give a rough explanation.
UCS has two formats:
Encoding knowledge study notes 3
I. How to encode UTF-8
UTF-8 encodes the UCS in 8-bit units. The encoding from UCS-2 to UTF-8 is as follows:

| Serial number | UCS-2 coding range (hexadecimal) | UTF-8 byte stream (binary) | Description |
|---|---|---|---|
| 1 | 0000-007F | 0xxxxxxx | 1 byte, in the format 0xxxxxxx |
| 2 | 0080-07FF | 110xxxxx 10xxxxxx | 2 bytes, in the format 110xxxxx 10xxxxxx |
| 3 | 08
contains all the character sets known to humanity, so it can theoretically parse all text.
Unicode
The Unicode character set is in effect the international standard ISO 10646. The Unicode Standard is published by the Unicode Consortium.
ISO 10646 defines the Universal Character Set (UCS). UCS is a superset of all other character set standards. ISO 10646 defines a 31-bit character set. However, in this huge encoding space, only the first 65534 code positions (0x0000 to 0xFFFD) have been allocated so far. The 16-
UTF encoding
UTF-8 encodes the UCS in 8-bit units. The encoding from UCS-2 to UTF-8 is as follows:

| UCS-2 encoding (hexadecimal) | UTF-8 byte stream (binary) |
|---|---|
| 0000-007F | 0xxxxxxx |
| 0080-07FF | 110xxxxx 10xxxxxx |
| 0800-FFFF | 1110xxxx 10xxxxxx 10xxxxxx |
For example, the Unicode code of the character "汉" (Han) is 6C49. 6C49 falls between 0800 and FFFF, so it must use the 3-byte te
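The three rows of the table can be sketched as bit operations. This is a minimal illustration under stated assumptions: encodeUcs2ToUtf8 is a hypothetical helper name, it handles only 16-bit values as in the table, and it does not reject surrogate values (D800-DFFF), which a real encoder would treat specially.

```javascript
// Encode one UCS-2 value (0x0000..0xFFFF) as UTF-8 bytes, one branch
// per table row. encodeUcs2ToUtf8 is a hypothetical helper name.
function encodeUcs2ToUtf8(c) {
  if (!Number.isInteger(c) || c < 0 || c > 0xFFFF) {
    throw new RangeError("expected a value in 0x0000..0xFFFF");
  }
  if (c <= 0x7F) {
    return [c];                                   // 0xxxxxxx
  }
  if (c <= 0x7FF) {
    return [0xC0 | (c >> 6), 0x80 | (c & 0x3F)];  // 110xxxxx 10xxxxxx
  }
  return [
    0xE0 | (c >> 12),                             // 1110xxxx
    0x80 | ((c >> 6) & 0x3F),                     // 10xxxxxx
    0x80 | (c & 0x3F),                            // 10xxxxxx
  ];
}

const utf8Bytes = encodeUcs2ToUtf8(0x6C49);
console.log(utf8Bytes.map((b) => b.toString(16)).join(" ")); // e6 b1 89
```

For 6C49 (binary 0110 1100 0100 1001) the bits are split 4 + 6 + 6 across the three bytes, giving E6 B1 89.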
letters, and is still represented by 1 byte, while Chinese, for example, is represented in 2 bytes. English and Chinese can thus be processed uniformly. The way to tell whether a byte begins a Chinese character is to check the highest bit of the high byte: if it is 1, the byte that follows must also be read, and the 2 bytes together are interpreted as 1 character. GB2312, GBK, and GB18030 all belong to DBCS. In addition, the ANSI encoding on Simplified Chinese Windows usually refers to GBK (code page 936). The
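The high-bit rule described above can be sketched as a simple scan over bytes. This is an illustration only: countDbcsChars is a hypothetical helper name, the input is assumed to be well-formed, and the rule covers only 1-byte and 2-byte characters (it ignores the 4-byte forms of GB18030).

```javascript
// Count characters in a byte sequence under the simple DBCS rule:
// a byte with its highest bit set starts a 2-byte character,
// any other byte is a 1-byte character.
// countDbcsChars is a hypothetical helper name.
function countDbcsChars(bytes) {
  let count = 0;
  let i = 0;
  while (i < bytes.length) {
    i += (bytes[i] & 0x80) !== 0 ? 2 : 1; // skip the trail byte as well
    count += 1;
  }
  return count;
}

// One ASCII byte, one 2-byte character, one ASCII byte.
console.log(countDbcsChars([0x41, 0xBA, 0xBA, 0x42])); // 3
```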
Standardization) and the Unicode Consortium (an association of software manufacturers) each started their own work: the ISO 10646 project of ISO and the Unicode project of the Unicode Consortium. Later, they began to merge the results of both sides, adopting the same character repertoire and code values. However, each project still has its own standard.
UCS (Universal Character Set): this is the name of Unicode in ISO, and it comes with two encoding methods.
bytes, representing 21,886 characters.
Range: high byte from 81 to FE, low byte from 40 to FE.

GB18030 character set
Function: it solves the encoding of Chinese, Japanese, Korean, and other scripts, and is compatible with GBK.
Number of bits: it uses a variable-length representation (1 byte for ASCII, or 2 or 4 bytes) and can represent 27,484 characters.
Range: 1-byte codes from 00 to 7F; 2-byte codes with the high byte from 81 to FE and the low byte from 40 to 7E and 80 to FE; 4-byte codes with the 1st and 3rd bytes from 81 to FE and the 2nd and 4th bytes from 30 to 39.

UCS character set
Role: The Internat
appear garbled? It is because the sender and the recipient use different encoding methods. Imagine an encoding that includes every symbol in the world and gives each symbol a unique code: the garbling problem would disappear. This is Unicode which, as its name indicates, is an encoding of all symbols. Unicode is also a character encoding method. The formal name for Unicode is "Universal Multiple-Octet Coded Character Set", referred to as