Linux system introduction: Vim encoding and font details
Like all popular text editors, Vim handles files in a variety of character encodings well, including popular Unicode encodings such as UCS-2 and UTF-8. Unfortunately, as with a lot of software from the Linux world, you need to configure it yourself. Vim has four options related to character encoding: encoding, fileencoding, fileenc
multilingual plane, the code range from D800 to DFFF is permanently reserved and is not mapped to any character, so UTF-16 cleverly uses these reserved code points to encode supplementary-plane characters, as explained later. Unicode is only a character set: it specifies the code point of each character but not how code points are stored. Storage is defined by separate encoding schemes, which fall into two main families:
high byte from 81 to FE, low byte from 40 to 7E and 80 to FE; in the 4-byte form, the first and third bytes are from 81 to FE, and the second and fourth bytes are from 30 to 39.
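The byte ranges quoted above match the two-byte and four-byte forms of GB18030. As a minimal sketch, they can be checked like this. The function names isTwoByteForm and isFourByteForm are hypothetical helpers, and the sketch only tests the ranges; it does not verify that a sequence is actually assigned to a character.

```javascript
// Hypothetical helper: true if byte b lies in the inclusive range [lo, hi].
const inRange = (b, lo, hi) => b >= lo && b <= hi;

// Two-byte form: high byte 81..FE, low byte 40..7E or 80..FE.
function isTwoByteForm(b1, b2) {
  return inRange(b1, 0x81, 0xFE) &&
    (inRange(b2, 0x40, 0x7E) || inRange(b2, 0x80, 0xFE));
}

// Four-byte form: bytes 1 and 3 in 81..FE, bytes 2 and 4 in 30..39.
function isFourByteForm(b1, b2, b3, b4) {
  return inRange(b1, 0x81, 0xFE) && inRange(b2, 0x30, 0x39) &&
    inRange(b3, 0x81, 0xFE) && inRange(b4, 0x30, 0x39);
}

console.log(isTwoByteForm(0xD6, 0xD0));              // true
console.log(isTwoByteForm(0x81, 0x7F));              // false: 7F is excluded
console.log(isFourByteForm(0x81, 0x30, 0x81, 0x30)); // true
```

Because the second byte of a four-byte sequence (30..39) can never be a valid low byte of a two-byte sequence (40..7E, 80..FE), a decoder can tell the two forms apart by looking at the second byte.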
UNICODE
Unicode, formally named the "Universal Multiple-Octet Coded Character Set" (UCS for short), is a character set maintained by international organizations. UCS assigns a unique code point to each character, usually written as U+xxxx, where xxxx is the corresponding hexade
This article may be reproduced freely, but the original author charlee and the original link http://tech.idv2.com/2008/02/21/unicode-intro/ must be credited when reprinting.
Basic knowledge
Differences between byte and character
Big endian and little endian
UCS-2 and UCS-4
UTF-16 and UTF-32
UTF-16
UTF-32
UTF-8
Basic knowledge
Before int
a character, it is a supplementary-plane character with code point U+1D306, and the calculation that converts it to UTF-16 is as follows.
H = Math.floor((0x1D306 - 0x10000) / 0x400) + 0xD800 = 0xD834
L = (0x1D306 - 0x10000) % 0x400 + 0xDC00 = 0xDF06
Therefore, the UTF-16 encoding of the character is 0xD834 0xDF06, which is four bytes long.
V. What encoding does JavaScript use?
The JavaScript language uses the Unicode character set, but supports only one encoding method.
This co
permanently reserved and is not mapped to any character, so UTF-16 cleverly uses these reserved code points to encode supplementary-plane characters, as explained later. Unicode is only a character set: it specifies the code point of each character but not how code points are stored. Storage is defined by separate encoding schemes, which fall into two main families: UCS and UTF. UTF ma
I. Universal Character Set (UCS)
ISO/IEC 10646-1 [ISO-10646] defines a multi-octet character set called the Universal Character Set (UCS), which covers most of the world's writing systems. Two multi-octet encodings have been defined: UCS-4, which encodes each character in four 8-bit bytes, and one that uses two 8-bit bytes for each char
plane character. Its code point is U+1D306, and it is converted to UTF-16 as follows.
The code is as follows:
H = Math.floor((0x1D306 - 0x10000) / 0x400) + 0xD800 = 0xD834
L = (0x1D306 - 0x10000) % 0x400 + 0xDC00 = 0xDF06
Therefore, the UTF-16 encoding of the character is 0xD834 0xDF06, which is four bytes long.
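The calculation for U+1D306 can be wrapped in a small function. This is a minimal sketch rather than a library API: the name toSurrogatePair is hypothetical, and the range check is an added assumption so that code points outside the supplementary planes are rejected.

```javascript
// Split a supplementary-plane code point (0x10000..0x10FFFF)
// into its UTF-16 high and low surrogates.
function toSurrogatePair(c) {
  if (!Number.isInteger(c) || c < 0x10000 || c > 0x10FFFF) {
    throw new RangeError("expected a code point in 0x10000..0x10FFFF");
  }
  const offset = c - 0x10000;                        // 20-bit value
  const high = Math.floor(offset / 0x400) + 0xD800;  // top 10 bits
  const low = (offset % 0x400) + 0xDC00;             // bottom 10 bits
  return [high, low];
}

const [highUnit, lowUnit] = toSurrogatePair(0x1D306);
console.log(highUnit.toString(16), lowUnit.toString(16)); // d834 df06
```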
5. What encoding does JavaScript use?
JavaScript uses the Unicode Character Set, but only supports one encoding method.
This encoding is neither a UTF-16 nor a UTF-8, nor a
GBK. GB18030 builds on GBK and adds the scripts of major ethnic minorities, such as Tibetan, Mongolian, and Uyghur. A code page is a mapping table between a national text encoding and Unicode. For example, the mapping table between GBK and Unicode is cp936, so cp936 is also commonly used to refer to GBK.
3. Unicode
ANSI has many code pages. Text encoded under one code page cannot be displayed correctly under another. Due to the inconvenience for communication and transmission caused by diffe
= Math.floor((0x1D306 - 0x10000) / 0x400) + 0xD800 = 0xD834
L = (0x1D306 - 0x10000) % 0x400 + 0xDC00 = 0xDF06
So, the UTF-16 encoding of the character is 0xD834 0xDF06, which is four bytes long.
What kind of encoding does JavaScript use?
The JavaScript language uses the Unicode character set, but supports only one encoding method.
This encoding is neither UTF-16 nor UTF-8, nor is it UTF-32. JavaScript uses none of the above encodings.
JavaScript uses a u
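A short sketch of what this means in practice: a JavaScript string exposes the supplementary-plane character U+1D306 as two 16-bit code units. The code-point escape and codePointAt used here assume an ES2015 or later engine.

```javascript
// U+1D306 written as a code-point escape so the source stays ASCII-only.
const s = "\u{1D306}";

// length counts 16-bit code units, not characters.
const unitCount = s.length;

// charCodeAt returns the individual 16-bit units.
const high = s.charCodeAt(0);
const low = s.charCodeAt(1);

// codePointAt reads the whole code point; Array.from iterates by code point.
const codePoint = s.codePointAt(0);
const charCount = Array.from(s).length;

console.log(unitCount, high.toString(16), low.toString(16),
  codePoint.toString(16), charCount); // 2 d834 df06 1d306 1
```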
.floor((c - 0x10000) / 0x400) + 0xD800
L = (c - 0x10000) % 0x400 + 0xDC00
Take the character 𝌆 as an example. It is a supplementary-plane character with code point U+1D306, and it is converted to UTF-16 as follows.
The code is as follows:
H = Math.floor((0x1D306 - 0x10000) / 0x400) + 0xD800 = 0xD834
L = (0x1D306 - 0x10000) % 0x400 + 0xDC00 = 0xDF06
Therefore, the UTF-16 encoding of the character is 0xD834 0xDF06, which is four bytes long.
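Going the other way, the two formulas for H and L can be inverted to recover the code point from a surrogate pair. This is a sketch under the same definitions; the name fromSurrogatePair is hypothetical, and the range checks are an added assumption.

```javascript
// Recombine a UTF-16 surrogate pair into the original code point.
function fromSurrogatePair(high, low) {
  if (high < 0xD800 || high > 0xDBFF) {
    throw new RangeError("high surrogate must be in 0xD800..0xDBFF");
  }
  if (low < 0xDC00 || low > 0xDFFF) {
    throw new RangeError("low surrogate must be in 0xDC00..0xDFFF");
  }
  // Undo H = floor((c - 0x10000) / 0x400) + 0xD800
  // and  L = (c - 0x10000) % 0x400 + 0xDC00.
  return (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000;
}

const decoded = fromSurrogatePair(0xD834, 0xDF06);
console.log(decoded.toString(16)); // 1d306
```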
5. What encoding does JavaScript use?
JavaScript use
medium version. UCS
As with Chinese, almost every language faces the problem of needing a character set designed for it.
Recognizing this problem, ISO designed the Universal Character Set (UCS) so that a single character set could represent all the characters in the world (even those of aliens).
As a result, UCS succeeded, because the internet has develop
print multiple Windows test pages. If that does not help, we recommend switching to the PCL5e driver. If a PS driver is supported, you can switch to the PS driver instead. We also recommend disabling anti-virus software, firewalls, and similar programs.
② If checking shows no problem with communication or the driver, and plain text prints correctly, the cause is very likely a special font or image in the file. You can try replacing the special font or image in the file, or further pr
.NET UCS2 encoding: the simplest method
Recently I developed an SMS gateway application. Although it is not as troublesome as PDU, sending a Chinese text message requires encoding it (BTW, it turned out not to be needed in the end).
The programming documentation calls the encoding UCS2. Everyone is familiar with UTF-8 and UTF-16, but what is UCS2? Here is a rough explanation.
UCS has two formats:
Encoding study notes 3
I. How UTF-8 is encoded
UTF-8 uses 8-bit code units. The mapping from UCS-2 to UTF-8 is as follows:
Serial number   UCS-2 coding range (hexadecimal)   UTF-8 byte stream (binary)   Description
1               0000-007F                          0xxxxxxx                     1 byte in the format 0xxxxxxx
2               0080-07FF                          110xxxxx 10xxxxxx            2 bytes in the format 110xxxxx 10xxxxxx
3               08
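The mapping in the table can be sketched as a function that turns a UCS-2 value into UTF-8 bytes. The name encodeUtf8Bmp is hypothetical. The third row is cut off in the text, so the three-byte branch here follows the standard UTF-8 pattern 1110xxxx 10xxxxxx 10xxxxxx for 0800-FFFF, which is an assumption beyond the rows shown. Rejecting the surrogate range D800-DFFF is also an added assumption.

```javascript
// Encode one UCS-2 value (0x0000..0xFFFF) as an array of UTF-8 bytes.
function encodeUtf8Bmp(c) {
  if (!Number.isInteger(c) || c < 0 || c > 0xFFFF) {
    throw new RangeError("expected a value in 0x0000..0xFFFF");
  }
  if (c >= 0xD800 && c <= 0xDFFF) {
    throw new RangeError("surrogate code points are not characters");
  }
  if (c <= 0x7F) {
    return [c];                                   // 0xxxxxxx
  }
  if (c <= 0x7FF) {
    return [0xC0 | (c >> 6), 0x80 | (c & 0x3F)];  // 110xxxxx 10xxxxxx
  }
  // Assumed from standard UTF-8: 1110xxxx 10xxxxxx 10xxxxxx
  return [0xE0 | (c >> 12), 0x80 | ((c >> 6) & 0x3F), 0x80 | (c & 0x3F)];
}

const toHex = (bytes) => bytes.map((b) => b.toString(16)).join(" ");
console.log(toHex(encodeUtf8Bmp(0x41)));   // 41
console.log(toHex(encodeUtf8Bmp(0xE9)));   // c3 a9
console.log(toHex(encodeUtf8Bmp(0x4E2D))); // e4 b8 ad
```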
Recently, Tornadof found that the home internet connection kept dropping inexplicably. The wireless AP turned out not to be the problem; instead, the laptop's built-in wireless network card was being disabled for no reason. A Google search showed that the network card driver was at fault (the following is the description of the August 08 driver): Intel PRO/Wireless 2200BG/2915ABG/3945ABG/WiFi Link 4965AGN/5100 AGN/5300 AGN/5150/5350 Series Wireless lap
contains all the character sets known to humans, it can theoretically parse all the text.
Unicode
The Unicode character set corresponds to the international standard ISO 10646. The Unicode character set is published by the Unicode Consortium.
ISO 10646 defines the Universal Character Set (UCS). UCS is a superset of all other character set standards. ISO 10646 defines a 31-bit character set. However, in this huge encoding space, only the first 65534 code positions (0x0000 to 0xFFFD) have been allocated so far. The 16-