But I like to get to the bottom of how a feature works, and I want to understand the things I care about. So I asked in several QQ groups in turn, but no one responded. Alas, how depressing. I had to Google it and teach myself. The following is a detailed description.
With no one to ask for help, I have some personal thoughts. Nowadays very few people delve into theory; most are content to muddle along, knowing the what but not the why. For programming, I personally think this is a sad thin
The simplest method of UCS2 encoding in .NET. Recently, I developed an SMS gateway application. Although it is not as troublesome as PDUs, sending a Chinese text message requires encoding the text (BTW, it turned out not to be needed in the end).
According to the programming documentation, the precise name of the encoding is UCS2. Everyone is familiar with UTF-8 and UTF-16, but what is UCS2? Here, I will give a rough explanation.
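As an illustration of what "encoding the text" can look like, here is a minimal Python sketch. It assumes the gateway expects the message as a hex string of big-endian UCS-2 code units, which is a common convention but is not stated in the text above; the helper name `to_ucs2_hex` is hypothetical.

```python
def to_ucs2_hex(text: str) -> str:
    """Return text as an upper-case hex string of big-endian UCS-2 units.

    Assumption: the receiver wants hex-encoded UCS-2 (hypothetical format).
    UCS-2 has no surrogate pairs, so characters above U+FFFF are rejected.
    """
    for ch in text:
        if ord(ch) > 0xFFFF:
            raise ValueError(f"U+{ord(ch):04X} is outside the range of UCS-2")
    # For BMP characters, UTF-16BE and UCS-2BE produce identical bytes.
    return text.encode("utf-16-be").hex().upper()


if __name__ == "__main__":
    print(to_ucs2_hex("你好"))  # 4F60597D
```

Each character becomes exactly four hex digits, so the length of the payload is easy to predict.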
The UCS has two formats:
Coding knowledge study notes 3
I. How to encode UTF-8
UTF-8 encodes in units of 8 bits. The encoding from UCS-2 to UTF-8 is as follows:
No.  UCS-2 range (hex)  UTF-8 byte stream (binary)  Description
1    0000-007F          0xxxxxxx                    1 byte, in the format 0xxxxxxx
2    0080-07FF          110xxxxx 10xxxxxx           2 bytes, in the format 110xxxxx 10xxxxxx
3    08
contains all the character sets known to humans, it can theoretically parse all the text.
Unicode
The Unicode character set corresponds to the international standard ISO 10646. The Unicode standard is published by the Unicode Consortium.
ISO 10646 defines the Universal Character Set (UCS). UCS is a superset of all other character set standards. ISO 10646 defines a 31-bit character set. However, in this huge encoding space, only the first 65534 code positions (0x0000 to 0xFFFD) have been allocated so far. The 16-
UTF encoding
UTF-8 encodes UCS in units of 8 bits. The encoding from UCS-2 to UTF-8 is as follows:
UCS-2 encoding (hexadecimal)  UTF-8 byte stream (binary)
0000-007F                     0xxxxxxx
0080-07FF                     110xxxxx 10xxxxxx
0800-FFFF                     1110xxxx 10xxxxxx 10xxxxxx
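The mapping in the table can be sketched in a few lines of Python. This is only an illustration of the bit layout, not a replacement for a real codec; the function name `ucs2_to_utf8` is hypothetical, and it covers only the three ranges listed in the table.

```python
def ucs2_to_utf8(code_point: int) -> bytes:
    """Encode one UCS-2 code point (0x0000-0xFFFF) per the table above.

    Sketch only: surrogate values (0xD800-0xDFFF) are not rejected here,
    although a strict UTF-8 encoder would refuse them.
    """
    if not 0 <= code_point <= 0xFFFF:
        raise ValueError("not a UCS-2 code point")
    if code_point <= 0x7F:
        # 0xxxxxxx
        return bytes([code_point])
    if code_point <= 0x7FF:
        # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    # 1110xxxx 10xxxxxx 10xxxxxx
    return bytes([0xE0 | (code_point >> 12),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


if __name__ == "__main__":
    print(ucs2_to_utf8(0x6C49).hex())  # e6b189
```

The result for 0x6C49 matches what Python's built-in codec produces for the same character, which is a quick way to check the bit arithmetic.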
For example, the Unicode code point of the character "汉" (Han) is 6C49. 6C49 falls in the range 0800-FFFF, so it must use the 3-byte te
letters, which are still represented by 1 byte, while Chinese, for example, is represented by 2 bytes. English and Chinese can thus be processed uniformly. To tell whether a byte begins a Chinese character, check the highest bit of the first byte: if it is 1, the byte that follows must also be read, and the 2 bytes together are interpreted as 1 character. GB2312, GBK, and GB18030 all belong to DBCS. In addition, the ANSI encoding in Simplified Chinese Windows usually refers to GBK (code page 936). The
Standardization) and the Unicode Consortium (an association of software manufacturers) each started their own work: the ISO 10646 project of ISO and the Unicode project of the Unicode Consortium. Later, they began to merge the results of both efforts, using the same character repertoire and code points. However, both projects still publish their own standards.
UCS (Universal Character Set): This is the name used in the ISO standard for what Unicode covers, and it comes with two encoding methods.
the node cannot be selected if the node that its fail pointer points to is not selectable, so you can first preprocess the nodes that cannot be selected. Then let f[i][j] be the number of ways for the AC automaton to be at node j when the text string has reached position i. Enumerate i in the outer loop, then j; if j is selectable, enumerate the 26 characters k, and f[i][j] contributes to the state v (following fail links where needed) reached from j along k. I can understand how to write this question, but I can't see
of digits: 2 bytes, representing 21,886 characters.
Range: high byte from 81 to FE, low byte from 40 to FE.
GB18030 character set
Function: it covers the encoding of Chinese, Japanese, Korean, and other characters, and is compatible with GBK.
Number of bytes: variable-length representation (1 byte for ASCII, 2 bytes, or 4 bytes); it can represent 27,484 Chinese characters.
Range: 1-byte codes from 00 to 7F; 2-byte codes with the high byte from 81 to FE and the low byte from 40 to 7E and 80 to FE; 4-byte codes with the first and third bytes from 81 to FE and the second and fourth bytes from 30 to 39.
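The byte ranges above are enough to split a GB18030 byte stream into characters without a lookup table. The following Python sketch does only that splitting; the function name `split_gb18030` is hypothetical, and the code does not check whether a well-formed sequence is actually assigned to a character.

```python
from typing import List


def split_gb18030(data: bytes) -> List[bytes]:
    """Split GB18030 bytes into per-character sequences of 1, 2, or 4 bytes."""
    chars = []
    i = 0
    while i < len(data):
        b1 = data[i]
        if b1 <= 0x7F:
            n = 1  # 1 byte: 00-7F (ASCII)
        elif 0x81 <= b1 <= 0xFE and i + 1 < len(data):
            b2 = data[i + 1]
            if 0x30 <= b2 <= 0x39:
                n = 4  # 4 bytes: second byte 30-39
            elif 0x40 <= b2 <= 0xFE and b2 != 0x7F:
                n = 2  # 2 bytes: low byte 40-7E or 80-FE
            else:
                raise ValueError(f"invalid second byte at offset {i + 1}")
        else:
            raise ValueError(f"invalid or truncated lead byte at offset {i}")
        if i + n > len(data):
            raise ValueError("truncated sequence at end of data")
        if n == 4:
            # Third byte must be 81-FE and fourth byte 30-39.
            b3, b4 = data[i + 2], data[i + 3]
            if not (0x81 <= b3 <= 0xFE and 0x30 <= b4 <= 0x39):
                raise ValueError(f"invalid 4-byte sequence at offset {i}")
        chars.append(data[i:i + n])
        i += n
    return chars


if __name__ == "__main__":
    encoded = "A汉\u0080".encode("gb18030")
    print([len(c) for c in split_gb18030(encoded)])  # [1, 2, 4]
```

The second byte alone decides between the 2-byte and 4-byte forms, because the range 30 to 39 never appears as the low byte of a 2-byte code.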
UCS character set
Role: The Internat
appear garbled? It is because the sender and the recipient are using different encodings. One can imagine that if there were a single encoding that included every symbol in the world, with each symbol given a unique code, the garbling problem would disappear. This is Unicode: as its name indicates, it is an encoding of all symbols. Unicode is also a character encoding method. The formal name for Unicode is "Universal Multiple-Octet Coded Character Set", referred to as
Again, it is necessary to emphasize that both the historical UCS and today's Unicode refer to a coded character set, not a character set encoding. Take a little time to understand this, and you will find that web pages, systems, conversions back and forth between coding standards, and other such complicated matters all become clear and easy.
First, the character set in its most common sense.
An abstract character set is
Plane 0 (0000-FFFF): Basic Multilingual Plane (BMP). The characters defined in the BMP can be encoded in 16 bits, that is, as a UTF-16 sequence of only one word (2 bytes). Therefore, a single Windows API wchar/wchar_t (which the language standard allows to be 4 bytes) and a single char in Java/C# can only hold a BMP character.
Although UTF-16 is a variable-length encoding, it is not like UTF-8, which can take 1, 2, 3, or 4 bytes; UTF-16 can only take 2 or 4 bytes.
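To make the "2 or 4 bytes" point concrete, the following Python sketch computes the 16-bit code units for a single code point, using the standard surrogate-pair arithmetic for code points above the BMP. The function name `utf16_code_units` is hypothetical.

```python
from typing import List


def utf16_code_units(code_point: int) -> List[int]:
    """Return the UTF-16 code units (one or two 16-bit values) for a code point."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not an encodable Unicode code point")
    if code_point <= 0xFFFF:
        # BMP character: a single 16-bit unit (2 bytes).
        return [code_point]
    # Supplementary character: split the 20-bit offset into a surrogate pair (4 bytes).
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)
    low = 0xDC00 + (offset & 0x3FF)
    return [high, low]


if __name__ == "__main__":
    print([hex(u) for u in utf16_code_units(0x6C49)])   # ['0x6c49']
    print([hex(u) for u in utf16_code_units(0x1F600)])  # ['0xd83d', '0xde00']
```

A 16-bit char type holds one code unit, so a supplementary character such as U+1F600 occupies two of them.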
8. How many characters can Unicode contain? Is it double-byte?U
at the number of encoding formats supported by iconv. It seems that there are many formats:
apple@kissAir:ruby_src$ iconv -l
ANSI_X3.4-1968 ANSI_X3.4-1986 ASCII CP367 IBM367 ISO-IR-6 ISO646-US ISO_646.IRV: 1991 US US-ASCII CSASCII
UTF-8
UTF-8-MAC UTF8-MAC
ISO-10646-UCS-2 UCS-2 CSUNICODE
UCS-2BE UNICODE-1-1 UNICODEBIG CSUNICODE11
z-coordinate)
Specifies the z-coordinate of the current viewport's center point, read-only
Height (altitude)
Specifies the height of the current viewport, read-only
Width (breadth)
Specifies the width of the current viewport, read-only
Misc (Other)
Table 8-7 Description of attribute entries
Entry
Description
UCS icon on (open
There is very little information about dmstool on the Internet. What I know of its usage I have learned by using it. The following describes the usage as shown by the tool's own help output:
Description:
This script is a client side tool. Users can use it for collecting DMS metrics from multiple DMS instruments servers.
Usage:
Usage: dmstool -help
dmstool -list [-table] [-delimitor
What we usually use is to print dump information. Of course, there is also a closely related dmstool -list, that is, to obtain the value of a value
How to configure a computer for about 2000 yuan. The editor has previously brought everyone many gaming configurations, and recently many readers have asked how to configure a machine for about 2000 yuan, mainly for home and study use rather than games. In fact, a configuration for this purpose is easy to choose. The three core components are basically enough: a sixth-generation processor, a 10 series motherboard, no discrete graphics card, and 8 GB of memory is plenty. See Small
What is UTF-8?
First of all, a character code table only assigns an integer to each character. There are several ways to represent a string of characters as a string of bytes. The two most obvious methods store Unicode text as sequences of 2-byte or 4-byte units. The formal names of the two methods are UCS-2 and UCS-4, respectively. Unless otherwise specified, the most significant byte comes first (bigendian Conventi
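A short Python sketch of the two fixed-width forms, using the standard `struct` module. The function name `encode_fixed_width` and its parameters are hypothetical; the little-endian option is included only for contrast with the big-endian default.

```python
import struct


def encode_fixed_width(text: str, width: int, big_endian: bool = True) -> bytes:
    """Encode text as fixed-width code units: width=2 for UCS-2, width=4 for UCS-4."""
    if width not in (2, 4):
        raise ValueError("width must be 2 (UCS-2) or 4 (UCS-4)")
    fmt = (">" if big_endian else "<") + ("H" if width == 2 else "I")
    out = bytearray()
    for ch in text:
        # struct.pack raises struct.error if the code point does not fit
        # in 16 bits, which is exactly the limit of UCS-2.
        out += struct.pack(fmt, ord(ch))
    return bytes(out)


if __name__ == "__main__":
    print(encode_fixed_width("汉", 2).hex())                    # 6c49
    print(encode_fixed_width("汉", 4).hex())                    # 00006c49
    print(encode_fixed_width("汉", 2, big_endian=False).hex())  # 496c
```

In the big-endian form the hex dump reads the same as the code point itself, which is why it is the easier convention to inspect by eye.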