UTF encoding
UTF-8 encodes UCS in 8-bit units. The mapping from UCS-2 to UTF-8 is as follows:
UCS-2 code (hexadecimal)    UTF-8 byte stream (binary)
0000-007F                   0xxxxxxx
0080-07FF                   110xxxxx 10xxxxxx
0800-FFFF                   1110xxxx 10xxxxxx 10xxxxxx
For example, the Unicode code point of the character "汉" (Han) is 6C49. 6C49 lies in the range 0800-FFFF, so it must use the 3-byte te
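The mapping for the "Han" example can be worked through in code. The following is a minimal Python sketch, not a reference implementation: the function name `utf8_encode_bmp` is hypothetical, and it handles only the 0000-FFFF range covered by the table.

```python
def utf8_encode_bmp(code_point: int) -> bytes:
    """Encode one code point in 0000-FFFF with the 1-, 2-, or 3-byte pattern."""
    if not 0 <= code_point <= 0xFFFF:
        raise ValueError("only the 0000-FFFF range is handled in this sketch")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate values are not characters")
    if code_point <= 0x7F:
        # 0xxxxxxx
        return bytes([code_point])
    if code_point <= 0x7FF:
        # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6), 0x80 | (code_point & 0x3F)])
    # 1110xxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xE0 | (code_point >> 12),
        0x80 | ((code_point >> 6) & 0x3F),
        0x80 | (code_point & 0x3F),
    ])


# 6C49 = 0110 1100 0100 1001 -> 11100110 10110001 10001001
print(utf8_encode_bmp(0x6C49).hex())  # e6b189
```

The result matches what Python's built-in `"汉".encode("utf-8")` produces.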
letters, and is still represented by 1 byte, while Chinese, for example, is represented by 2 bytes. English and Chinese can be processed uniformly. To tell whether a byte begins a Chinese character, check whether the highest bit of the high byte is 1. If it is, the following byte must also be read, and the 2 bytes are interpreted as 1 character. GB2312, GBK, and GB18030 all belong to DBCS. In addition, the ANSI encoding on Simplified Chinese Windows usually refers to GBK (code page 936). The
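The high-bit check described above can be sketched in Python. This is only an illustration under simplifying assumptions: `count_dbcs_chars` is a hypothetical helper, the input is assumed to be well-formed GBK, and the 4-byte sequences of GB18030 are not handled.

```python
def count_dbcs_chars(data: bytes) -> int:
    """Count characters in GBK-style DBCS bytes using the high-bit rule."""
    count = 0
    i = 0
    while i < len(data):
        if data[i] & 0x80:
            # High bit is 1: this byte and the next form one character.
            i += 2
        else:
            # High bit is 0: a single-byte (ASCII) character.
            i += 1
        count += 1
    return count


sample = "abc汉字".encode("gbk")
print(len(sample), count_dbcs_chars(sample))  # 7 5
```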
Standardization) and the Unicode Consortium (an association of software manufacturers) each started their own work: the ISO 10646 project of ISO and the Unicode project of the Unicode Consortium. Later, they began to merge their results, using the same character repertoire and code points. However, both projects still publish their own standards.
UCS (Universal Character Set): This is the name used for the Unicode character set in ISO 10646, which provides two encoding methods.
of string manipulation, which is a very important reason why Java uses UTF-16 as its in-memory character storage format.
UTF-8
UTF-16 uses 2 bytes (or 4 bytes) to represent each character, which makes it incompatible with the earlier, heavily used ASCII code. In addition, some bytes have special meanings in UNIX systems, such as '\0' or '/', which are significant in filenames and other C library function parameters. In addition, some of the most commonly used characters (Wester
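The incompatibility with '\0' and '/' can be seen by inspecting the raw bytes. The snippet below is a small Python illustration; the helper name `special_bytes_in` is made up for this example, and big-endian UTF-16 without a byte order mark is assumed.

```python
def special_bytes_in(data: bytes) -> set:
    """Return which of the UNIX-special bytes NUL (0x00) and '/' (0x2F) occur."""
    return {b for b in (0x00, 0x2F) if b in data}


# Plain ASCII text: UTF-16 pads every character with a zero byte.
ascii_text = "ab"
print(ascii_text.encode("utf-16-be"))  # b'\x00a\x00b'
print(ascii_text.encode("utf-8"))      # b'ab'

# U+012F is not a slash, but its UTF-16 form contains the byte 0x2F.
letter = "\u012f"
print(special_bytes_in(letter.encode("utf-16-be")))  # {47}
print(special_bytes_in(letter.encode("utf-8")))      # set()
```

In UTF-8, every byte of a multi-byte character has its high bit set, so bytes such as 0x00 and 0x2F only ever stand for the ASCII characters themselves.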
(Hebrew) - Hebrew (visual order)
* ISO 8859-8-I - Hebrew (logical order)
* ISO 8859-9 (Latin-5 or Turkish) - replaces the Icelandic letters of Latin-1 with Turkish letters.
* ISO 8859-10 (Latin-6 or Nordic) - Nordic languages, intended to replace Latin-4.
* ISO 8859-11 (Thai) - Thai, derived from Thailand's TIS-620 standard character set.
* ISO 8859-13 (Latin-7 or Baltic Rim) - Baltic languages
* ISO 8859-14 (Latin-8 or Celtic) - Celtic languages
* ISO 8859-15 (Latin-9) - Western European languages,
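To make the relationship between these parts concrete, the short Python sketch below decodes the same six bytes under Latin-1 and Latin-5. It relies on the standard `iso8859_1` and `iso8859_9` codecs that ship with Python; the variable names are illustrative only.

```python
# The six byte values where Latin-1 holds Icelandic letters.
raw = bytes([0xD0, 0xDD, 0xDE, 0xF0, 0xFD, 0xFE])

latin1 = raw.decode("iso8859_1")
latin5 = raw.decode("iso8859_9")

print(latin1)  # ÐÝÞðýþ
print(latin5)  # ĞİŞğış
```

The bytes are identical in both cases; only the character set chosen to interpret them changes the result.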
But I am the kind of person who likes to get to the bottom of principles, and when I care about something I want to understand it. So I posted the question in several QQ groups in turn, and no one responded. Alas, how depressing. I had to Google it and teach myself. The following is a detailed description.
With no one to ask for help, I have some personal thoughts. Nowadays very few people delve into theory. Most are content to muddle along and usually know the what but not the why. For programming, I personally think this is a sad thin
of digits: 2 bytes, representing 21,886 characters.
Range: high byte from 81 to FE, low byte from 40 to FE (excluding 7F).
GB18030 Character Set
Function: It solves the encoding of Chinese, Japanese, Korean, etc., and is compatible with GBK.
Number of bits: It uses a variable-length byte representation (1 byte for ASCII, 2 bytes, or 4 bytes) and can represent 27,484 characters.
Range: 1 byte: from 00 to 7F; 2 bytes: high byte from 81 to FE, low byte from 40 to 7E and 80 to FE; 4 bytes: 1st and 3rd bytes from 81 to FE, 2nd and 4th bytes from 30 to 39.
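The byte ranges above are enough to tell how long each GB18030 sequence is. The following Python sketch shows one way to apply them; `gb18030_sequence_length` is a hypothetical helper that only checks the ranges and does not verify that a sequence is actually assigned to a character.

```python
def gb18030_sequence_length(data: bytes, pos: int = 0) -> int:
    """Return 1, 2, or 4: the length of the sequence starting at data[pos]."""
    first = data[pos]
    if first <= 0x7F:
        return 1  # single byte, 00 to 7F
    if not 0x81 <= first <= 0xFE:
        raise ValueError("invalid lead byte")
    if pos + 1 >= len(data):
        raise ValueError("truncated sequence")
    second = data[pos + 1]
    if 0x40 <= second <= 0x7E or 0x80 <= second <= 0xFE:
        return 2  # two bytes: low byte 40-7E or 80-FE
    if 0x30 <= second <= 0x39:
        if pos + 3 >= len(data):
            raise ValueError("truncated sequence")
        third, fourth = data[pos + 2], data[pos + 3]
        if 0x81 <= third <= 0xFE and 0x30 <= fourth <= 0x39:
            return 4  # four bytes: 81-FE, 30-39, 81-FE, 30-39
    raise ValueError("invalid trailing bytes")


print(gb18030_sequence_length(b"A"))                  # 1
print(gb18030_sequence_length(b"\xba\xba"))           # 2
print(gb18030_sequence_length(b"\x81\x30\x81\x30"))   # 4
```

The second byte alone distinguishes the 2-byte form from the 4-byte form, because the ranges 30-39 and 40-FE do not overlap.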
versions of the Intel WiFi Link 1000 BGN driver (v13.2.1.0) available for download on the Internet. Download both versions of the driver. As a last-ditch attempt, uninstall the wireless network card driver from the system and restart the system. First install the old version of the driver (v13.2.1.0). After the installation was complete, I tested for a day and the blue screen did not appear. I will continue to observe.
Intel WiFi Link 1000 BGN v13.2.1.0 official driver:
Win
Plane 0 (0000-FFFF): Basic Multilingual Plane (BMP)
The characters defined in the BMP can be encoded in 16 bits, that is, in UTF-16 with only one word (2 bytes).
Therefore, a single WCHAR/wchar_t in the Windows API (wchar_t can be 4 bytes on other platforms) or a single char in Java/C# can only hold a BMP character.
Although UTF-16 is a variable-length encoding, it is not like UTF-8, which can use 1, 2, 3, or 4 bytes. UTF-16 uses only 2 or 4 bytes.
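The "2 or 4 bytes" behaviour comes from surrogate pairs: code points above FFFF are split across two 16-bit units. The Python sketch below shows the arithmetic; the function name `utf16_code_units` is hypothetical, and surrogate values passed in as input are not rejected.

```python
def utf16_code_units(code_point: int) -> list:
    """Return the one or two 16-bit units that UTF-16 uses for a code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point out of Unicode range")
    if code_point <= 0xFFFF:
        return [code_point]            # BMP: one unit, 2 bytes
    offset = code_point - 0x10000      # 20 bits remain
    high = 0xD800 | (offset >> 10)     # top 10 bits
    low = 0xDC00 | (offset & 0x3FF)    # bottom 10 bits
    return [high, low]                 # outside BMP: two units, 4 bytes


print([hex(u) for u in utf16_code_units(0x6C49)])   # ['0x6c49']
print([hex(u) for u in utf16_code_units(0x1F600)])  # ['0xd83d', '0xde00']
```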
8. How many characters can Unicode contain? Is it double-byte? U
at the number of encoding formats supported by iconv. It seems that there are many formats:
Apple@kissAir:ruby_src$ iconv -l
ANSI_X3.4-1968 ANSI_X3.4-1986 ASCII CP367 IBM367 ISO-IR-6 ISO646-US ISO_646.IRV: 1991 US US-ASCII CSASCII
UTF-8
UTF-8-MAC UTF8-MAC
ISO-10646-UCS-2 UCS-2 CSUNICODE
UCS-2BE UNICODE-1-1 UNICODEBIG CSUNICODE11
z-coordinate): Specifies the z-coordinate of the current viewport's center point (read-only)
Height (altitude): Specifies the height of the current viewport (read-only)
Width (breadth): Specifies the width of the current viewport (read-only)
Misc (Other)
Table 8-7 Description of attribute entries
Entry
Description
UCS icon on (open
What is UTF-8?
First, the character encoding table only assigns an integer to each character. There are several ways to represent a string of characters as a string of bytes. The two most obvious methods store Unicode text as sequences of 2-byte or 4-byte units. The formal names of the two methods are UCS-2 and UCS-4, respectively. Unless otherwise specified, the most significant byte comes first in these (bigendian Conventi
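As a rough illustration of the two fixed-width forms, the Python sketch below writes each character's integer as 2 or 4 big-endian bytes. The helper names `ucs2_be` and `ucs4_be` are made up for this example, and no byte order mark is written.

```python
def ucs2_be(text: str) -> bytes:
    """Store each character as 2 bytes, most significant byte first."""
    # int.to_bytes raises OverflowError for characters above FFFF.
    return b"".join(ord(ch).to_bytes(2, "big") for ch in text)


def ucs4_be(text: str) -> bytes:
    """Store each character as 4 bytes, most significant byte first."""
    return b"".join(ord(ch).to_bytes(4, "big") for ch in text)


print(ucs2_be("A汉").hex())  # 00416c49
print(ucs4_be("A汉").hex())  # 0000004100006c49
```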
Server, IBM DB2, Oracle, Sybase ..., Production Platform Microsoft Windows CE, NT, 2000, XP... Java/Visual Studio ... In addition, Unicode is the main way to implement ISO/IEC 10646. The emergence of the Unicode standard and the existence of tools supporting it are among the most important recent trends in software technology worldwide. Unicode and
Unicode does not specify how characters are displayed in the reader, resource, and webpage. The representation of each character must be processed throu
UTF-8 concepts. Address: http://www.utf.com.cn/article/s41-3
ISO/IEC 10646. Many operating systems, all the latest browsers, and many other products support it. The emergence of the Unicode standard and the existence of tools supporting it have recently become one of the most important trends in software technology worldwide. Incorporating Unicode into client-server or multi-tier applications and websites saves costs compared with traditional character sets. Unicode enables a single software product or website to run across multiple platforms, languages, a
What is Unicode?
A mapping between characters and integer indexes. We write an index as U+XXXX.
Confused about Unicode and UTF-8? Unicode is a standard character set. UTF-8 is one implementation of it, just one among UCS-2, UCS-4, and so forth, but it has become the standard way of encoding. But note one thing: for English (ASCII-range) characters, the two give the same value.
U-00000000 - U-0000007F: 0xxxxxxx
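The range line above says that code points 00 to 7F fit the single-byte pattern 0xxxxxxx, so the UTF-8 byte equals the code point itself. A short Python check of that claim (the variable names are illustrative only):

```python
# Every code point from 00 to 7F encodes to the one byte with the same value.
ascii_is_identical = all(
    chr(cp).encode("utf-8") == bytes([cp]) for cp in range(0x80)
)
print(ascii_is_identical)  # True

# From 80 upward the code point and the UTF-8 bytes are no longer the same.
first_two_byte = chr(0x80).encode("utf-8")
print(first_two_byte.hex())  # c280
```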
Some
bytes, representing 21,886 characters.
Range: high byte from 81 to FE, low byte from 40 to FE (excluding 7F).
GB18030 Character Set
Function: It solves the encoding of Chinese, Japanese, Korean, etc., and is compatible with GBK.
Number of bits: It uses a variable-length byte representation (1 byte for ASCII, 2 bytes, or 4 bytes) and can represent 27,484 characters.
Range: 1 byte: from 00 to 7F; 2 bytes: high byte from 81 to FE, low byte from 40 to 7E and 80 to FE; 4 bytes: 1st and 3rd bytes from 81 to FE, 2nd and 4th bytes from 30 to 39.
UCS Character Set
Role: The Internat
appear garbled? It is because the sender and the recipient are using different encoding methods. Imagine an encoding that includes all the symbols in the world, with each symbol given a unique code. The garbling problem would then disappear. This is Unicode, which, as its name indicates, is an encoding of all symbols. Unicode is also a character encoding method. The formal name for Unicode is "Universal Multiple-Octet Coded Character Set", referred to as
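The cause of garbling described above is easy to reproduce. In the Python sketch below, the sender encodes with UTF-8 and the recipient wrongly decodes with GBK; the variable names are illustrative, and `errors="replace"` is used only so that any undecodable bytes do not raise an exception.

```python
original = "汉字"

# Sender encodes the text as UTF-8.
sent = original.encode("utf-8")

# Recipient assumes GBK: same bytes, different characters.
garbled = sent.decode("gbk", errors="replace")

# Recipient uses the sender's encoding: the text survives.
recovered = sent.decode("utf-8")

print(garbled == original)    # False
print(recovered == original)  # True
```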