Let's first explain why Chinese text needs to be converted to Unicode. Unicode is a widely adopted international standard. It can save bytes compared with some traditional character encodings, and it lets a web page be displayed on platforms in different languages. Therefore, as long as the Chinese characters are converted to Unicode, no garbled
UTF encoding
UTF-8 encodes UCS values in 8-bit units. The mapping from UCS-2 to UTF-8 is as follows:
UCS-2 code (hexadecimal)    UTF-8 byte stream (binary)
0000-007F                   0xxxxxxx
0080-07FF                   110xxxxx 10xxxxxx
0800-FFFF                   1110xxxx 10xxxxxx 10xxxxxx
For example, the Unicode code of the character "汉" (Han) is 6C49. 6C49 lies between 0800 and FFFF, so it must use the 3-byte
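The three bit patterns listed above can be applied by hand. The following is a minimal sketch, not a replacement for Python's built-in `str.encode`; the function name `ucs2_to_utf8` is hypothetical, and the sketch assumes the input is a single UCS-2 value that is not a surrogate.

```python
def ucs2_to_utf8(code_point: int) -> bytes:
    """Encode one UCS-2 value (0x0000-0xFFFF) with the 1-, 2- or 3-byte pattern."""
    if not 0 <= code_point <= 0xFFFF:
        raise ValueError("only UCS-2 values (0x0000-0xFFFF) are handled here")
    if 0xD800 <= code_point <= 0xDFFF:
        # Assumption of this sketch: surrogate values are rejected.
        raise ValueError("surrogate values are not valid characters")
    if code_point <= 0x7F:
        # 0xxxxxxx
        return bytes([code_point])
    if code_point <= 0x7FF:
        # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    # 1110xxxx 10xxxxxx 10xxxxxx
    return bytes([0xE0 | (code_point >> 12),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


# 0x6C49 falls in 0800-FFFF, so three bytes are produced.
print(ucs2_to_utf8(0x6C49).hex())  # e6b189
```

The result can be checked against the built-in encoder: `"汉".encode("utf-8")` yields the same three bytes.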
UTF encoding
UTF-8 encodes UCS values in 8-bit units. The mapping from UCS-2 to UTF-8 is as follows:
UCS-2 code (hexadecimal)    UTF-8 byte stream (binary)
0000-007F                   0xxxxxxx
0080-07FF                   110xxxxx 10xxxxxx
0800-FFFF                   1110xxxx 10xxxxxx 10xxxxxx
For example, the Unicode code of the character "汉" (Han) is 6C49. 6C49 lies between 0800 and FFFF, so it must use the 3-byte te
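Going the other way, from a UTF-8 byte stream back to Unicode values, reverses the same three patterns. The sketch below is illustrative only: the function name `utf8_to_code_points` is hypothetical, it handles only the 1- to 3-byte forms listed above, and it does not reject overlong encodings as a strict decoder would.

```python
def utf8_to_code_points(data: bytes) -> list:
    """Decode UTF-8 bytes (1- to 3-byte forms only) into integer code values."""
    points = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:                 # 0xxxxxxx
            value, extra = lead, 0
        elif lead & 0xE0 == 0xC0:       # 110xxxxx
            value, extra = lead & 0x1F, 1
        elif lead & 0xF0 == 0xE0:       # 1110xxxx
            value, extra = lead & 0x0F, 2
        else:
            raise ValueError(f"unsupported lead byte 0x{lead:02x} at offset {i}")
        tail = data[i + 1:i + 1 + extra]
        # Every continuation byte must look like 10xxxxxx.
        if len(tail) != extra or any(b & 0xC0 != 0x80 for b in tail):
            raise ValueError(f"bad continuation bytes after offset {i}")
        for b in tail:
            value = (value << 6) | (b & 0x3F)
        points.append(value)
        i += 1 + extra
    return points


print([hex(p) for p in utf8_to_code_points(b"A\xe6\xb1\x89")])  # ['0x41', '0x6c49']
```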
One-encoding history single-byte encoding
2.1.1 ASCII 0-127 7-bit representation2.1.2 ASCII extended code 0-255 8-bit representationCode Page: use the code page to switch the corresponding
Multi-byte encoding
2.1.3 dual-Byte Character Set DBCSOne or two bytes are used to represent characters."Country A and country B"12 1 2A: 0x41 medium: 0x8051B: 0x42 countries: 0x8253
1 2 3 4 5 60x41 0x80 0x51 0x42 0x82 0x53 A in Country B
In this way, multi-byte enc
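To make the one-or-two-byte rule concrete, here is a minimal sketch that splits the six-byte stream from the example back into characters. It assumes a simplified rule that any byte with the high bit set (0x80 or above) starts a two-byte character; real DBCS code pages define narrower lead-byte ranges. The function name `split_dbcs` is hypothetical.

```python
def split_dbcs(data: bytes) -> list:
    """Split a DBCS byte stream into one- and two-byte characters."""
    chars = []
    i = 0
    while i < len(data):
        if data[i] < 0x80:
            # Single-byte (ASCII) character.
            chars.append(data[i:i + 1])
            i += 1
        else:
            # Assumed lead byte: the next byte belongs to the same character.
            if i + 1 >= len(data):
                raise ValueError("stream ends in the middle of a two-byte character")
            chars.append(data[i:i + 2])
            i += 2
    return chars


stream = bytes([0x41, 0x80, 0x51, 0x42, 0x82, 0x53])
print([c.hex() for c in split_dbcs(stream)])  # ['41', '8051', '42', '8253']
```

Note that the trail bytes 0x51 and 0x53 are themselves valid ASCII values, so a parser that starts in the middle of the stream cannot tell which bytes are characters in their own right.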
PHP UTF-8 to Unicode functions. UTF encoding: UTF-8 encodes in 8-bit units. The mapping from UCS-2 to UTF-8 is as follows: UCS-2 code (hexadecimal), UTF-8 byte stream (binary): 0000-007F 0xxxxxxx, 0080-07FF
UTF encoding
UTF-8 encodes in 8-bit units. The mapping from UCS-2 to UTF-8 is as follows:
UCS-
parameter. The pMultiByteStr parameter specifies the string to be converted. The cchMultiByte parameter specifies the length of that string (in bytes). If -1 is passed for cchMultiByte, the function determines the length of the source string itself. The converted Unicode version of the string is written to a buffer in memory whose address is specified by the pWideCharStr parameter. The maximum value o
Unicode, UCS-2, UCS-4, UTF-16, UTF-32, UTF-8
Unicode details
Copyright notice: This article may be freely reproduced, provided that the original author, charlee, and the original link, http://tech.idv2.com/2008/02/21/unicode-intro/, are clearly credited.
Maybe everyone has heard of Unicode
Notes on studying the Unicode Character Set in Windows programming:
1: The C language supports Unicode through its support for wide characters.
2: Wide characters in C are based on the wchar_t data type. It is defined in several header files, including WCHAR.H, as follows: typedef unsigned short wchar_t; therefore, the wchar_t data type is the same as the unsigned
Speaking of Symbian development, we have to mention Symbian's frustrating descriptors. Symbian introduced a series of mechanisms to improve stability, and descriptors are one of them. From TDesC to RBuf, from 16-bit Unicode to 8-bit UTF-8, the tangled relationships among them not only give Symbian newcomers a headache but also humble Symbian experts.
Compared with Symbian descriptors, Qt is a great relief for charac
Windows paths use the backslash \, but the backslash is the escape character in Python, so take special care with backslashes when writing Windows file paths in a .py file. Here are three ways to solve the problem:
Way one: escape each backslash
"c:\\users\\xxx\\desktop\\a.txt"
Way two: explicitly declare a raw string, which is not escaped
r"c:\users\xxx\desktop\a.txt"
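A short sketch can confirm that the escaped form and the raw-string form describe the same path. The forward-slash variant shown last is an assumption on my part, since the excerpt is cut off before its third way; it relies on `pathlib.PureWindowsPath`, which treats `/` and `\` as equivalent separators.

```python
from pathlib import PureWindowsPath

# Way one: every backslash is doubled so that none starts an escape sequence.
escaped = "c:\\users\\xxx\\desktop\\a.txt"

# Way two: the r prefix makes a raw string, so backslashes are kept literally.
raw = r"c:\users\xxx\desktop\a.txt"

# Both literals produce exactly the same string.
print(escaped == raw)  # True

# Assumed alternative (not shown in the excerpt): forward slashes.
forward = "c:/users/xxx/desktop/a.txt"
print(PureWindowsPath(forward) == PureWindowsPath(raw))  # True
```

`PureWindowsPath` only manipulates the path text and never touches the file system, so the comparison behaves the same on any operating system.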
While learning Python, I used os.path.getsize('c:\user_weblogic.dmp') to get the size of a file. I had tried several other files, and they all produced results.
With this file, however, the call always failed with an error:
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \uXXXX escape
The same happened with a file named awrrpt.html ....
The cause was the \u in the path, which led to the e
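The error can be reproduced without touching any file by compiling the offending statement from text. This is a minimal sketch: the helper name `syntax_error_for` is hypothetical, and the two fixes shown (a raw string and doubled backslashes) are the usual remedies rather than something the truncated excerpt states.

```python
def syntax_error_for(source_text):
    """Return the SyntaxError message raised by compiling source_text, or None."""
    try:
        compile(source_text, "<demo>", "exec")
    except SyntaxError as exc:
        return exc.msg
    return None


# The statement as typed in the excerpt: \u starts a Unicode escape that
# expects four hex digits, and "ser_" is not one.
broken = r"path = 'c:\user_weblogic.dmp'"

# Two candidate fixes: a raw string literal, or a doubled backslash.
fixed_raw = r"path = r'c:\user_weblogic.dmp'"
fixed_escaped = r"path = 'c:\\user_weblogic.dmp'"

print(syntax_error_for(broken) is not None)  # True
print(syntax_error_for(fixed_raw))           # None
print(syntax_error_for(fixed_escaped))       # None
```

Because the failure is a SyntaxError, it is raised when the file is compiled, before os.path.getsize is ever called.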