Some time ago, a project I was working on ran into a Unicode-to-GB transcoding failure: some rarely used characters were converted to "??" and the Chinese characters did not display. I researched the related issues and eventually solved the problem. Building on the previous two articles on Unicode and GB fundamentals, this article introduces the method of
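One common way this kind of "?" output can arise is encoding text into a GB-family code page that lacks the character. The sketch below is only an illustration in Python, not the method used in the project above; the sample string is hypothetical.

```python
# Hypothetical sample text: U+1F600 is outside the GBK repertoire.
text = "ok\U0001F600"

# errors="replace" substitutes "?" for every character the codec cannot encode.
lossy = text.encode("gbk", errors="replace")

# GB18030 can represent every Unicode code point, so the round trip is lossless.
lossless = text.encode("gb18030")
restored = lossless.decode("gb18030")

print(lossy)             # the unencodable character has become "?"
print(restored == text)  # the GB18030 round trip preserves the text
```

Choosing a target encoding that covers the characters in use, or failing loudly with the default errors="strict", avoids silent replacement.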
My knowledge of character encodings has never been very strong, so I did not know much about Unicode and UTF-8.
I recently came across an article on UTF-8 and found its explanation very complicated, so I decided to write a simple, easy-to-understand one.
Let's begin by explaining some of the encoding schemes in common use today:
1. In mainland China the most commonly used encoding is GB18030, alongside GBK and GB2312; the relationship between these sev
Article Directory
Unicode compilation settings:
UNICODE: Wide-Byte Character Set
Development Process:
1. Regular Expressions matching Unicode characters
Original article: http://blog.sunmast.com/Sunmast/archive/2004/07/30/799.aspx
Here are several of the main non-English character ranges (found via Google):
2E80h ~ 33FFh: CJK symbols area,
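A range like the one above can be written directly as a regular-expression character class. The following is a minimal Python sketch; the pattern name, helper function, and sample string are made up for illustration.

```python
import re

# Character class for the range quoted in the text: U+2E80 to U+33FF.
CJK_SYMBOLS = re.compile(r"[\u2e80-\u33ff]")

def find_cjk_symbols(text):
    """Return every character of `text` that falls in U+2E80..U+33FF."""
    return CJK_SYMBOLS.findall(text)

# Hypothetical sample: U+3002 and U+3010 are CJK punctuation marks.
sample = "Hello\u3002World\u3010"
print(find_cjk_symbols(sample))
```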
Q: How to display Unicode strings
A:
If the program defines the _UNICODE macro, use the string directly:
WCHAR *str = L"unicodestring";
TextOut(0, 0, str);
Otherwise, a type conversion is required:
#include <comdef.h>
WCHAR *str = L"unicodestring";
_bstr_t str1 = str;
TextOut(0, 0, (char *)str1);
Q: How to convert between ANSI and Unicode
A:
Convert ANSI to Unicode:
(1) Use the macro L, fo
compatible with ASCII. In practice they exploit the fact that extended ASCII was never truly standardized: a Chinese character is represented by two extended-ASCII bytes, which distinguishes it from the ASCII portion.
But this approach has problems, the biggest being that the Chinese text encoding overlaps with the extended ASCII codes. Much software uses the extended-ASCII box-drawing characters to draw tables; when such software runs on a Chinese system, these table lines are mistaken for Chinese characters and appear garbled.
In addition, because countries and regions have their own text cod
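The overlap can be reproduced with two bytes. In the Python sketch below, code page 437 stands in for "extended ASCII" and GBK for the Chinese encoding; both choices are assumptions made for the illustration.

```python
# Two identical bytes taken from the extended-ASCII range.
raw = b"\xc4\xc4"

# Under code page 437 each byte is a horizontal box-drawing line (U+2500).
as_box = raw.decode("cp437")

# Under GBK the same two bytes form a single double-byte Chinese character.
as_gbk = raw.decode("gbk")

print(len(as_box), len(as_gbk))  # two table characters versus one Chinese character
```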
Convert a Unicode string to Chinese in Python 3
When crawling data with a Python crawler, you sometimes find that the crawled data looks like "\u3010\u6f14\u5531\u4f1a\u30112000-\u62c9\u9614\u97f3\u4e50\u4f1a". In Python's interactive environment, such a Unicode string can be printed directly to view its content:
print("\u3010\u6f14\u5531\u4f1a\u30112000-\u62c9\u9614\u97f3\u4e50\u4f1a")
【演唱会】2
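When the crawled text contains literal backslash-u sequences (six visible characters per escape) rather than the characters themselves, printing does not help and the escapes must be decoded first. A minimal sketch, assuming the raw text is pure ASCII:

```python
# A raw string keeps the backslashes, as crawled text would: 5 escapes, 30 characters.
raw = r"\u3010\u6f14\u5531\u4f1a\u3011"

# unicode_escape works on bytes, so encode to ASCII first.
# This only holds if the raw text has no non-ASCII characters of its own.
decoded = raw.encode("ascii").decode("unicode_escape")

print(len(raw), len(decoded))
print(decoded)
```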
As we know, C uses the char data type to represent an 8-bit ANSI character. By default, when a string is declared in code, the C compiler turns its characters into an array of 8-bit char values:
// An 8-bit character
char c = 'A';
// An array of 8-bit characters with an 8-bit terminating zero
char szBuffer[100] = "A String";
Microsoft's C++ compiler defines a built-in data type wchar_t, which represents a 16-bit
Before starting this article, you should already be clear on the distinction between the Unicode encoding itself (that is, code points) and the implementations of Unicode encoding. Otherwise, what follows will not make sense.
History
We know that the ISO 10646 committee defines a super character set called the Universal Character Set (UCS) to encompass all the writing systems in the world. Because the UCS is now encoded in 4 bytes, it is
I searched the Internet for the Unicode range of Chinese characters, and the answer generally given is "U+4E00..U+9FA5". In fact this range is incomplete; it does not even include the basic fullwidth (Chinese) punctuation. Based on the latest version, Unicode 5.0, I have sorted the ranges out as follows:
Note: In most application scenarios, sets (1), (2), (3), (4), and (5) alone are enough as the basis for CJK detection.
1) Standar
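The gap in the commonly quoted range is easy to check. The Python sketch below tests only the "U+4E00..U+9FA5" range mentioned above; the function name is hypothetical, and the additional sets from the list are not reproduced here.

```python
def in_basic_cjk_range(ch):
    """True if `ch` lies in the commonly quoted range U+4E00..U+9FA5."""
    return 0x4E00 <= ord(ch) <= 0x9FA5

# U+4E2D is an ordinary Chinese character and is inside the range.
print(in_basic_cjk_range("\u4e2d"))

# U+FF0C is the fullwidth comma used in Chinese text and is outside the range.
print(in_basic_cjk_range("\uff0c"))
```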
char is a primitive type in Java and represents a character. Characters in Java are Unicode-encoded, so a Java char occupies 2 bytes, and its content is stored as a Unicode code value (a binary number). The question is, how does a program convert Unicode code values into the data we want? For example: Chinese charac
Strings also have an encoding problem.
Because a computer can only handle numbers, text must be converted to numbers before it can be processed. The earliest computers were designed with 8 bits to a byte, so the largest integer that one byte can represent is 255 (binary 11111111 = decimal 255). The values 0-127 are used to denote uppercase and lowercase letters, digits, and some symbols. This code table is called the ASCII encoding, such as the code for capital
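The numbers in this passage can be checked directly in Python. This is a small sketch; the sample string "Az09" is arbitrary.

```python
# Largest value that fits in one 8-bit byte.
max_byte = 0b11111111

# ASCII code of the capital letter A.
code_a = ord("A")

# Encoding ASCII text yields exactly one byte per character.
ascii_bytes = "Az09".encode("ascii")

print(max_byte, code_a, list(ascii_bytes))
```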
develop a long-term vision, then no matter which encoding you set, the generated data will not be garbled, because there is a universal code: Unicode.
So the question is, how do we solve it? Experiments show that, with its default settings, Jackson JSON can already parse Unicode-escaped JSON data. What is missing is the corresponding step when serializing objects. Fortunately, the Jac
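The same parse/serialize asymmetry can be illustrated with Python's standard json module. This is not Jackson and says nothing about Jackson's API; it only shows what Unicode-escaped JSON looks like in both directions, using a hypothetical payload.

```python
import json

# Hypothetical payload containing two Chinese characters.
payload = {"name": "\u4e2d\u6587"}

# By default (ensure_ascii=True) non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(payload)

# With ensure_ascii=False the characters are written as-is.
readable = json.dumps(payload, ensure_ascii=False)

# Parsing accepts the escaped form and restores the original characters.
parsed = json.loads(escaped)

print(escaped)
print(parsed == payload)
```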
http://blog.csdn.net/thl789/article/details/7506133
https://zhuanlan.zhihu.com/p/23654187?refer=dreawer
http://www.ruanyifeng.com/blog/2007/10/ascii_unicode_and_utf-8.html
UTF-8
UTF-8 (8-bit Unicode Transformation Format) is a variable-length character encoding for Unicode that encodes each character with one to four bytes: the 128 ASCII characters, with a Unicode range
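The one-to-four-byte behaviour can be observed by encoding one character from each length class. The sample characters in this Python sketch are arbitrary choices.

```python
# One sample per UTF-8 length class: U+0041, U+00E9, U+4E2D, U+1F600.
samples = ["A", "\u00e9", "\u4e2d", "\U0001F600"]

# Number of bytes each character occupies once encoded as UTF-8.
lengths = [len(ch.encode("utf-8")) for ch in samples]

for ch, n in zip(samples, lengths):
    print(f"U+{ord(ch):04X} -> {n} byte(s)")
```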
JavaScript has built-in conversion functions, String.fromCharCode() and String.prototype.charCodeAt(), for converting between strings and character codes.
So what we need to do with the string is actually very simple: we can get the Unicode code of a character directly with charCodeAt(), which returns a decimal number, and then call the Number toString() method on it. The hexadecimal Unicode encoding of the
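For comparison, here is the same round trip sketched in Python, where ord() and chr() play the roles of charCodeAt() and fromCharCode(). One difference to keep in mind: ord() returns a full code point, while charCodeAt() returns a UTF-16 code unit. The helper names are hypothetical.

```python
def char_to_hex(ch):
    """Return the code point of `ch` as a zero-padded lowercase hex string."""
    return format(ord(ch), "04x")

def hex_to_char(hex_code):
    """Inverse of char_to_hex: build the character from its hex code point."""
    return chr(int(hex_code, 16))

print(char_to_hex("\u4e2d"))
print(hex_to_char("4e2d") == "\u4e2d")
```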
The code is simple but the function is very practical, so I recommend it to everyone.
The code first:
public function utf8_unicode($str) {
    $unicode = array();
    $values = array();
    $lookingFor = 1;
    for ($i = 0; $i < strlen($str); $i++) {
        $thisValue = ord($str[$i]);
        if ($thisValue < …) { // Exclude 0-9
            if ($thisValue >= ord('0') && $thisValue <= ord('9')) { // Number
                $unicode[] = chr($thisValue);
            }
            else {
                $
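Since the listing above is cut off, here is a separate, self-contained Python sketch of the general technique: walking UTF-8 bytes and rebuilding code points. It is not a reconstruction of the PHP function; the function name is hypothetical, and overlong forms and surrogates are not rejected.

```python
def utf8_to_code_points(data):
    """Decode UTF-8 bytes into a list of integer code points (simplified)."""
    code_points = []
    i = 0
    while i < len(data):
        lead = data[i]
        # The lead byte's high bits give the sequence length and its payload bits.
        if lead < 0x80:
            length, value = 1, lead
        elif lead >> 5 == 0b110:
            length, value = 2, lead & 0x1F
        elif lead >> 4 == 0b1110:
            length, value = 3, lead & 0x0F
        elif lead >> 3 == 0b11110:
            length, value = 4, lead & 0x07
        else:
            raise ValueError(f"invalid lead byte at offset {i}")
        tail = data[i + 1:i + length]
        # Every continuation byte must have the form 10xxxxxx.
        if len(tail) != length - 1 or any(b >> 6 != 0b10 for b in tail):
            raise ValueError(f"truncated or invalid sequence at offset {i}")
        for b in tail:
            value = (value << 6) | (b & 0x3F)  # append 6 payload bits
        code_points.append(value)
        i += length
    return code_points

print([hex(cp) for cp in utf8_to_code_points("A\u4e2d".encode("utf-8"))])
```

In everyday Python code, bytes.decode("utf-8") followed by ord() does the same job with full validation; the manual loop is only meant to show the bit layout.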
Unicode Environment Settings
When installing Visual Studio, you must add the Unicode option when selecting VC++ to ensure that the relevant library files are copied to system32.
Unicode compilation settings:
Under C/C++, Preprocessor definitions, remove _MBCS and add _UNICODE, UNICODE.
Set the entry point to wWinMainCRTStartup in proje
Output the Chinese character corresponding to a Unicode code, and the Unicode code corresponding to a Chinese character.
# -*- coding: utf-8 -*-
str1 = "\u6000"  # Unicode code for a Chinese character
str2 = '一'  # the Chinese character for "one"
# print(str1.encode('utf-8').decode('unicode_escape'))  # Python 2 approach; not usable in Python 3
print(str1)  # output the Chinese character corresponding to str1
there is a Unicode encoding; it is only a set of representation rules and does not correspond to any specific implementation form. The prefix "uni-" means "unified" in English: it attempts to express all the world's languages in one unified code. However, Unicode only specifies the binary value for each character; it does not specify how many bytes that value occupies when stored in memory, and then
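The difference between a character's code value and its stored bytes shows up when one character is encoded under several encoding forms. A minimal Python sketch follows; the sample character U+4E2D is an arbitrary choice.

```python
# One character, one code point: U+4E2D.
ch = "\u4e2d"

# The same code point stored under three different encoding forms.
encodings = {name: ch.encode(name) for name in ("utf-8", "utf-16-be", "utf-32-be")}

for name, data in encodings.items():
    print(f"{name}: {data.hex()} ({len(data)} bytes)")
```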