Before reading this article, make sure you can distinguish between a Unicode code point and a Unicode encoding implementation. Otherwise, the following will not make sense.
History
We know that the ISO 10646 committee defines a super character set called Universal Character Set (UCS) to encompass all the writing systems in the world. Because the UCS is now encoded in 4 bytes, it is
A very practical article on character encoding, reprinted here for reference.
-=== Reference original content ===-
Author: Ruan Yifeng
Link: http://www.ruanyifeng.com/blog/2007/10/ascii_unicode_and_utf-8.html
At noon today, I suddenly wanted to figure out the relationship between Unicode and UTF-8, so I began to look up information online.
The problem turned out to be more complicated than I expected, and I kept reading for hours after lunch before I had a basic grasp of it.
Below are my notes, mainly used to sort out my own ideas. However, I try to make it easy to understand and hope it can be useful to other friends. After all,
multiple bytes. For example, the common encoding for Simplified Chinese is GB2312, which uses two bytes to represent a Chinese character, so in theory it can represent at most 256 × 256 = 65,536 symbols.
Chinese encoding deserves a dedicated article and is not covered in this note. It only points out that although multiple bytes are used to repr
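The two-byte arithmetic above can be checked with a minimal Python sketch. It assumes Python's built-in `gb2312` codec, and the sample character 中 is an arbitrary choice for illustration.

```python
# Two bytes allow 256 * 256 distinct bit patterns (the theoretical maximum).
two_byte_patterns = 256 * 256

# Encode one Chinese character with Python's built-in "gb2312" codec.
# The sample character is an arbitrary pick, not taken from the note itself.
encoded = "中".encode("gb2312")

print(two_byte_patterns)  # theoretical number of two-byte combinations
print(len(encoded))       # number of bytes used for the character
print(encoded.hex())      # the two bytes in hexadecimal
```

The count is only an upper bound on bit patterns; the sketch does not claim that GB2312 actually assigns a character to every one of them.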
sometimes garbled? The encoding declared by the page may be inconsistent with the encoding of the file itself. More often, the page was opened with the wrong encoding and then saved, or the file was modified directly online with FTP software such as CuteFTP, whose encoding configuration errors cause an incorrect conversion
Over the past two days, I took the time to summarize the actual encoding methods and usage of the various encodings in Java applications, and I am recording them here for future reference. To form a complete, in-depth understanding of text encoding, and to deal with the various problems encountered during Java development, especially garbled text, I think it is better to make
What is the difference between Unicode, UTF-8, and ISO-8859-1?
Take the two characters "中文" as an example. Looking them up in the code tables shows that the GB2312 code is "D6D0 CEC4", the Unicode code points are "4E2D 6587", and the UTF-8 code is "E4B8AD E69687". Note: these two characters have no ISO-8859-1 encoding, but they can be "represented" by ISO-8859-1
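The three codes quoted above can be reproduced with a short Python sketch. It assumes Python's built-in `gb2312`, `utf-8`, and `iso-8859-1` codecs. The last part is one possible reading of "represented": ISO-8859-1 assigns a character to every byte value, so the UTF-8 bytes can be carried through it and recovered, even though the characters themselves cannot be encoded in it.

```python
text = "中文"

# Unicode code points, GB2312 bytes, and UTF-8 bytes, all in uppercase hex.
code_points = [f"{ord(ch):04X}" for ch in text]
gb2312_hex = text.encode("gb2312").hex().upper()
utf8_hex = text.encode("utf-8").hex().upper()

# ISO-8859-1 maps each byte 0x00-0xFF to one character, so any byte
# sequence decodes without error and can be turned back into the same bytes.
as_latin1 = text.encode("utf-8").decode("iso-8859-1")
round_trip = as_latin1.encode("iso-8859-1").decode("utf-8")

# Encoding the characters directly in ISO-8859-1 is not possible.
try:
    text.encode("iso-8859-1")
    directly_encodable = True
except UnicodeEncodeError:
    directly_encodable = False

print(code_points, gb2312_hex, utf8_hex)
print(len(as_latin1), round_trip == text, directly_encodable)
```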
In the configuration file my.ini or my.cnf, MySQL works best with UTF-8 encoding:
[mysql]
default-character-set=utf8
[mysqld]
default-character-set=utf8
default-storage-engine=myisam
Under [mysqld], add:
default-collation=utf8_bin
init_connect='SET NAMES utf8'
2. In any PHP program that performs database operations, first call mysql_query("SET NAMES 'encoding'"), where the encoding matches the encoding of the PHP code. If the PHP code is gb231
has strongly demanded the emergence of a unified encoding. UTF-8 is the most widely used implementation of Unicode on the Internet. Other implementations include UTF-16 (characters are represented with two or four bytes) and UTF-32 (characters are represented with four bytes), but they are rarely used on the Internet. Again, the relationsh
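The byte widths mentioned above can be observed with a small Python sketch. The helper name `encoded_widths` and the sample characters are hypothetical choices for illustration. The big-endian codec variants are used so that no byte order mark is counted.

```python
# Sample characters are arbitrary picks, one for each UTF-8 length (1 to 4 bytes).
SAMPLES = ["A", "é", "中", "😀"]


def encoded_widths(ch):
    """Return the number of bytes ch occupies in each encoding form."""
    # The "-be" codec variants avoid counting a byte order mark.
    return {
        "UTF-8": len(ch.encode("utf-8")),
        "UTF-16": len(ch.encode("utf-16-be")),
        "UTF-32": len(ch.encode("utf-32-be")),
    }


for ch in SAMPLES:
    print(f"U+{ord(ch):04X}", encoded_widths(ch))
```

UTF-8 varies from one to four bytes, UTF-16 uses two or four, and UTF-32 always uses four, which matches the description in the text.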
Unicode and UTF-8 in Python
The character set history mentioned in this article briefly explains the relationship between Unicode and UTF-8. To summarize: UTF-8 and UTF-16,
theoretically represent a maximum of 256 × 256 = 65,536 characters. Chinese encoding deserves a dedicated article and is not covered in this note. It only points out that, although multiple bytes are used to represent a symbol, the GB-family Chinese character encodings have nothing to do with the Unicode and UTF-
UTF-8
The spread of the Internet created strong demand for a unified encoding. UTF-8 is the most widely used implementation of Unicode on the Internet. Other implementations include UTF-16 and UTF-32, but they are largely unused on the Internet.
Therefore, we need to convert the server-side characters into JSON format. For PHP, there are now two open source projects, JSON-PHP and PHP-JSON. JSON-RPC is an RPC protocol in JSON format that can be easily used in Ajax projects; json-rpc.org is an open source implementation.
Reference 2:
5 Unicode encoding
5.1 Usage
The standard used to encode all characters.
5.2 Overview
An industry standard that encodes all the languages in the world; it can represent about 1 million different symbols.
The latest standa
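The "about 1 million" figure in the overview can be made concrete with a minimal Python sketch. It relies on the fact that the Unicode code space runs from U+0000 to U+10FFFF, organized as 17 planes of 65,536 code points each, and on `sys.maxunicode` from Python 3.3 or later. The constant names are hypothetical.

```python
import sys

PLANES = 17                      # planes 0 through 16
CODE_POINTS_PER_PLANE = 0x10000  # 65,536 code points per plane

total_code_points = PLANES * CODE_POINTS_PER_PLANE
highest_code_point = sys.maxunicode  # 0x10FFFF on Python 3.3 and later

print(total_code_points)
print(hex(highest_code_point))

# Values beyond the code space are rejected by chr().
try:
    chr(highest_code_point + 1)
    beyond_allowed = True
except ValueError:
    beyond_allowed = False
print(beyond_allowed)
```

This counts positions in the code space, not assigned characters; far fewer code points actually have a character assigned to them.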
Reprinted from: http://www.utf.com.cn/article/s1383
These related concepts are not complicated, but they are very easy to confuse. In particular, I have recently read several articles that, even though they are regarded as authoritative sources, often contradict one another, use imprecise wording, and explain the concepts unclearly:
1. The character set and the encoding scheme are conflated. http://www.utf.com.cn/article/s320 says:
Utf_8 Character Set
The code unit size is the number of bits used by a specific encoding form: UTF-8: a code unit in UTF-8 consists of 8 bits; in UTF-8, each code point is often mapped to multiple code units because of the small cod
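The relationship between code points and code units can be sketched in Python as follows. The helper `count_code_units` is a hypothetical name introduced for illustration; it divides the encoded length in bits by the code unit size, and the sample characters are arbitrary.

```python
def count_code_units(text, codec, unit_bits):
    """Count code units: encoded length in bits divided by the unit size."""
    encoded = text.encode(codec)
    return len(encoded) * 8 // unit_bits


# (codec name, code unit size in bits); "-be" variants avoid a byte order mark.
FORMS = [("utf-8", 8), ("utf-16-be", 16), ("utf-32-be", 32)]

for ch in ["A", "中", "😀"]:
    counts = {codec: count_code_units(ch, codec, bits) for codec, bits in FORMS}
    print(f"U+{ord(ch):04X}", counts)
```

A single code point needs more code units when the unit is small: several 8-bit units in UTF-8, one or two 16-bit units in UTF-16, and always exactly one 32-bit unit in UTF-32.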