The learning journey of Python: basic data types (character encoding)

Source: Internet
Author: User

All information stored in a computer is represented as binary numbers. Put simply, the rules that determine how a character such as 'n' is stored as binary are called "encoding"; conversely, interpreting the stored binary numbers so that they can be displayed as characters is called "decoding." If the wrong rules are used during decoding, 'n' may be parsed as 'm', or the text may come out garbled.
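As a small illustration of encoding and decoding, here is a minimal sketch in Python 3. The sample character and the two codecs (UTF-8 and Latin-1) are chosen for illustration only and do not come from the original text.

```python
# Encode one character to bytes, then decode it with the right and the wrong rule.
text = "é"

encoded = text.encode("utf-8")            # str -> bytes ("encoding")
decoded_ok = encoded.decode("utf-8")      # bytes -> str with the matching rule
decoded_bad = encoded.decode("latin-1")   # wrong rule: no error, but garbled text

print(encoded)       # b'\xc3\xa9'
print(decoded_ok)    # é
print(decoded_bad)   # Ã©
```

Note that the wrong rule does not necessarily raise an error; it can silently produce different characters, which is why garbled text is easy to miss.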

Use a character encoding declaration, and make sure every source file in the same project uses the same declaration.

This is something that must be done.
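A declaration typically looks like the sketch below. The choice of UTF-8 and the sample string are assumptions for illustration. In Python 3 the default source encoding is already UTF-8, so the declaration mainly matters for Python 2 or for files saved in another encoding.

```python
# -*- coding: utf-8 -*-
# The line above is the encoding declaration; it must be the first or second line of the file.

greeting = "你好, world"   # non-ASCII literal, read correctly because the file is UTF-8
print(greeting)
print(len(greeting))      # counts characters, not bytes
```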

Python 3 discards the old byte-based str of Python 2; every str is now Unicode.

In Python 2, having to type a u before the opening quotation mark takes real getting used to at first, and it is easy to forget it and have to go back and add it.
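The following sketch shows how this looks in Python 3 (3.3 or later, where the u prefix is accepted again for compatibility). The variable names and the sample string are illustrative only.

```python
s = "中文"               # str: a sequence of Unicode characters
legacy = u"中文"         # the u prefix is still allowed but changes nothing
b = s.encode("utf-8")    # bytes: the encoded form, a separate type

print(type(s).__name__, len(s))   # str 2
print(type(b).__name__, len(b))   # bytes 6
print(s == legacy)                # True
```

Text and bytes are distinct types in Python 3, so moving between them always requires an explicit encode or decode step.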

When the computer reaches every country in the world,

To solve the problem of garbled text, a great idea was born: Unicode. The Unicode encoding system is designed to express every character of every language. It assigns each letter, symbol, or ideograph a number (a code point) that fits in 4 bytes. Each number represents a unique character used in at least one language. (Not all numbers are used, but more than 65535 of them are, so a 2-byte number is not enough.) Characters shared by several languages usually get the same number, unless there is a good etymological reason not to. In every case, each character corresponds to exactly one number, and each number corresponds to exactly one character, so there is no ambiguity and no "mode" to keep track of. U+0041 always represents 'A', even if the language has no 'A' character.
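The one-to-one mapping between characters and numbers can be checked with the built-in ord and chr functions. This is a minimal sketch; the rocket character is an arbitrary example of a code point above 65535 and is not taken from the original text.

```python
# Each character maps to exactly one number (code point) and back.
print(hex(ord("A")))       # 0x41
print(chr(0x41))           # A
print("\u0041" == "A")     # True

# A code point above 65535 does not fit in a 2-byte number.
rocket = "\U0001F680"
print(ord(rocket) > 0xFFFF)              # True
print(len(rocket.encode("utf-32-be")))   # 4 bytes in the fixed-width form
print(len("A".encode("utf-8")))          # 1 byte in UTF-8
```

The code point is fixed by Unicode, while the number of bytes used to store it depends on the encoding chosen (UTF-8, UTF-16, UTF-32, and so on).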

In computer science, Unicode (also translated as Uniform Code, Universal Code, Single Code, or Standard Universal Code) is an industry standard that lets computers represent the dozens of writing systems used around the world. Unicode is developed in conjunction with the Universal Character Set standard and is also published in book form [1]. Unicode keeps expanding, with each new version adding more characters. As of the sixth version, Unicode contains more than 100,000 characters (the 100,000th character was adopted in 2005), a set of code charts for visual reference, a set of encoding methods and standard character encodings, and an enumeration of character properties such as superscript and subscript. The Unicode Consortium, a non-profit organization, leads the ongoing development of Unicode. Its goal is to replace existing character encoding schemes with Unicode, especially schemes that have limited space and are incompatible with one another in multilingual environments.
