Python basics: binary and character encoding

Source: Internet
Author: User
Tags: decimal to binary

Binary definition

The binary system is widely used in computing. Binary data is a number written with only two digits, 0 and 1. Its base is 2: the carry rule is "carry one on reaching two" and the borrow rule is "borrow one as two". The system was described by the German mathematician and philosopher Leibniz in the early 18th century. Today's computers are essentially all binary systems, and the data in a computer is mainly stored in two's complement form. Physically, a binary digit in a computer is a very small switch, with "on" representing 1 and "off" representing 0.

Binary and decimal conversions

Notice that the decimal value of the nth binary digit, counting from the right and starting at n = 0, is exactly 2 to the power n. For example, binary 1011 is 8 + 0 + 2 + 1 = 11 in decimal.
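The positional rule can be sketched in a few lines of Python. This is a minimal illustration, assuming positions are counted from the right starting at 0; the function names binary_to_decimal and decimal_to_binary are made up for this example, and the last line shows the built-ins int() and bin() that do the same job.

```python
def binary_to_decimal(bits: str) -> int:
    """Add 2**n for every position n (from the right, starting at 0) that holds a 1."""
    total = 0
    for n, digit in enumerate(reversed(bits)):
        if digit == "1":
            total += 2 ** n
    return total


def decimal_to_binary(number: int) -> str:
    """Divide by 2 repeatedly and collect the remainders, lowest digit first."""
    if number == 0:
        return "0"
    digits = []
    while number > 0:
        digits.append(str(number % 2))
        number //= 2
    return "".join(reversed(digits))


print(binary_to_decimal("1011"))  # 8 + 0 + 2 + 1 = 11
print(decimal_to_binary(11))      # 1011
print(int("1011", 2), bin(11))    # built-in equivalents: 11 0b1011
```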

Character encoding

From what we know about binary, the computer only understands binary, so the numbers we use in everyday life must be converted to binary before the computer can work with them. Decimal-to-binary conversion only solves the problem of numbers, though. How do we make the computer understand text?

So we take an indirect route: since numbers can be converted to binary, we only need a way to convert text into numbers, and then the text can be expressed in binary too.

We agree on a table in which each character corresponds to a number. The table acts as a translator: given a number, we look up the corresponding character, and vice versa.

ASCII code

Does a table like this already exist?

  

ASCII (American Standard Code for Information Interchange) is a character encoding based on the Latin alphabet, mainly used to represent modern English and, with extensions, other Western European languages. It is the most widely used single-byte encoding and is equivalent to the international standard ISO/IEC 646.

Since the computer was invented by Americans, at first only 128 characters (codes 0 to 127) were encoded: letters, digits, and some symbols. This table is called the ASCII encoding. For example, the uppercase letter A is 65 and the lowercase letter z is 122. The following 128 codes (128 to 255) are called extended ASCII.

The letters and digits we need are all in the ASCII table. Given the decimal number of each character, we can convert any such string into binary.
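As a rough sketch of that lookup, Python's built-in ord() returns a character's number and chr() goes the other way. The helper names text_to_binary and binary_to_text are invented for this example, and writing each character as 8 binary digits is an assumption that holds for single-byte ASCII text.

```python
def text_to_binary(text: str) -> str:
    """Look up each character's number, then write it as 8 binary digits."""
    return " ".join(format(ord(ch), "08b") for ch in text)


def binary_to_text(bits: str) -> str:
    """Reverse lookup: parse each 8-digit group and map the number back to a character."""
    return "".join(chr(int(group, 2)) for group in bits.split())


print(ord("A"), ord("z"))       # 65 122
encoded = text_to_binary("Az")
print(encoded)                  # 01000001 01111010
print(binary_to_text(encoded))  # Az
```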

GBK and GB2312

Obviously, we also need to be able to display Chinese characters on the computer, but the ASCII table we just learned does not contain a single Chinese radical. So we need another table that maps Chinese characters to numbers. As we have seen, one byte can represent at most 256 characters, which is far too few for Chinese, so two bytes are needed, and they must not conflict with ASCII. China therefore defined the GB2312 encoding for Chinese characters; GBK is a later extension of it that covers more characters.
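A short sketch of the two-byte idea, using the "gb2312" codec that ships with Python. The sample character 中 is chosen only for illustration. Both of its bytes have the high bit set (values of 0x80 or above), which is how GB2312 avoids colliding with single-byte ASCII.

```python
text = "中"

gb_bytes = text.encode("gb2312")
print(len(gb_bytes))              # 2 bytes for one Chinese character
print(gb_bytes.hex())             # d6d0
print("A".encode("gb2312"))       # b'A', ASCII characters keep their single byte
print(gb_bytes.decode("gb2312"))  # 中
```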

Unicode

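With every country defining its own encoding, text that mixes languages inevitably ends up garbled. Unicode emerged to solve this: it unifies all languages into a single set of codes, so the garbling caused by conflicting encodings goes away.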

The Unicode standard is still evolving, but a character is most commonly represented in two bytes (rarer characters outside that range need 4 bytes). Unicode is supported directly by modern operating systems and most programming languages.

Now the difference between ASCII and Unicode becomes clear:

ASCII uses 1 byte per character, while Unicode usually uses 2 bytes.

The letter A in ASCII is decimal 65, binary 01000001;

The character 0 in ASCII is decimal 48, binary 00110000;

The Chinese character 中 ("middle") is outside the ASCII range; its Unicode code is decimal 20013, binary 01001110 00101101.

As you might guess, to write the ASCII character A in Unicode you only need to pad it with zeros in front, so the Unicode code for A is 00000000 01000001.
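These values can be checked in Python, where ord() returns a character's Unicode number. The helper name code_point_bits is made up for this sketch, and padding to 16 binary digits assumes the two-byte view of Unicode described above.

```python
def code_point_bits(ch: str) -> str:
    """Return the character's Unicode number as two groups of 8 binary digits."""
    bits = format(ord(ch), "016b")
    return bits[:8] + " " + bits[8:]


for ch in ["A", "0", "中"]:
    print(ch, ord(ch), code_point_bits(ch))
# A 65 00000000 01000001
# 0 48 00000000 00110000
# 中 20013 01001110 00101101
```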

UTF-8

To save space, Unicode text can be converted into the variable-length UTF-8 encoding. UTF-8 encodes each Unicode character into 1 to 4 bytes depending on the size of its number (the original design allowed up to 6 bytes, but the current standard stops at 4): common English letters take 1 byte, Chinese characters usually take 3 bytes, and only rare characters take 4 bytes. If the text you want to transfer consists largely of English characters, encoding it as UTF-8 saves space.
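The variable length is easy to observe with str.encode(). This is a small sketch; the sample characters and the helper name utf8_length are chosen only for illustration, and the comparison at the end uses Python's "utf-16-le" codec to stand in for a fixed two-byte representation.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes UTF-8 needs for the given text."""
    return len(ch.encode("utf-8"))


for ch in ["A", "中", "\U0001F600"]:  # the last one is an emoji outside the two-byte range
    print(repr(ch), utf8_length(ch), ch.encode("utf-8").hex())
# 'A' 1 41
# '中' 3 e4b8ad
# '😀' 4 f09f9880

english = "hello"
print(len(english.encode("utf-8")))      # 5 bytes
print(len(english.encode("utf-16-le")))  # 10 bytes, two per character
```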

  
