Part 5: Python basics: character encoding

Source: Internet
Author: User

1. Basic computer knowledge (three pictures)

2. How a text editor reads and saves files (Notepad++, PyCharm, Word)

Opening an editor starts a process, and that process lives in memory. Anything typed into the editor is therefore held in memory too, and is lost if the power goes out.

To keep it, you have to save it to the hard drive: clicking the Save button flushes the data from memory to disk.

At this point, writing a .py file (without executing it) is no different from writing any other file: it is just a sequence of characters.

3. How the Python interpreter executes a .py file, for example python test.py

1. Stage one: the Python interpreter starts, which is comparable to launching a text editor.

2. Stage two: the Python interpreter, acting like a text editor, opens test.py and reads its contents from the hard disk into memory. (Quick review: because Python is interpreted, the interpreter only cares about the contents of the file, not about its suffix.)

3. Stage three: the interpreter executes the code that was just loaded into memory.
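The three stages can be imitated in a few lines. This is only a rough sketch, not how CPython works internally: the file name test.txt and the variable result are made up for illustration, and the built-ins compile() and exec() stand in for the interpreter's own loading and execution steps.

```python
import os
import tempfile

source_text = "result = 1 + 2\n"

with tempfile.TemporaryDirectory() as tmp:
    # Save the characters under a non-.py suffix (hypothetical file name).
    path = os.path.join(tmp, "test.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write(source_text)

    # Stage two: read the file contents from disk into memory.
    with open(path, "r", encoding="utf-8") as f:
        loaded = f.read()

# Stage three: execute the code that is now in memory.
namespace = {}
exec(compile(loaded, "test.txt", "exec"), namespace)

print(namespace["result"])
```

The suffix plays no role here: only the characters read from the file are compiled and executed.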

Summary:

Similarities and differences between the Python interpreter and a text editor:

Similarity: the Python interpreter interprets and executes the contents of a file, so it must be able to read a .py file into memory, just as a text editor does.

Difference: a text editor reads the file into memory in order to display and edit it, whereas the Python interpreter reads it into memory in order to execute it.

Two: What is character encoding

A computer can only read 0 and 1, that is, binary numbers. For a computer to handle human-readable characters, there has to be a conversion step: character -------> number

This conversion, that is, the standard that fixes which number each character maps to, is called character encoding.
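The character-to-number mapping can be seen directly with Python's built-in ord() and chr(). A minimal sketch, using the letter A as an arbitrary example:

```python
# Character -> number: ord() gives the number assigned to a character.
number = ord("A")              # 65

# Number -> binary digits: the form the hardware actually stores.
bits = format(number, "08b")   # "01000001"

# Number -> character: chr() reverses the mapping.
char = chr(number)             # "A"

print(number, bits, char)
```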

Three: The history of character encoding

Stage one: modern computers originated in the United States, so the earliest encoding, ASCII, was designed with only English in mind.

ASCII: one byte represents one character. One byte = 8 bits, and 8 bits can take 2**8 = 256 different values (0 to 2**8-1), so one byte can distinguish at most 256 characters. Standard ASCII uses only 7 of those bits and defines 128 characters.
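The arithmetic, and the limit it implies, can be checked in a short sketch. The character 中 is just an arbitrary example of something outside ASCII:

```python
values_per_byte = 2 ** 8        # 256 distinct values, 0 through 255
ascii_size = 2 ** 7             # 128 characters defined by standard ASCII

encoded = "A".encode("ascii")   # one English character, one byte

# A character outside ASCII cannot be encoded with it.
try:
    "中".encode("ascii")
    outside_ascii_ok = True
except UnicodeEncodeError:
    outside_ascii_ok = False

print(values_per_byte, ascii_size, len(encoded), outside_ascii_ok)
```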

Stage two: to support Chinese, China defined its own encoding, GBK.

GBK: 2 bytes represent one Chinese character (ASCII characters still take 1 byte).

To support their own languages, other countries likewise defined their own encodings.

Japan encoded Japanese in Shift_JIS, and South Korea encoded Korean in EUC-KR.
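Because each national encoding assigns its own meaning to the same byte values, bytes written with one of them do not read back correctly with another. A small sketch, assuming the sample text 中文 and Python's built-in gbk, shift_jis and euc_kr codecs:

```python
text = "中文"
gbk_bytes = text.encode("gbk")   # 2 bytes per Chinese character

results = {}
for codec in ("gbk", "shift_jis", "euc_kr"):
    # errors="replace" substitutes U+FFFD for undecodable bytes
    # instead of raising, so every codec yields some string.
    results[codec] = gbk_bytes.decode(codec, errors="replace")

for codec, decoded in results.items():
    print(codec, repr(decoded))
```

Only the gbk entry gives back the original text; the other codecs interpret the same bytes as different characters.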

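Because these national encodings conflict with one another, text that mixes languages comes out garbled. That is when Unicode appeared. In its original form, Unicode uses a uniform 2 bytes per character. 2 bytes give 2**16 = 65536 values (0 to 65535), enough for more than 60,000 characters and therefore for the world's major languages. (Unicode has since grown beyond 16 bits; the 2-byte picture corresponds to its early UCS-2 form.)

But for text that is entirely English, this encoding doubles the storage space compared with ASCII (binary data ultimately ends up on the storage medium as electrical or magnetic states).

This led to UTF-8, which uses only 1 byte for English characters and 3 bytes for most Chinese characters.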
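The size difference can be measured directly. In this sketch, utf-16-le (without a byte order mark) stands in for the fixed 2-byte form described above; that substitution is an assumption made for illustration:

```python
english = "hello"
chinese = "中"

sizes = {
    "english, 2-byte form": len(english.encode("utf-16-le")),  # 10
    "english, utf-8": len(english.encode("utf-8")),            # 5
    "chinese, 2-byte form": len(chinese.encode("utf-16-le")),  # 2
    "chinese, utf-8": len(chinese.encode("utf-8")),            # 3
}

for label, size in sizes.items():
    print(label, size)
```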

One thing to emphasize is:

Unicode: simple and crude. Every character is 2 bytes. The advantage is fast character-to-number conversion; the disadvantage is that it takes up a lot of space.

UTF-8: precise. Different characters get different lengths. The advantage is that it saves space; the disadvantage is slower character-to-number conversion, because each time the number of bytes needed to represent the character has to be worked out.
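The extra work UTF-8 requires is the step of deciding how many bytes belong to each character, which is determined by the first byte of the sequence. The helpers below (utf8_sequence_length and split_utf8 are hypothetical names) are a simplified sketch of that step, not a full validating decoder:

```python
def utf8_sequence_length(first_byte: int) -> int:
    """Return how many bytes a UTF-8 sequence has, judged by its first byte."""
    if first_byte < 0x80:
        return 1            # ASCII range, e.g. English letters
    if 0xC2 <= first_byte <= 0xDF:
        return 2
    if 0xE0 <= first_byte <= 0xEF:
        return 3            # most Chinese characters
    if 0xF0 <= first_byte <= 0xF4:
        return 4
    raise ValueError("not a valid UTF-8 lead byte: %#x" % first_byte)


def split_utf8(data: bytes) -> list:
    """Split UTF-8 bytes into one chunk per character."""
    chunks = []
    i = 0
    while i < len(data):
        n = utf8_sequence_length(data[i])
        chunks.append(data[i:i + n])
        i += n
    return chunks


chunks = split_utf8("a中".encode("utf-8"))
print([len(c) for c in chunks])
```

With a fixed-width encoding this per-character length check is unnecessary, which is the speed advantage the text refers to.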

Summary

1. In memory, the encoding used is Unicode: space is traded for time.

2. On the hard drive or over the network, use UTF-8: its smaller size saves storage and bandwidth, which helps keep transfers stable.

Five: Using character encoding

5.1 Text editors

unicode ---> encode ---> utf-8

utf-8 ---> decode ---> unicode
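In Python 3 the two directions correspond to str.encode() and bytes.decode(). A minimal sketch with an arbitrary sample string:

```python
text = "你好, Egon"                 # str: Unicode text in memory
data = text.encode("utf-8")        # bytes: the form written to disk or sent over a network
restored = data.decode("utf-8")    # str again after the bytes are read back

print(type(text).__name__, len(text))
print(type(data).__name__, len(data))
print(restored == text)
```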

Summary:

Whatever the editor, the goal is to keep files from coming out garbled. (Note that a file containing code is still just an ordinary file; the garbling meant here is what appears when we open the file, before it is ever executed.)

The core rule: whatever encoding a file was saved in is the encoding that must be used to open it.
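The rule can be tried out with open() and its encoding parameter. In this sketch the file name note.txt and the sample text are made up for illustration:

```python
import os
import tempfile

text = "你好"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "note.txt")

    # Save the file in GBK.
    with open(path, "w", encoding="gbk") as f:
        f.write(text)

    # Opening it with the same encoding gives the text back.
    with open(path, "r", encoding="gbk") as f:
        same_codec = f.read()

    # Opening it with a different encoding fails or produces garbage.
    try:
        with open(path, "r", encoding="utf-8") as f:
            wrong_codec = f.read()
    except UnicodeDecodeError:
        wrong_codec = None

print(repr(same_codec), repr(wrong_codec))
```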

The file test.py is saved in GBK format with the following contents:

x = '林'

Whether you run

python2 test.py

or

python3 test.py

an error is raised, because Python 2 assumes ASCII by default and Python 3 assumes UTF-8,

unless #coding:gbk is specified at the beginning of the file.
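The effect of the declaration can be reproduced without creating a file, because the built-in compile() accepts source as bytes and honors an encoding declaration on the first line. This sketch assumes the string in test.py contains a single non-ASCII character (林); can_decode is a hypothetical helper:

```python
char = "\u6797"                        # the character 林
body = "x = '" + char + "'\n"
gbk_source = body.encode("gbk")        # the bytes a GBK-saved test.py would hold


def can_decode(data, codec):
    """Return True if data decodes cleanly with the given codec."""
    try:
        data.decode(codec)
        return True
    except UnicodeDecodeError:
        return False


ascii_ok = can_decode(gbk_source, "ascii")   # Python 2 default
utf8_ok = can_decode(gbk_source, "utf-8")    # Python 3 default

# With the declaration on line one, compile() decodes the bytes as GBK.
declared = b"#coding:gbk\n" + gbk_source
namespace = {}
exec(compile(declared, "test.py", "exec"), namespace)

print(ascii_ok, utf8_ok, namespace["x"] == char)
```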

5.2 Executing a program

python test.py (to stress it once more: the first step in executing test.py is always to read the file's contents into memory)

Stage one: start the Python interpreter.

Stage two: the Python interpreter now acts as a text editor and opens test.py, reading its contents from the hard disk into memory.

At this point the interpreter reads the first line of test.py, #coding:utf-8, to decide which encoding to use when reading the file into memory. This line sets the encoding the interpreter uses to decode the source, in place of its default (ASCII in Python 2, UTF-8 in Python 3).
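The standard library's tokenize.detect_encoding() applies the same kind of first-line check and can be used to see what a declaration resolves to. It is a Python-level helper rather than the interpreter's own loader, so treat this as an illustration only:

```python
import io
import tokenize

with_declaration = b"#coding:gbk\nx = 1\n"
without_declaration = b"x = 1\n"

# detect_encoding() takes a readline callable that returns bytes.
declared_encoding, _ = tokenize.detect_encoding(
    io.BytesIO(with_declaration).readline
)
default_encoding, _ = tokenize.detect_encoding(
    io.BytesIO(without_declaration).readline
)

print(declared_encoding, default_encoding)
```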

Stage three: the interpreter reads the code already loaded into memory (as Unicode-encoded binary) and executes it. Execution may allocate new memory, for example for x = "Egon".

Saying that memory uses Unicode does not mean that everything in memory is Unicode-encoded binary.

Before the program runs, what is in memory is indeed Unicode-encoded binary. For example, after the line x = "Egon" has been read from the file, the x, the equals sign, and the quotation marks all have the same status: they are ordinary characters, stored in memory as Unicode-encoded binary.

During execution, however, the program requests new memory (separate from the memory that holds the program's code), and that memory can hold data in any encoding. For example, x = "Egon" is recognized by the Python interpreter as a string; the interpreter requests memory to hold "Egon" and makes x point to that address. In that case the newly allocated memory also holds Unicode-encoded "Egon". If the code is changed to x = "Egon".encode('utf-8'), the newly allocated memory holds "Egon" encoded as UTF-8 instead.
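In Python 3 the two cases show up as two different types: a plain string literal is a str (Unicode text), while the result of .encode() is a bytes object. A minimal sketch:

```python
x = "Egon"                    # str: Unicode text
y = "Egon".encode("utf-8")    # bytes: the same text encoded as UTF-8

print(type(x).__name__, type(y).__name__)

# Decoding the bytes with the same codec recovers the original text.
print(y.decode("utf-8") == x)
```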

