1. Basic computer knowledge (three pictures)
2. How a text editor accesses a file (Notepad++, PyCharm, Word)
Opening an editor starts a process, and that process lives in memory. Anything typed into the editor is therefore also held in memory and is lost if the power goes out.
So you need to save it to the hard disk: clicking the Save button flushes the data from memory to the hard disk.
At this point, writing a .py file (without executing it) is no different from writing any other file: it is just a bunch of characters.
3. How the Python interpreter executes a .py file, for example: python test.py
1. Stage one: the Python interpreter starts, which is comparable to launching a text editor.
2. Stage two: the Python interpreter acts like a text editor and opens test.py, reading its contents from the hard disk into memory (small review: Python is an interpreted language, so the interpreter cares only about the contents of the file, not about its suffix).
3. Stage three: the interpreter executes the code that was just loaded into memory.
Summary:
Similarities and differences between the Python interpreter and a text editor:
Similarity: the Python interpreter interprets and executes the contents of a file, so it must be able to read the .py file, just as a text editor does.
Difference: a text editor reads the file into memory in order to display and edit it, whereas the Python interpreter reads it into memory in order to execute it.
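The three stages can be imitated from inside Python itself. This is only a rough sketch, not how the interpreter is actually implemented: the file name test.txt and its contents are made up for illustration, and the suffix is deliberately not .py to show that only the contents matter.

```python
import os
import tempfile

# Hypothetical stand-in for test.py; name and contents are illustrative only.
source_text = "x = 1 + 2\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "test.txt")  # the suffix is not .py on purpose
    with open(path, "wb") as f:
        f.write(source_text.encode("utf-8"))

    # Stage two: read the raw bytes from disk and decode them into text in memory.
    with open(path, "rb") as f:
        raw_bytes = f.read()
    loaded_text = raw_bytes.decode("utf-8")

# Stage three: compile and execute the text that is now in memory.
namespace = {}
exec(compile(loaded_text, "test.txt", "exec"), namespace)
print(namespace["x"])  # 3
```

A text editor would stop after stage two and display loaded_text; the interpreter goes on to execute it.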
Two: What is character encoding
A computer can only read 0 and 1, that is, binary numbers. To let the computer handle human characters, a conversion step is required: character -------> number.
The rules for this conversion are called a character encoding.
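The character-to-number conversion can be observed with Python's built-in ord() and chr(). A small sketch, using the letter A as an arbitrary example:

```python
char = "A"
code_point = ord(char)                   # character -> number
binary_form = format(code_point, "08b")  # number -> 8-bit binary string
restored = chr(code_point)               # number -> character

print(char, code_point, binary_form, restored)  # A 65 01000001 A
```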
Three: The history of character encoding
Stage one: Modern computers originated in the United States, so the earliest encoding, ASCII, was designed with only English in mind.
ASCII: one byte represents one character. 1 byte = 8 bits, and 8 bits can take the values 0 to 2**8-1, so one byte can distinguish 256 values (standard ASCII defines only 128 of them, using 7 bits).
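The arithmetic and the one-byte-per-character rule can be checked directly. In this sketch the sample strings "Hi" and "中" are arbitrary choices:

```python
values_per_byte = 2 ** 8            # 8 bits give 256 distinct values (0..255)
ascii_bytes = "Hi".encode("ascii")  # one byte per English character

print(values_per_byte, list(ascii_bytes))  # 256 [72, 105]

# A Chinese character has no ASCII code, so encoding it fails.
try:
    "中".encode("ascii")
    fits_in_ascii = True
except UnicodeEncodeError:
    fits_in_ascii = False

print(fits_in_ascii)  # False
```

The failure on the last character is the gap that the later encodings were created to fill.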
Stage two: To support Chinese, China defined GBK.
GBK: 2 bytes represent one Chinese character (ASCII characters still take 1 byte).
To support their own languages, other countries likewise defined their own encodings:
Japan encoded Japanese in Shift_JIS, and South Korea encoded Korean in EUC-KR.
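Why a separate encoding for each country became a problem can be seen by encoding the same character under two of them. A sketch using Python's built-in gbk, shift_jis and euc_kr codecs; the sample characters are arbitrary:

```python
text = "中"

gbk_bytes = text.encode("gbk")          # Chinese national encoding
sjis_bytes = text.encode("shift_jis")   # Japanese national encoding
korean_bytes = "한".encode("euc_kr")    # a Korean syllable in EUC-KR

# Same character, two bytes each, but the byte values differ.
print(len(gbk_bytes), len(sjis_bytes), gbk_bytes == sjis_bytes)

# Bytes written under one national encoding are misread under another.
misread = gbk_bytes.decode("shift_jis", errors="replace")
print(misread == text)
```

Because the byte values disagree, a file is only readable if the reader knows which national encoding produced it.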
This is where Unicode comes in. Unicode originally used a uniform 2 bytes per character; 2**16 = 65536 values (0 to 65535), enough for more than 60,000 characters, so it could cover the world's languages.
But for text that is entirely English, this encoding doubles the storage space (the binary is ultimately stored on the storage medium as electricity or magnetism).
This led to UTF-8, which uses only 1 byte for English characters and 3 bytes for common Chinese characters.
One thing to emphasize:
Unicode: simple and crude. Every character takes 2 bytes. Advantage: character <---> number conversion is fast. Disadvantage: it takes up a lot of space.
UTF-8: precise. Different characters use different lengths. Advantage: it saves space. Disadvantage: character <---> number conversion is slower, because each time you must work out how many bytes the character needs in order to represent it accurately.
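The space trade-off can be measured. Python has no codec called "unicode", so this sketch assumes utf-16-le (2 bytes per character for these samples, no byte-order mark) as a stand-in for the fixed 2-byte form described above:

```python
samples = ["hello", "你好"]
sizes = {}

for s in samples:
    fixed_width = s.encode("utf-16-le")  # stand-in for "2 bytes per character"
    variable_width = s.encode("utf-8")   # 1 byte for English, 3 for these Chinese characters
    sizes[s] = (len(s), len(fixed_width), len(variable_width))

print(sizes["hello"])  # (5, 10, 5)
print(sizes["你好"])    # (2, 4, 6)
```

For English text UTF-8 halves the size. For the Chinese sample it is actually larger than the 2-byte form, so the saving depends on what the text contains.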
Summary
1. The encoding used in memory is Unicode, trading space for time.
2. On the hard disk or over the network, use UTF-8: it takes less space, which helps keep storage and transmission stable.
Five: Using character encoding
5.1 The text editor
unicode ---> encode ---> utf-8
utf-8 ---> decode ---> unicode
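The encode-on-save and decode-on-open round trip can be sketched with a temporary file. The file name note.txt and the sample text are hypothetical:

```python
import os
import tempfile

text = "你好, egon"  # Unicode text as held in the editor's memory

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "note.txt")  # hypothetical file name

    # Save: unicode ---> encode ---> utf-8 bytes on disk
    with open(path, "wb") as f:
        f.write(text.encode("utf-8"))

    # Open: utf-8 bytes on disk ---> decode ---> unicode in memory
    with open(path, "rb") as f:
        disk_bytes = f.read()

reopened = disk_bytes.decode("utf-8")
print(len(text), len(disk_bytes), reopened == text)  # 8 12 True
```

The text is 8 characters in memory but 12 bytes on disk, because each of the two Chinese characters takes 3 bytes in UTF-8.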
Summary:
Whichever editor you use, to avoid garbled files (note that a file containing code is still just an ordinary file; "garbled" here means garbled when we open the file, before it is ever executed),
the core rule is: open the file with the same encoding it was saved in.
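Breaking the rule is easy to reproduce. In this sketch the sample text is arbitrary, and errors="replace" is used only so that the mismatched decode cannot raise an exception:

```python
text = "你好"

saved_bytes = text.encode("utf-8")  # the file was stored as UTF-8

# Opened with the wrong encoding: the bytes are reinterpreted as GBK.
garbled = saved_bytes.decode("gbk", errors="replace")

# Opened with the encoding it was stored in: the text comes back intact.
correct = saved_bytes.decode("utf-8")

print(garbled == text, correct == text)  # False True
print(garbled)
```

The bytes on disk are identical in both cases; only the decoding step decides whether the result is readable.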
The file test.py is saved in GBK format with the following contents (a string containing a non-ASCII character, here the Chinese character for "forest"):
x = '林'
Whether you run
python2 test.py
or
python3 test.py
an error is raised (because Python 2 defaults to ASCII and Python 3 defaults to UTF-8),
unless you specify #coding:gbk at the beginning of the file.
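The effect of the declaration can be reproduced under Python 3 without creating a file, because compile() accepts source as bytes and applies the same coding-declaration rule. This is a sketch under two assumptions: the string in test.py is the non-ASCII character 林, and try_run is a hypothetical helper written for this illustration.

```python
# Assumption: the string literal in test.py is the non-ASCII character "林".
gbk_source = "x = '林'\n".encode("gbk")  # the bytes as saved on disk in GBK


def try_run(source_bytes):
    """Hypothetical helper: compile and run source bytes, return (ok, namespace)."""
    namespace = {}
    try:
        # With bytes input, compile() honors a coding declaration,
        # and falls back to UTF-8 when there is none.
        exec(compile(source_bytes, "test.py", "exec"), namespace)
    except (SyntaxError, ValueError):
        return False, namespace
    return True, namespace


ok_without, _ = try_run(gbk_source)                   # no declaration
ok_with, ns = try_run(b"#coding:gbk\n" + gbk_source)  # declaration on line 1

print(ok_without, ok_with)
```

Only the second call succeeds, and afterwards ns["x"] holds the original character.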
5.2 Execution of the program
python test.py (to emphasize once more: the first step in executing test.py is always to read the contents of the file into memory)
Stage one: Start the Python interpreter.
Stage two: The Python interpreter now acts as a text editor and opens test.py, reading its contents from the hard disk into memory.
At this point, the Python interpreter reads the first line of test.py, #coding:utf-8, to decide which encoding to use when reading the file into memory. This line sets the encoding that the Python interpreter, as a piece of software, uses to decode the file.
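The standard library exposes this lookup as tokenize.detect_encoding(), which reads at most the first two lines of a source file. A sketch for Python 3, where declared_encoding is a hypothetical wrapper and the sample sources are made up:

```python
import io
import tokenize


def declared_encoding(source_bytes):
    """Hypothetical wrapper: return the encoding Python 3 would use for this source."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


print(declared_encoding(b"#coding:utf-8\nx = 1\n"))  # utf-8
print(declared_encoding(b"# coding: gbk\nx = 1\n"))  # gbk
print(declared_encoding(b"x = 1\n"))                 # utf-8 (Python 3 default)
```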
Stage three: The interpreter reads the code that has been loaded into memory (Unicode-encoded binary) and executes it. New memory may be allocated during execution, for example for x = "egon".
That memory uses Unicode does not mean that everything in memory is Unicode-encoded binary.
Before the program executes, what is in memory is indeed Unicode-encoded binary. For example, when the line x = "egon" is read from the file, the x, the equals sign, and the quotes all have the same status: they are ordinary characters, stored in memory in Unicode-encoded binary form.
During execution, however, the program requests new memory (separate from the memory that holds the program code), and this memory can hold data in any encoding. For example, x = "egon" is recognized by the Python interpreter as a string, so the interpreter requests memory to hold "egon" and makes x point to that address. The newly allocated memory also holds Unicode-encoded egon. If the code is changed to x = "egon".encode('utf-8'), the newly allocated memory instead holds the UTF-8-encoded bytes of egon.
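In Python 3 the difference between the two cases shows up in the type of the object that x points to. A short sketch using the same value as above:

```python
x = "egon"                  # a str: Unicode text
y = "egon".encode("utf-8")  # a bytes object: UTF-8 encoded data

print(type(x).__name__, type(y).__name__)  # str bytes
print(y)                                   # b'egon'
print(y.decode("utf-8") == x)              # True
```

The str and the bytes compare as different objects (x == y is False), even though decoding y gives back the same text.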