Python character encoding (DAY10)

Source: Internet
Author: User

1. How the Python interpreter executes a .py file, for example python test.py

Stage one: the Python interpreter starts, which is comparable to launching a text editor.

Stage two: the Python interpreter, acting like a text editor, opens test.py and reads its contents from the hard disk into memory. (Quick review: Python is an interpreted language, so the interpreter only cares about the contents of the file, not the file's suffix.)

Stage three: the Python interpreter interprets and executes the test.py code it has just loaded into memory. (PS: only at this stage is Python syntax recognized. When execution reaches name = "Egon", memory is allocated to hold the string "Egon".)
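The three stages can be imitated from inside Python itself. This is only a rough sketch: the temporary file named test.txt and the namespace dictionary are assumptions made for the demo, and the real interpreter does considerably more than this.

```python
import os
import tempfile

# Hypothetical source file; the ".txt" suffix shows that only the contents matter.
source_text = 'name = "Egon"\n'

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "test.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write(source_text)

    # Stage two: read the file from disk into memory as plain text.
    with open(path, "r", encoding="utf-8") as f:
        loaded = f.read()

# Stage three: only now is the text treated as Python syntax and executed.
namespace = {}
exec(compile(loaded, "test.txt", "exec"), namespace)
print(namespace["name"])
```

Until compile() is called, the loaded text is just a string; the name "Egon" exists as a Python object only after execution.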

2. Unicode, UTF-8

  2.1 The origin of Unicode: every character uniformly takes 2 bytes. Two bytes give 2**16 = 65,536 possible values (0 to 65535), enough for more than 60,000 characters and therefore compatible with the world's languages. (This describes the original fixed-width 2-byte form; Unicode has since grown beyond 65,536 code points.)

Characteristics of Unicode: simple and crude. Every character is 2 bytes. The advantage is fast conversion between characters and numbers; the disadvantage is that it takes up a lot of space.

  2.2 The origin of UTF-8: for text that is entirely English, the 2-byte encoding doubles the storage space (binary data is ultimately stored on electrical or magnetic media).

This led to UTF-8, which uses only 1 byte for English characters and 3 bytes for most Chinese characters.

Characteristics of UTF-8: precise. Different characters use different lengths. The advantage is that it saves space; the disadvantage is that conversion between characters and numbers is slower, because each time the number of bytes needed to represent the character must be worked out.

    1. Memory uses Unicode, trading space for time (a program must be loaded into memory to run, so memory access should be as fast as possible).
    2. Hard disks and network transmission use UTF-8. Network or disk I/O latency is far greater than the UTF-8 conversion delay, and I/O should save as much bandwidth as possible to keep data transmission stable.
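The size difference between a fixed 2-byte encoding and UTF-8 can be checked directly. In this sketch, "utf-16-le" stands in for the 2-byte form described above (an assumption for illustration; it matches only for characters that fit in 2 bytes), and the two sample characters are arbitrary.

```python
sizes = {}
for ch in ("a", "中"):
    fixed = ch.encode("utf-16-le")  # 2 bytes for each of these characters
    variable = ch.encode("utf-8")   # 1 byte for ASCII, 3 bytes for this Chinese character
    sizes[ch] = (len(fixed), len(variable))

print(sizes)  # {'a': (2, 1), '中': (2, 3)}
```

For English text UTF-8 halves the size, while for Chinese text it is larger than the 2-byte form, which is the trade-off the section describes.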

  2.3 Use of character encodings

Unicode -------> encode -------> UTF-8

UTF-8 -------> decode -------> Unicode
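In Python 3 the two arrows correspond to str.encode and bytes.decode. A minimal round trip, with the sample text chosen only for illustration:

```python
text = "你好"                    # str: Unicode text in memory
data = text.encode("utf-8")      # Unicode -> encode -> UTF-8 bytes
restored = data.decode("utf-8")  # UTF-8 bytes -> decode -> Unicode

print(data)              # b'\xe4\xbd\xa0\xe5\xa5\xbd'
print(restored == text)  # True
```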

3.1 Analysis Process

Writing a file from memory to the hard disk is called saving the file, for short.

Reading a file from the hard disk into memory is called reading the file, for short.

Comments:

If the header line # -*- coding: utf-8 -*- is not specified in a Python file, the default encoding is used.

The default is ASCII in Python 2 and UTF-8 in Python 3.
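The effect of the header line can be observed with the standard library's tokenize.detect_encoding, which reports the encoding Python 3 would use to decode a source file. The helper name declared_encoding and the two sample sources are hypothetical, made up for this sketch.

```python
import io
import tokenize

def declared_encoding(source_bytes):
    """Return the encoding Python 3 would use to decode this source file."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding

with_header = b"# -*- coding: gbk -*-\nname = 'Egon'\n"
without_header = b"name = 'Egon'\n"

print(declared_encoding(with_header))     # gbk
print(declared_encoding(without_header))  # utf-8
```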

 

3.2 Two string types in Python 3: str and bytes

str is Unicode

    # coding: utf-8
    s = 'forest'
    # When the program runs, no u prefix is needed: 'forest' is stored
    # in newly allocated memory in Unicode form.
    # s can be encoded directly into any encoding format.
    s.encode('utf-8')
    s.encode('gbk')
    print(type(s))  # <class 'str'>
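A short sketch of how the two types relate: encoding a str always produces bytes, and the bytes differ depending on the encoding chosen. The Chinese sample character is an assumption picked because its UTF-8 and GBK forms have different lengths.

```python
s = "林"
utf8_bytes = s.encode("utf-8")
gbk_bytes = s.encode("gbk")

print(type(s))           # <class 'str'>
print(type(utf8_bytes))  # <class 'bytes'>
print(len(utf8_bytes), len(gbk_bytes))  # 3 2
```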

Section summary

One.

1. Whatever encoding was used to save the data must also be used to read it back

PS: Memory always uses Unicode

What we can control is the encoding chosen for storage on the hard disk or for transmission over the network
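What goes wrong when the two encodings differ can be shown in a few lines. This sketch saves text as GBK and then tries to read it back as UTF-8; the sample text and variable names are illustrative only.

```python
text = "你好"
saved = text.encode("gbk")  # pretend these bytes were written to disk

# Reading back with the same encoding recovers the text.
same_ok = saved.decode("gbk") == text
print(same_ok)  # True

# Reading back with a different encoding fails for these bytes.
mismatch_failed = False
try:
    saved.decode("utf-8")
except UnicodeDecodeError:
    mismatch_failed = True
print(mismatch_failed)  # True
```

A mismatch does not always raise an error; with other byte sequences it can silently produce the wrong characters, which is harder to notice.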

2. Data is first generated in memory in Unicode format; to store or transmit it, it must be converted to bytes

Unicode ----------> encode (utf-8) ----------> bytes

After receiving bytes ---------> decode (utf-8, the same encoding used to encode) ----------> Unicode

3. In Python 3, a string is recognized as Unicode

Encoding a string in Python 3 produces bytes

Two.

Open

1. A system call is issued to the operating system, and the operating system opens a file

2. In the Python program a value is produced that refers to the file opened by the operating system, and we can assign that value to a variable, for example f.

Reclaiming resources

1. f.close(): closes the file opened by the operating system, that is, reclaims operating system resources

2. del f: there is no need to do this, because once the Python program has finished running, all memory associated with the program is cleaned up automatically

    f = open(r'aaaaa.py', 'r', encoding='utf-8')
    # print(f.read())
    # print(f.readline(), end='')
    print(f.readlines())
    f.close()
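An alternative to calling f.close() by hand is the with statement, which closes the file automatically when the block ends, even if an exception is raised. This is a sketch only: the file is created in a temporary directory for the demo, and its two-line contents are made up.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "aaaaa.py")  # hypothetical file created for the demo
    with open(path, "w", encoding="utf-8") as f:
        f.write("line one\nline two\n")

    # The operating system resource is released when this block exits.
    with open(path, "r", encoding="utf-8") as f:
        lines = f.readlines()

print(lines)     # ['line one\n', 'line two\n']
print(f.closed)  # True
```

The variable f still exists after the block, but the underlying operating system file has already been closed.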
