Python: character encoding, text processing, functions

Source: Internet
Author: User
Tags: function definition

Background knowledge for understanding character encoding

The text editors we use every day, such as Notepad++, PyCharm, and Word, all read and write files in much the same way. Opening an editor starts a process, and a process lives in memory, so whatever is typed into the editor is also held in memory and is lost if the power goes out.

To prevent data loss we usually click Save. Saving simply writes the data from memory to the hard disk.

At this point, writing a .py file (without executing it) is no different from writing any other file: we are just writing a bunch of characters.

How the Python interpreter executes a .py file

Stage one: The Python interpreter starts, which is comparable to launching a text editor

Stage two: Acting like a text editor, the Python interpreter opens the file and reads its contents from the hard disk into memory

Stage three: The Python interpreter interprets and executes the code that was just loaded into memory

The Python interpreter interprets and executes the contents of the file, so it must be able to read .py files, just as a text editor can

Unlike a text editor, the Python interpreter can not only read the contents of the file but also execute them.
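The stages above can be imitated in a few lines of Python 3. This is only a rough sketch, not how the interpreter is implemented internally: the file name demo.py and its one-line contents are made up for illustration, and the built-in compile and exec stand in for the interpreter's own machinery.

```python
import os
import tempfile

# Hypothetical source file, created here so the example is self-contained.
source_text = 'x = 1 + 2\n'
path = os.path.join(tempfile.mkdtemp(), 'demo.py')
with open(path, 'w', encoding='utf-8') as f:
    f.write(source_text)

# Stage two: like a text editor, read the characters from disk into memory.
with open(path, encoding='utf-8') as f:
    loaded = f.read()

# Stage three: unlike a text editor, go on to execute what was read.
namespace = {}
exec(compile(loaded, path, 'exec'), namespace)
print(namespace['x'])
```

Up to the read step, the interpreter and an editor do the same work. Only the final step differs.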

What is character encoding

We know that a computer works on high and low voltage levels: a high level stands for the binary digit 1 and a low level for the binary digit 0. The purpose of programming is to make the computer work, yet the result of programming is just a bunch of characters. In other words, what programming has to achieve is a bunch of characters driving the computer to work

So the characters have to go through a process:

characters --------(translation process)-------> numbers

The standard that defines which number each character corresponds to is called a character encoding.
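In Python 3 this character-to-number mapping can be inspected with the built-in ord and chr. A small illustration follows. The sample characters are arbitrary choices, and the numbers shown are Unicode code points, which coincide with ASCII for English characters.

```python
# ord() maps a character to its number; chr() maps the number back.
for ch in ['A', 'a', '0', '\u4e2d']:
    number = ord(ch)
    # Show the character, its number, and the number in binary.
    print(ch, number, bin(number))
    # The round trip returns the original character.
    print(chr(number) == ch)
```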

The history of character encoding

Stage one: Modern computers originated in the United States, so the earliest encoding, ASCII, was designed with only English in mind

ASCII: 1 byte represents one character (English letters and all the other characters on the keyboard). 1 byte = 8 bits, and 8 bits can hold the values 0 to 2**8-1, that is, 256 variations, enough for 256 characters

ASCII originally used only the low seven bits, which gives 128 values (0 to 127), already enough to represent every character on the keyboard

Later, to fit Latin characters into the table as well, the highest bit was also put to use

Stage two: To support Chinese, China defined GBK

GBK: 2 bytes represent one Chinese character (ASCII characters still take 1 byte)

Other countries likewise defined their own encodings for their own languages

Japan encoded Japanese in Shift_JIS, and South Korea encoded Korean in EUC-KR

Stage three: With every country following its own standard, conflicts were inevitable, and text that mixed several languages was displayed as garbled characters.

The result was Unicode, which in its original form uses a uniform 2 bytes per character. 2**16 = 65536 values (0 to 65535) can represent more than 60,000 characters, enough to cover the world's major languages

But for text that is entirely English, this encoding doubles the storage space needed (the binary data is ultimately stored on the medium as electrical or magnetic states)

This led to UTF-8, which uses only 1 byte for English characters and 3 bytes for most Chinese characters
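The space cost of each encoding can be measured directly in Python 3. In this sketch the sample characters are arbitrary, and the codec name utf-16-be is an assumption used to stand in for "fixed 2-byte Unicode" (it avoids the byte-order mark that plain utf-16 would add).

```python
samples = ['A', '\u4e2d']   # an English letter and a Chinese character
encodings = ['ascii', 'gbk', 'utf-16-be', 'utf-8']

sizes = {}
for ch in samples:
    for enc in encodings:
        try:
            sizes[(ch, enc)] = len(ch.encode(enc))
        except UnicodeEncodeError:
            # This encoding has no number for the character.
            sizes[(ch, enc)] = None
        print(repr(ch), enc, sizes[(ch, enc)])
```

The English letter takes 1 byte everywhere except in the fixed 2-byte form. The Chinese character cannot be encoded in ASCII at all.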

One thing to emphasize:

Unicode: simple and blunt, every character takes 2 bytes. The advantage is fast conversion between characters and numbers; the disadvantage is the space it takes up

UTF-8: precise, using different lengths for different characters. The advantage is that it saves space; the disadvantage is slower conversion between characters and numbers, because each time the number of bytes needed to represent the character has to be worked out

The encoding used in memory is Unicode, trading space for time (a program has to be loaded into memory to run, so memory access should be as fast as possible)

On disk and in network transmission UTF-8 is used. Network I/O and disk I/O latency is far greater than the cost of UTF-8 conversion, and I/O should save as much bandwidth as possible to keep data transmission stable.

Classification of character encodings

Computers were invented by Americans, so the earliest character encoding, ASCII, only defined the correspondence between numbers and English letters, digits, and some special characters. A character is represented with at most 8 bits (one byte), that is, 2**8 = 256, so a single-byte code of this kind can represent at most 256 symbols

ASCII uses 1 byte (8 binary bits) to represent one character

Unicode commonly uses 2 bytes (16 binary bits) to represent one character; uncommon characters need 4 bytes

With Unicode the garbling problem disappears. But a new problem arises: if a document is entirely English, Unicode takes twice as much space as ASCII, which is very inefficient for storage and transmission

In the spirit of economy, the "variable-length" UTF-8 encoding of Unicode appeared. UTF-8 encodes a Unicode character into 1 to 4 bytes depending on the size of its number (the original design allowed up to 6 bytes, but the current standard stops at 4): common English letters take 1 byte, Chinese characters usually take 3 bytes, and only very uncommon characters take 4 bytes. If the text to be transferred contains a large amount of English, UTF-8 saves space.
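The variable length is easy to observe in Python 3. The sketch below picks one sample character for each length; the characters themselves are arbitrary choices. Under the current UTF-8 standard no character needs more than 4 bytes.

```python
# One sample per UTF-8 length: Latin letter, accented letter,
# Chinese character, emoji.
samples = ['A', '\u00e9', '\u4e2d', '\U0001F600']

lengths = []
for ch in samples:
    encoded = ch.encode('utf-8')
    lengths.append(len(encoded))
    # Code point, number of bytes, and the bytes in hexadecimal.
    print('U+%04X' % ord(ch), len(encoded), encoded.hex())
```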

Character encoding in text editors, all in one go

Whatever the editor, the goal is to keep files from coming out garbled (note that a file containing code is still just an ordinary file; "garbled" here refers to what we see when opening the file, before it is ever executed)

The core rule: a file must be opened with the same encoding it was saved with.
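The core rule can be tried out with an ordinary file in Python 3. This is a minimal sketch: the file name note.txt and the sample text are made up, and errors='replace' is passed on the mismatched read only so that the example never raises, whatever bytes the wrong codec rejects.

```python
import os
import tempfile

text = '\u4f60\u597d'   # sample Chinese text ("hello")
path = os.path.join(tempfile.mkdtemp(), 'note.txt')

# Save the file as UTF-8.
with open(path, 'w', encoding='utf-8') as f:
    f.write(text)

# Opened with the encoding it was saved in: the text comes back intact.
with open(path, encoding='utf-8') as f:
    same = f.read()

# Opened with a different encoding: the same bytes become other characters.
with open(path, encoding='gbk', errors='replace') as f:
    garbled = f.read()

print(same == text, garbled == text)
```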

Execution of the program

Stage one: Start the Python interpreter

Stage two: The Python interpreter now acts as a text editor: it opens the file and reads its contents from the hard disk into memory

At this point the Python interpreter reads the first line of the file, #coding: utf-8, to decide which encoding to use when reading the code into memory. This line tells the interpreter which encoding the file was saved in

The default can be viewed with sys.getdefaultencoding(). If the header #-*- coding:utf-8 -*- is not specified in the Python file, the default is used

The default is ASCII in Python 2 and UTF-8 in Python 3
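The default can be checked from a running interpreter. This sketch assumes Python 3. Strictly speaking, sys.getdefaultencoding() reports the default used when converting between str and bytes, while the source-file default is a separate setting that is also UTF-8 in Python 3.

```python
import sys

# Default encoding used when str and bytes are converted without naming one.
default = sys.getdefaultencoding()
print(default)

# With no argument, encode() falls back to UTF-8 in Python 3.
print('\u00e9'.encode() == '\u00e9'.encode('utf-8'))
```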

Stage three: The interpreter reads the code that has been loaded into memory (Unicode-encoded binary) and executes it. Execution may allocate new memory, for example for x="Egon"

Saying that memory uses Unicode does not mean that everything in memory is Unicode-encoded binary

Before the program executes, what is in memory is indeed Unicode-encoded binary. For example, after the line x="Egon" is read from the file, the x, the equals sign, and the quotes all have the same status: they are ordinary characters, stored in memory in Unicode-encoded binary form

During execution, however, the program allocates memory of its own (separate from the memory holding the program code), and that memory can hold data in any encoding. For example, x="Egon" is recognized by the Python interpreter as a string, so it allocates memory to hold "Egon" and lets x point to that address. The newly allocated memory also holds Unicode-encoded Egon. If the code is changed to x="Egon".encode('utf-8'), the newly allocated memory holds Egon encoded as UTF-8 instead.
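The two cases can be compared side by side in Python 3. This is a small sketch; the variable name y is introduced here only for illustration.

```python
x = 'Egon'                   # a str: Unicode text in memory
y = 'Egon'.encode('utf-8')   # a bytes object: the UTF-8 encoded form

print(type(x), x)
print(type(y), y)

# Decoding the bytes with the same encoding gives back the original text.
print(y.decode('utf-8') == x)
```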


When you browse the web, the server converts its dynamically generated Unicode content to UTF-8 and then sends it to the browser

If the server encodes with UTF-8, what the client receives in memory is UTF-8 encoded binary as well.

The difference between Python 2 and Python 3

Python 2 has two string types, str and unicode

The str type

When the Python interpreter executes code that produces a string (for example, s='forest'), it allocates new memory and stores 'forest' there encoded in the encoding declared at the top of the file. The value is already the result of an encode, so s can only be decoded

So the important point is:

In Python 2, str is the encoded result, that is, bytes (str = bytes). So in Python 2, the result of encoding a unicode string is str/bytes

The unicode type

When the Python interpreter executes code that produces a string (for example, s=u'forest'), it allocates new memory and stores 'forest' there in Unicode form, so s can only be encoded and cannot be decoded

Printing to the terminal

A special note on print:

When the program executes, for example

    x = 'forest'
    print(x)

the print step sends the contents of the new memory that x points to (not the memory in which the code resides) to the terminal. The terminal is itself running in memory, so this printing can be understood as going from memory to memory, that is, Unicode -> Unicode

Data in Unicode form is not garbled no matter where it is printed.

Strings in Python 3 and u'string' in Python 2 are both Unicode, so printing them is never garbled.

[Figures: the output in PyCharm and in the Windows terminal]

Python 2, however, also has the other, non-Unicode kind of string. When print x is executed on one of these, the bytes are interpreted using the terminal's encoding, in effect x.decode('terminal encoding'), to become Unicode before being displayed. When the terminal's encoding differs from the encoding declared at the top of the file, the output is garbled

In PyCharm (the terminal encoding is UTF-8 and the file is encoded as UTF-8, so nothing is garbled)

In the Windows terminal (the terminal encoding is GBK and the file is encoded as UTF-8, so the output is garbled)
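The mismatch can be imitated in Python 3, where a bytes object plays the role of the Python 2 str. The function show_on_terminal is a hypothetical stand-in for what a terminal does with the bytes it receives, and the sample character is an arbitrary non-ASCII choice. This is a sketch of the idea, not of any real terminal's behavior.

```python
def show_on_terminal(data, terminal_encoding):
    """Hypothetical stand-in for a terminal: decode raw bytes for display."""
    return data.decode(terminal_encoding, errors='replace')

sample = '\u6797'             # a Chinese character, chosen arbitrarily
raw = sample.encode('utf-8')  # bytes as a UTF-8 source file would store them

utf8_view = show_on_terminal(raw, 'utf-8')  # terminal matches the file
gbk_view = show_on_terminal(raw, 'gbk')     # terminal differs from the file

print(utf8_view == sample, gbk_view == sample)
```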

Python 3 also has two string types, str and bytes

str is Unicode

bytes is bytes, the encoded form.
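The separation between the two types is strict in Python 3, as this short sketch shows (the names s and b are illustrative only):

```python
s = 'forest'            # str: Unicode text
b = s.encode('utf-8')   # bytes: the result of encoding

print(type(s).__name__, type(b).__name__)
print(b.decode('utf-8') == s)

# In Python 3, str has no decode() method and bytes has no encode() method.
print(hasattr(s, 'decode'), hasattr(b, 'encode'))
```

This mirrors the Python 2 rule described earlier: text can only be encoded, and encoded data can only be decoded.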

--------------------------------------------------------------------------------------------------------------- ----------------------

Functions in Python

Definition of a function in Python: a function is a programming technique for giving logic a structured, procedural form.

How to define a function in Python:

    def test(x):
        "The function definitions"
        x += 1
        return x

def: the keyword that defines a function

test: the function name

(): parameters can be defined inside the parentheses

"": the documentation string (not required, but adding a description to your functions is strongly recommended)

x += 1: the code block, that is, the processing logic

return: defines the return value

Calling the function: write the function name followed by (), with or without arguments
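A runnable version of the same pattern, including a call, might look like the sketch below. The function is named add_one here rather than test, purely as an illustrative choice.

```python
def add_one(x):
    """Return x plus one."""
    x += 1
    return x

# Call the function by name, passing an argument in the parentheses.
result = add_one(1)
print(result)

# The documentation string is stored on the function object.
print(add_one.__doc__)
```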


1. A function in a programming language is different from a function in the mathematical sense. In a programming language, a function wraps up a piece of logic that accomplishes a particular task and gives it a function name.

2. Functional programming means first defining a mathematical function (mathematical modeling) and then implementing that mathematical model in the programming language. For the benefits of doing this and how to do it, see the later material on functional programming.

Why use a function

Summarize the benefits of using a function:

1. Code Reuse

2. Maintain consistency and ease of maintenance

3. Extensibility

Functions and procedures

Definition of a procedure: a procedure is simply a function with no return value

    def test01():
        msg = "hello the Little Green frog"
        print(msg)

    def test02():
        msg = "hello wudalang"
        print(msg)
        return msg

    t1 = test01()
    t2 = test02()
    print("from test01 return is [%s]" % t1)

Summary: when a function/procedure does not use return to define a return value, the Python interpreter implicitly returns None.

So in Python, a procedure can also be counted as a function.

    def test01():
        pass

    def test02():
        return 0

    def test03():
        return 0, 10, 'Hello', ['Alex', 'lb'], {'Wudalang': 'lb'}

    t1 = test01()
    t2 = test02()
    t3 = test03()

    print('From test01 return is [%s]:' % type(t1), t1)
    print('From test02 return is [%s]:' % type(t2), t2)
    print('From test03 return is [%s]:' % type(t3), t3)


Number of return values = 0: returns None

Number of return values = 1: returns that object

Number of return values > 1: returns a tuple
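The three cases can be checked with a short Python 3 sketch. The function names are made up for illustration.

```python
def no_value():
    pass                    # no return statement at all

def one_value():
    return 0

def many_values():
    return 0, 10, 'Hello'   # the values are packed into one tuple

print(no_value())
print(one_value())
print(many_values())

# A tuple result can be unpacked into separate names.
a, b, c = many_values()
```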

function parameters

1. A formal parameter is allocated memory only when the function is called, and that memory is released as soon as the call ends. A formal parameter is therefore valid only inside the function. Once the call ends and control returns to the calling function, the formal parameter can no longer be used

2. Actual arguments can be constants, variables, expressions, function calls, and so on. Whatever their type, they must have definite values when the call is made, so that those values can be passed to the formal parameters. The arguments should therefore be given definite values beforehand, by assignment, input, or similar means

3. Positional and keyword arguments (standard call: arguments correspond to formal parameters one by one, by position; keyword call: the position does not need to be fixed)

4. Default parameters

5. Parameter groups (variable-length arguments: *args and **kwargs)
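The kinds of parameters listed above can be combined in a single definition. The sketch below uses a hypothetical function describe, with made-up parameter names and values; it simply returns what it received so the binding rules are visible.

```python
def describe(name, greeting='hello', *args, **kwargs):
    """Hypothetical function combining the parameter kinds listed above."""
    return name, greeting, args, kwargs

# Default parameter: greeting falls back to 'hello'.
print(describe('Alex'))

# Positional call: arguments are matched to parameters by position.
print(describe('Alex', 'hi'))

# Keyword call: the order no longer matters.
print(describe(greeting='hi', name='Alex'))

# Parameter groups: extra positional values go to args, extra keywords to kwargs.
print(describe('Alex', 'hi', 1, 2, age=18))
```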

Built-in functions
