I. Character sets and character encodings
1. Definition
All information stored in a computer is represented as binary numbers. The characters we see on the screen, such as English letters and Chinese characters, are the result of converting those binary numbers. Put simply, the rules that determine which binary number is stored for a character such as 'a' are called "encoding"; conversely, turning the binary numbers stored in the computer back into characters for display is called "decoding". The two are analogous to encryption and decryption. If the wrong rules are used during decoding, 'a' may be parsed as 'b' or shown as garbled text.
Character: a unit of information. In a computer, a Chinese character is a character, an English letter is a character, an Arabic numeral is a character, and a punctuation mark is also a character.
Character set (charset): the collection of all abstract characters supported by a system. It is usually laid out as a two-dimensional table whose content and size are determined by the language it serves, which can be English, Chinese, Arabic, and so on.
Character encoding: a set of rules that pairs the characters of a natural language, such as an alphabet or a syllabary, with a collection of something else, such as numbers or electrical pulses. Here it means mapping each character in the character set to a specific binary number so that it can be stored in the computer. An encoding is generally an algorithm that converts the row and column coordinates of the two-dimensional table into a number. In other words, establishing a correspondence between a set of symbols and a number system is a basic technique of information processing. That is: character --------(translation process)-------> binary number
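The encode and decode directions described above can be sketched in a few lines of Python 3. The sample character and the choice of Latin-1 as the "wrong" decoding rule are illustrative assumptions, not taken from the original text.

```python
# Encode one character to bytes, then decode it with the right and the wrong rules.
text = "\u4e2d"                        # the Chinese character "zhong" (U+4E2D)

encoded = text.encode("utf-8")         # character -> binary number (bytes)
decoded = encoded.decode("utf-8")      # same rules: the original character comes back
garbled = encoded.decode("latin-1")    # wrong rules: every byte becomes an unrelated character

print(encoded)        # the three bytes that UTF-8 stores for this character
print(decoded == text)
print(len(garbled))   # one garbled character per byte
```

Decoding with Latin-1 never fails because every byte value maps to some character, which is exactly why the mistake shows up as garbled text rather than as an error.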
2. Commonly used character sets and character encodings
Character sets and character encodings generally come in pairs. Names such as ASCII, GBK, Unicode and UTF-8 are used below to refer to both a character set and its corresponding character encoding, and are simply called encodings from here on.
3. History of character encoding
Stage one: the origin, ASCII
The computer was invented by Americans, who use American English. That language needs relatively few characters, so the first table to be designed was a small one with 128 characters, named ASCII (American Standard Code for Information Interchange). A 7-bit coded character set can only support 128 characters, so to represent more of the characters commonly used in Europe, ASCII was extended: the extended ASCII character sets use 8 bits to represent one character, for a total of 256 characters. In other words, a character is represented by at most 8 bits (one byte).
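A small Python 3 sketch of the 7-bit limit. Latin-1 (ISO 8859-1) is used here as one example of an 8-bit extension; the original text does not name a specific one, so treat that choice as an assumption.

```python
# 'a' has code 97, which fits in 7 bits, so plain ASCII can store it.
code = ord("a")
bits = format(code, "07b")            # the 7-bit binary form of the code
ascii_bytes = "a".encode("ascii")

# An accented European letter is outside the 128 ASCII characters.
try:
    "\u00e9".encode("ascii")          # e with acute accent
    outside_ascii = False
except UnicodeEncodeError:
    outside_ascii = True

# An 8-bit extension (Latin-1 here) still stores it in a single byte.
latin1_bytes = "\u00e9".encode("latin-1")

print(code, bits, ascii_bytes, outside_ascii, latin1_bytes)
```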
Stage two: GBK
When computers reached Asia, and East Asia in particular, the existing standard broke down: a single sentence of everyday speech can use more characters than 256 code points can hold. As a result, China defined GBK, which represents one Chinese character with 2 bytes. Other countries defined their own encodings as well: Japan encoded Japanese in Shift_JIS, and South Korea encoded Korean in EUC-KR.
Stage three: Unicode
When the Internet swept the world, geographical restrictions fell away. Computers in different countries and regions ran into garbled text when exchanging data, much as languages drift apart under geographic isolation. To solve this problem, a great idea was born: Unicode (universal code). The Unicode system is designed to express any character of any language.
It specifies that all characters and symbols are represented by at least 16 bits (2 bytes), which gives 2**16 = 65536 values. Note that this is a minimum of 2 bytes (16 bits); a character may take more.
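The "at least 2 bytes, possibly more" rule can be observed with Python 3's UTF-16 codec. Using UTF-16 to stand in for the 2-byte Unicode form described above is an assumption made for this sketch; the sample characters are illustrative.

```python
# Every character has a number (code point) in the Unicode table.
code_point = ord("\u4e2d")            # the Chinese character "zhong"
same_char = chr(0x4E2D)

# Most common characters take exactly 2 bytes in UTF-16 ...
two_bytes = "\u4e2d".encode("utf-16-le")

# ... but characters beyond the first 65536 code points need 4 bytes.
four_bytes = "\U0001F600".encode("utf-16-le")   # an emoji, U+1F600

print(hex(code_point), len(two_bytes), len(four_bytes))
```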
Stage four: UTF-8
Unicode covers the characters of every nation, but it wastes too much storage space on characters such as English letters. UTF-8 was therefore introduced as a compressed, optimized encoding of Unicode that follows the principle of using as few bytes as possible. It no longer uses a minimum of 2 bytes; instead, characters and symbols are grouped by category: ASCII characters are stored in 1 byte, European characters in 2 bytes, and East Asian characters in 3 bytes.
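The 1, 2 and 3 byte categories can be checked directly in Python 3. The three sample characters are illustrative choices, and UTF-16 is again used as an assumed stand-in for the fixed 2-byte form.

```python
# One ASCII letter, one accented European letter, one Chinese character.
samples = ["a", "\u00e9", "\u4e2d"]

# Number of bytes each character occupies in UTF-8 and in UTF-16.
utf8_lengths = [len(ch.encode("utf-8")) for ch in samples]
utf16_lengths = [len(ch.encode("utf-16-le")) for ch in samples]

print(utf8_lengths)    # grows with the character category
print(utf16_lengths)   # constant for these three characters
```

For English text the saving is half the space, which is the motivation the paragraph above gives for UTF-8.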
Note:
Unicode: all-inclusive. Its advantage is fast conversion between characters and numbers; its disadvantage is that it takes up a lot of space.
UTF-8: precise, using different lengths for different characters. Its advantage is that it saves space; its disadvantage is that conversion between characters and numbers is slower, because each time it has to work out how many bytes a character needs.
The encoding used in memory is Unicode, trading space for time in order to be fast. Because a program must be loaded into memory to run, memory access should be as fast as possible.
On disk and in network transmission, UTF-8 is used. Network and disk I/O latency is far greater than the cost of UTF-8 conversion, and I/O should save as much bandwidth as possible to keep data transmission stable.
Data transmission aims for stability and efficiency, and the less data is sent, the more reliable the transfer, so data is converted to UTF-8 rather than kept as Unicode.
4. Use of character encoding
1) How a text editor reads and writes files (Notepad++, PyCharm, Word)
Opening an editor starts a process, and the process lives in memory, so content typed into the editor is also held in memory and is lost if the power fails. You therefore need to save it to disk: clicking Save flushes the data from memory to the hard drive. At this point, writing a .py file (without executing it) is no different from writing any other file; it is just a sequence of characters.
Whatever the editor, the core rule for avoiding garbled files is: open a file with the same encoding it was saved in.
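A minimal Python 3 sketch of that rule, assuming a temporary file and the sample text chosen here (neither comes from the original): the file is saved as GBK, then opened once with the matching encoding and once with a mismatched one.

```python
import os
import tempfile

# A throwaway file in a temporary directory (path and name are illustrative).
path = os.path.join(tempfile.mkdtemp(), "note.txt")

# Save two Chinese characters using GBK.
with open(path, "w", encoding="gbk") as f:
    f.write("\u4e2d\u6587")

# Opening with the same encoding gives the text back.
with open(path, "r", encoding="gbk") as f:
    same_encoding = f.read()

# Opening with a different encoding fails (or, with other byte values, yields garbled text).
try:
    with open(path, "r", encoding="utf-8") as f:
        f.read()
    mismatch_failed = False
except UnicodeDecodeError:
    mismatch_failed = True

print(same_encoding == "\u4e2d\u6587", mismatch_failed)
```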
2) How the Python interpreter executes a .py file (python test.py)
Stage one: the Python interpreter starts, which is comparable to launching a text editor.
Stage two: acting like a text editor, the Python interpreter opens test.py and reads its contents from disk into memory.
Stage three: the Python interpreter interprets and executes the code of test.py that was just loaded into memory.
Note:
To avoid garbled text when writing code, UTF-8 is recommended, and the declaration # -*- coding: utf-8 -*- is added to the file.
That is:
    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    print("Hello, World")
The Python interpreter reads the second line of test.py, # -*- coding: utf-8 -*-, to decide which encoding to use when reading the file into memory. In other words, this line sets the encoding the interpreter uses to decode the source file.
If the header # -*- coding: utf-8 -*- is not specified in a Python file, the default is used: ASCII in Python 2, UTF-8 in Python 3.
Summary:
1) The Python interpreter interprets and executes the contents of a file, so it must be able to read .py files, just as a text editor can.
2) Unlike a text editor, the Python interpreter can not only read the contents of a file but also execute them.
5. Some differences between Python 2 and Python 3
1) The default source encoding is ASCII in Python 2 and UTF-8 in Python 3.
2) In Python 2, str is the encoded result, that is, bytes (str == bytes), so a str s can only be decoded.
3) A string in Python 3 and a u'string' in Python 2 are both Unicode and can only be encoded. Printing them never produces garbled text, because it can be understood as going from memory to memory, Unicode -> Unicode.
4) In Python 3, str is Unicode. When the program runs, a str is stored in its new memory space in Unicode form even without the u prefix, and it can be encoded directly into any encoding: s.encode('utf-8'), s.encode('gbk')
    # unicode (str) -----encode----> utf-8 (bytes)
    # utf-8 (bytes) -----decode----> unicode
5) The Windows terminal encoding is GBK (on Chinese-language systems), while the Linux terminal encoding is typically UTF-8.
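The Python 3 side of these points can be sketched as follows. The sample string is an illustrative assumption; the code only demonstrates Python 3 behaviour and does not run under Python 2.

```python
# In Python 3, str is Unicode text and bytes is encoded data.
s = "\u4f60\u597d"                    # two Chinese characters ("hello")
b = s.encode("utf-8")                 # unicode (str) --encode--> utf-8 (bytes)
round_trip = b.decode("utf-8")        # utf-8 (bytes) --decode--> unicode (str)

# str can only be encoded, bytes can only be decoded.
str_has_decode = hasattr(s, "decode")
bytes_has_encode = hasattr(b, "encode")

# The same text can be encoded into any supported encoding.
gbk = s.encode("gbk")

print(type(s).__name__, type(b).__name__, len(b), len(gbk))
```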
II. File operations
1. The file-handling process
1) Open the file, get the file handle and assign it to a variable
2) Manipulate the file through the handle
3) Close the file
For example:
    f = open('chenli.txt')            # open the file
    first_line = f.readline()
    print('first line:', first_line)  # read one line
    data = f.read()                   # read all the remaining content; avoid this when the file is large
    print(data)                       # print what was read
    f.close()                         # close the file
2. Basic usage of file operations
1) Basic usage:
    file_object = open(file_name, mode='r', buffering=-1)
The open function has many parameters; the commonly used ones are file_name, mode and encoding.
file_name: the name of the file to open; if it is not in the current directory, give the path.
mode: the mode in which the file is opened.
encoding: the encoding used to decode the data read from the file (and to encode the data written to it), usually utf8 or gbk.
2) File open modes
- r, read-only mode [the default; the file must exist, otherwise an exception is raised]
- w, write-only mode [not readable; created if it does not exist; emptied if it does]
- x, write-only mode [not readable; created if it does not exist; raises an error if it does]
- a, append mode [not readable; created if it does not exist; content can only be appended]; the file pointer automatically moves to the end of the file.
"+" means the file can be read and written at the same time
- r+, read and write [readable, writable]
- w+, write and read [readable, writable]; the contents of the file are erased and the file is then opened for reading and writing.
- x+, write and read [readable, writable]
- a+, append and read [readable, writable]; the file is opened for reading and writing with the file pointer moved to the end of the file.
"b" means operating on bytes: the file is opened in binary mode instead of text mode.
- rb or r+b
- wb or w+b
- xb or x+b
- ab or a+b
Note: when a file is opened with b, what is read is of type bytes, bytes must be supplied when writing, and encoding cannot be specified.
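A short Python 3 sketch of the three points in the note on binary mode. The file names and sample text are illustrative assumptions; the files live in a temporary directory.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
path = os.path.join(folder, "data.bin")

# Writing in binary mode requires bytes, so the text is encoded by hand.
with open(path, "wb") as f:
    f.write("\u4e2d\u6587".encode("utf-8"))

# Reading in binary mode returns bytes, which are decoded by hand.
with open(path, "rb") as f:
    raw = f.read()
text = raw.decode("utf-8")

# Writing a str to a binary file is rejected.
try:
    with open(os.path.join(folder, "scratch.bin"), "wb") as f:
        f.write("plain str")
    str_rejected = False
except TypeError:
    str_rejected = True

# Passing encoding together with a binary mode is rejected as well.
try:
    open(path, "rb", encoding="utf-8")
    encoding_rejected = False
except ValueError:
    encoding_rejected = True

print(type(raw).__name__, str_rejected, encoding_rejected)
```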
3) Opening a file for reading with r
    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    f = open('1.txt', encoding='utf-8', mode='r')
    print(f)
    data1 = f.read()
    print(data1)
    f.close()
Contents of 1.txt:
    55542342123
Output:
    <_io.TextIOWrapper name='1.txt' mode='r' encoding='utf-8'>
    55542342123
Note:
1) Python has three methods for reading the contents of a file:
    read()       # read the entire file at once
    readline()   # read one line of the file per call
    readlines()  # read all lines of the file and return them as a list of strings
2) print(f.readable())  # check whether the file is readable (for example, opened in r mode)
3) print(f.closed)  # check whether the file is closed
4) file.seek(offset, whence=0)  # move the file pointer by offset relative to whence (0 = start of file, 1 = current position, 2 = end of file); a positive offset moves toward the end, a negative one toward the beginning
   file.tell()  # return the current position of the file pointer
5) file.truncate(size=file.tell())  # truncate the file to at most size bytes; size defaults to the current file position
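The methods listed above can be tried on a small temporary file. This is a sketch under a few assumptions: the file name and contents are made up, and the seek relative to the end of the file is done in binary mode, since Python 3 text files only accept a zero offset with whence=2.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# newline="" keeps "\n" as a single byte on every platform, so offsets are predictable.
with open(path, "w", encoding="utf-8", newline="") as f:
    f.write("line1\nline2\nline3\n")

with open(path, "r", encoding="utf-8") as f:
    can_read = f.readable()
    first = f.readline()        # one line per call
    rest = f.readlines()        # all remaining lines as a list of strings
    f.seek(0)                   # back to the start of the file
    everything = f.read()       # the whole file at once
is_closed = f.closed            # the with block has closed the file

with open(path, "rb+") as f:
    f.seek(-6, 2)               # 6 bytes before the end of the file
    position = f.tell()
    tail = f.read()
    f.truncate(6)               # keep only the first 6 bytes
    f.seek(0)
    remaining = f.read()

print(first, rest, position, tail, remaining)
```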
4) Writing a file in w mode
    f = open('a.txt', 'w', encoding='utf-8')
    # f = open('b.txt', 'r', encoding='utf-8')  # opening for reading raises an error if the file does not exist
    f = open('b.txt', 'w', encoding='utf-8')
    # print(f.writable())
    f.write('111111\n22222222')
    f.seek(0)
    f.write('\n333333\n444444')
    f.writelines(['\n55555\n', '6666\n', '77777\n'])
    f.close()
a.txt is empty.
b.txt (the first line is blank):

    333333
    444444
    55555
    6666
    77777
Note:
    file.write(str)       # write a string (text or binary) to the file
    file.writelines(seq)  # write multiple lines: writes a list of strings to the file; note that you must add the newline character to each line yourself
    file.flush()          # flush the file's internal buffer: buffered data is written to the file immediately instead of passively waiting for the buffer to be written out
5) Modifying a file
    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    import os

    read_f = open('b.txt', 'r')
    write_f = open('.b.txt.swap', 'w')
    for line in read_f.readlines():
        if line.startswith('1111'):
            line = '2222222222\n'
        write_f.write(line)
    read_f.close()
    write_f.close()
    os.remove('b.txt')
    os.rename('.b.txt.swap', 'b.txt')
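The same swap-file idea can be wrapped in a function. This is a sketch rather than the original author's code: the helper name replace_lines, the explicit UTF-8 encoding, the use of os.replace instead of remove plus rename, and the temporary test file are all assumptions made for illustration.

```python
import os
import tempfile


def replace_lines(path, prefix, new_line):
    """Rewrite path, replacing every line that starts with prefix by new_line."""
    swap_path = path + ".swap"
    with open(path, "r", encoding="utf-8") as read_f, \
            open(swap_path, "w", encoding="utf-8") as write_f:
        for line in read_f:              # iterate line by line instead of loading the whole file
            if line.startswith(prefix):
                line = new_line
            write_f.write(line)
    os.replace(swap_path, path)          # swap the new file into place in one step


# Usage on a throwaway file.
demo_path = os.path.join(tempfile.mkdtemp(), "b.txt")
with open(demo_path, "w", encoding="utf-8") as f:
    f.write("1111 old\nkeep\n")

replace_lines(demo_path, "1111", "2222222222\n")

with open(demo_path, "r", encoding="utf-8") as f:
    content = f.read()
print(content)
```

Iterating over the file object keeps memory use flat for large files, and os.replace also works when the target already exists.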
3. Context management with the with statement
When processing a file, you need to obtain a file handle, read data from the file, and then close the file handle.
Normally the code looks like this:
    file = open("/tmp/foo.txt")
    data = file.read()
    file.close()
There are two problems here. One is that you may forget to close the file handle; the other is that an exception raised while reading the file is not handled, so the handle is never closed.
with handles exceptions raised inside its block cleanly. Here is the with version of the code:
    with open("/tmp/foo.txt") as file:
        data = file.read()
The basic idea of with is that the object produced by evaluating the expression after with must have an __enter__() method and an __exit__() method. Immediately after the expression following with is evaluated, the __enter__() method of the returned object is called, and its return value is assigned to the variable after as. When the whole block under with has finished executing, the __exit__() method of that object is called.
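The order of the __enter__() and __exit__() calls can be made visible with a tiny class. The class name Tracked and the events list are hypothetical names introduced only for this sketch.

```python
events = []


class Tracked:
    """A minimal context manager that records when it is entered and exited."""

    def __enter__(self):
        events.append("enter")
        return "resource"                 # this value is bound to the name after 'as'

    def __exit__(self, exc_type, exc_value, traceback):
        events.append("exit")
        return False                      # do not swallow exceptions


# Normal case: enter, body, exit.
with Tracked() as value:
    events.append("body:" + value)

# Error case: __exit__ still runs before the exception reaches the caller.
try:
    with Tracked():
        raise ValueError("boom")
except ValueError:
    events.append("caught")

print(events)
```

File objects follow the same protocol, with __exit__() closing the file, which is why the with version needs no explicit close().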
Note:
Simulating tail -f access.log
    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # tail -f access.log
    import time

    with open('access.log', 'r', encoding='utf-8') as f:
        f.seek(0, 2)                      # jump to the end of the file
        while True:
            line = f.readline().strip()
            if line:
                print('New log line:', line)
            time.sleep(0.5)
III. Functions
1. What is a function?
A function is a well-organized, reusable piece of code that implements a single piece of functionality or a group of related ones.
Functions improve the modularity of an application and the reuse of its code, and make it easier to extend.
2. Classification of functions
There are two types of functions in Python: built-in functions and custom functions.
1) Built-in functions
Functions that Python itself defines and that can be called directly, such as sum, max and min:
    a = len('Hello')
    print(a)
    b = max([1, 2, 3])  # the list contents are lost in the source; these values are illustrative
    print(b)
2) Custom functions
Functions you define yourself, as required, following the function definition syntax.
3. Defining functions
1) Why define a function?
A function must be defined before it is used. Using one without defining it is equivalent to referencing a variable name that does not exist.
Using a function involves two stages: the definition stage and the call stage.
Note: defining a function only checks its syntax; it does not execute the code in its body.
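The difference between the definition stage and the call stage can be seen with a function whose body refers to a name that does not exist. The names broken and undefined_name are hypothetical and exist only for this sketch.

```python
# Definition stage: only the syntax is checked, so this definition succeeds
# even though undefined_name does not exist anywhere.
def broken():
    return undefined_name + 1


defined_ok = callable(broken)

# Call stage: the body actually runs, and the missing name is reported.
try:
    broken()
    raised = False
except NameError:
    raised = True

print(defined_ok, raised)
```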
2) Syntax of a function definition
    def functionname(parameters):
        "function docstring"
        function_suite
        return [expression]
For example:
    def printme():
        print()
        return
3) The three forms of function definition
    # 1: Function without parameters: if the function simply performs some operation, define it without parameters; such functions usually have no return value
    def print_star():
        print('#' * 6)

    # 2: Function with parameters: its execution depends on the arguments passed in from outside; such functions usually have a return value
    # def my_max(x, y):
    #     res = x if x > y else y
    #     return res

    # ternary expression
    x = 10
    y = 2
    # if x > y:
    #     print(x)
    # else:
    #     print(y)
    #
    res = x if x > y else y
    print(res)

    # 3: Empty function
When you start designing the architecture of your code, you can write the functions down as empty stubs first and fill them in later.
    # def auth():
    #     """authentication function"""
    #     pass
    # auth()
    def insert():
        """insert function"""
        pass

    def delete():
        """delete function"""
        pass

    def update():
        """update function"""
        pass
4) Calling a function
    def foo():
        print('from foo')

    def bar(name):
        print('bar===>', name)

    # By whether they take parameters, function calls fall into two kinds
    foo()        # defined without parameters, so called without arguments
    bar('egon')  # defined with parameters, so called with arguments

    # By the form of the call and where it appears, there are three kinds
    foo()  # statement form

    def my_max(x, y):
        res = x if x > y else y
        return res

    # res = my_max(10, 20) * 10000000  # expression form
    # print(res)

    res = my_max(my_max(10, 20), 30)  # a function call used as the argument of another call
    print(res)
5) Function parameters
Function parameters come in two kinds: parameters (formal parameters, which are variable names) and arguments (actual parameters, which are values)
    # definition stage
    def foo(x, y):  # x=1, y=2
        print(x)
        print(y)

    # call stage
    foo(1, 2)
In more detail, function parameters are divided into five types:
positional parameters, keyword parameters, default parameters, variable-length parameters (*args, **kwargs), and named keyword parameters
    def foo(x, y, z):  # positional parameters: must be passed a value
        print(x, y, z)

    foo(1, 2, 3)  # positional arguments: matched to the parameters one by one
Output:
    1 2 3
    def foo(x, y, z):
        print(x, y, z)

    foo(z=3, x=1, y=2)
    # Things to note about keyword arguments:
    # 1: keyword arguments must come after positional arguments
    # 2: the same parameter must not be given a value twice
    # foo(1, z=3, y=2)       # correct
    # foo(x=1, 2, z=3)       # error
    # foo(1, x=1, y=2, z=3)  # error: x is given two values
    # def register(name, age, sex='male'):  # default parameter
    #     print(name, age, sex)
    #
    # register('asb', age=40)
    # register('a1sb', 39)
    # register('a2sb', ...)  # the age value is missing in the source
    # register('a3sb', ...)  # the age value is missing in the source
    #
    # register('Steel egg', ..., 'female')  # the age value is missing in the source
    # register('Steel egg', sex='female', age=19)

    # Things to note about default parameters:
    # 1: default parameters must come after non-default parameters
    # def register(sex='male', name, age):  # raises an error at the definition stage
    #     print(name, age, sex)

    # 2: a default parameter is assigned at the definition stage, and only once
    # a = 100000000
    # def foo(x, y=a):
    #     print(x, y)
    # a = 0
    # foo(1)

    # 3: the value of a default parameter should usually be of an immutable type
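Notes 2 and 3 on default parameters can be demonstrated together. The function names show, append_bad and append_good are hypothetical, and the None-sentinel pattern in append_good is a common idiom rather than something the original text prescribes.

```python
# Note 2: the default value is captured once, when the function is defined.
a = 100
def show(x, y=a):
    return x, y

a = 0                      # rebinding a later does not change the default
result = show(1)

# Note 3: a mutable default is shared between calls, which is rarely what you want.
def append_bad(item, bucket=[]):
    bucket.append(item)
    return bucket

first = list(append_bad(1))     # copy the result so later calls cannot alter it
second = list(append_bad(2))    # the same underlying list already holds 1

# Using an immutable default (None) and creating the list inside avoids the sharing.
def append_good(item, bucket=None):
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket

third = append_good(1)
fourth = append_good(2)

print(result, first, second, third, fourth)
```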
- Variable-length parameters
*args: * collects the overflowing positional arguments and assigns them to args as a tuple
    def foo(x, y, *args):  # * collects the overflowing positional arguments and assigns them to args as a tuple
        print(x, y)
        print(args)

    foo(1, 2, 3, 4, 5)
For example:
    def add(*args):
        res = 0
        for i in args:
            res += i
        return res

    print(add(1, 2, 3, 4))
    print(add(1, 2))  # arguments inferred from the output shown
Output:
    10
    3
**kwargs: ** collects the overflowing keyword arguments and assigns them to kwargs as a dictionary
    # def foo(x, y, **kwargs):  # ** collects the overflowing keyword arguments and assigns them to kwargs as a dictionary
    #     print(x, y)
    #     print(kwargs)
    # foo(1, 2, a=1, name='egon', age=18)
For example:
    def foo(name, age, **kwargs):
        print(name, age)
        if 'sex' in kwargs:
            print(kwargs['sex'])
        if 'height' in kwargs:
            print(kwargs['height'])

    foo('egon', 18, sex='male', height='185')
    foo('egon', 18, sex='male')
Output:
    egon 18
    male
    185
    egon 18
    male
    foo(*[1, 2, 3])          # same as foo(1, 2, 3)
    foo(**{'x': 1, 'b': 2})  # same as foo(x=1, b=2)
    # def foo(name, age, *, sex='male', height):
    #     print(name, age)
    #     print(sex)
    #     print(height)
    # # Parameters defined after * are named keyword parameters: they must be passed by keyword, and those without a default value must be given a value
    # foo('egon', 17, height='185')
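A runnable Python 3 sketch of named keyword parameters, showing the two ways a call can go wrong. The function name describe and the sample values are illustrative assumptions.

```python
def describe(name, age, *, sex="male", height):
    """sex and height come after *, so they can only be passed by keyword."""
    return name, age, sex, height


# height has no default, so it must be supplied, and by keyword.
ok = describe("egon", 17, height="185")

# Passing the named keyword parameters positionally is rejected.
try:
    describe("egon", 17, "male", "185")
    positional_rejected = False
except TypeError:
    positional_rejected = True

# Leaving out height is rejected as well.
try:
    describe("egon", 17)
    missing_rejected = False
except TypeError:
    missing_rejected = True

print(ok, positional_rejected, missing_rejected)
```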
For example
    def foo(x, y, z):
        print('from foo', x, y, z)

    def wrapper(*args, **kwargs):
        print(args)    # args = (1, 2, 3)
        print(kwargs)  # kwargs = {'a': 1, 'b': 2}
        foo(*args, **kwargs)  # foo(*(1, 2, 3), **{'a': 1, 'b': 2}), that is, foo(1, 2, 3, b=2, a=1)

    # wrapper(1, 2, 3, a=1, b=2)
    wrapper(1, z=2, y=3)
Output
    (1,)
    {'z': 2, 'y': 3}
    from foo 1 3 2
6) Return value of a function
The statement return [expression] exits the function and optionally returns an expression to the caller. A return statement with no value returns None.
For example
    def foo():
        print('from foo')
        return None

    res = foo()
    print(res)
Output
    from foo
    None
Note:
The return value is None in three cases:
there is no return statement; there is a bare return with nothing after it; or the function executes return None.
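The three cases can be checked side by side. The function names are hypothetical and chosen only to label each case.

```python
def no_return():
    pass                 # case 1: no return statement at all


def bare_return():
    return               # case 2: return with nothing after it


def explicit_none():
    return None          # case 3: return None


results = [no_return(), bare_return(), explicit_none()]
print(results)
```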
return value: the result of the function call is that value
    def foo():
        print('from foo')
        x = 1
        return x

    res = foo()
    print(res)
Output:
    from foo
    1
return value 1, value 2, value 3, ...: the result is the tuple (value 1, value 2, value 3, ...)
    def foo():
        print('from foo')
        x = 1
        return 1, [2, 3], (4, 5), {}

    res = foo()
    print(res)  # prints: (1, [2, 3], (4, 5), {})
    a, b, c, d = foo()
    print(d)
Output:
    from foo
    (1, [2, 3], (4, 5), {})
    from foo
    {}