I. Understanding modules
What is a module: a module is a file containing Python definitions and statements, whose file name is the module name with the suffix .py appended. In fact, the modules loaded with import fall into four general categories:
1. Code written in Python (.py files)
2. C or C++ extensions that have been compiled into shared libraries or DLLs
3. Packages, which bundle a set of modules together
4. Built-in modules written in C and linked into the Python interpreter
Why use modules?
If you exit the Python interpreter and then start it again, the functions and variables you defined earlier are lost. So we usually write the program to a file, which persists it and lets us run it with python test.py whenever needed; test.py is then called a script.
As a program grows and gains more features, we usually split it into several files for easier management, which makes the program's structure clearer. At that point, these files can not only be executed as scripts but also be imported into other modules, which lets us reuse their functionality.
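A minimal sketch of one file that serves both roles. The file name greeting.py and the function say_hello are hypothetical. The if __name__ == "__main__": check is a standard Python idiom that the text above does not mention. It runs the block only when the file is executed directly, not when it is imported.

```python
# greeting.py (hypothetical module name)

def say_hello(name):
    """Reusable function: other modules can use it after `import greeting`."""
    return f"Hello, {name}!"

# __name__ is "__main__" only when this file is run directly (python greeting.py);
# when the file is imported, __name__ is "greeting" and this block is skipped.
if __name__ == "__main__":
    print(say_hello("Python"))
```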
II. Common module categories
Common modules, part one:
collections module
time module
random module
os module
sys module
Serialization modules
re module
Common modules, part two (these are related to object-oriented programming):
hashlib module
configparser module
logging module
III. Regular expressions
Take a typical registration page: you have to enter a mobile phone number, and phone numbers follow rules (a mobile number has 11 digits and starts only with 13, 14, 15, 17 or 18). If your input is wrong, the page shows an error. If you wanted to implement this check yourself, a while loop might seem easy enough, so let's look at that implementation.
# Check whether a mobile phone number is valid
while True:
    phone_number = input('Please enter your phone number: ')
    if (len(phone_number) == 11
            and phone_number.isdigit()
            and (phone_number.startswith('13')
                 or phone_number.startswith('14')
                 or phone_number.startswith('15')
                 or phone_number.startswith('17')
                 or phone_number.startswith('18'))):
        print('This is a valid mobile number')
    else:
        print('This is not a valid mobile number')
This code is easy to understand, but there is a much simpler approach. Let's take a look.
import re

phone_number = input('Please enter your phone number: ')
# ^ anchors the match at the start: the number must begin with 13|14|15|17|18
# [0-9] is a character group matching any single digit from 0 to 9
# {9} repeats the preceding element nine times
# $ anchors the match at the end of the string
if re.match('^(13|14|15|17|18)[0-9]{9}$', phone_number):
    print('This is a valid mobile number')
else:
    print('This is not a valid mobile number')
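The check is easier to test without input() if you wrap it in a function. This is a sketch that assumes the same prefix rule as above, and is_valid_phone is a hypothetical helper name. It uses re.fullmatch (Python 3.4+) instead of re.match with ^...$, because $ also matches just before a trailing newline.

```python
import re

# Same rule as above: 13/14/15/17/18 followed by exactly nine more digits
PHONE_PATTERN = re.compile(r"(13|14|15|17|18)[0-9]{9}")

def is_valid_phone(phone_number):
    """Return True if the whole string is an 11-digit number with an allowed prefix."""
    # fullmatch() requires the entire string to match, so ^ and $ are not needed
    return PHONE_PATTERN.fullmatch(phone_number) is not None

for number in ["13812345678", "12812345678", "1381234567", "138123456789"]:
    print(number, is_valid_phone(number))   # True, False, False, False
```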
So what is a regular expression?
The first thing to know is that regular expressions deal only with strings. Online testing tool: http://tool.chinaz.com/regex/
For example, you can use '1' to match '1' or '2' to match '2': ordinary characters match themselves directly.
Character group: [character group] The different characters that may appear at the same position form a character group. In regular expressions, characters fall into many classes, such as digits, letters and punctuation. If a position must hold "exactly one digit", the character at that position can only be one of 0, 1, 2, 3, ..., 9.
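Here is a quick sketch of character groups with re.findall. The sample string is made up for illustration. Each group matches exactly one character at a time.

```python
import re

text = "a1b2C3-x9"

# [0-9]: any single digit
print(re.findall(r"[0-9]", text))         # ['1', '2', '3', '9']
# [a-z]: any single lowercase letter
print(re.findall(r"[a-z]", text))         # ['a', 'b', 'x']
# [A-Za-z]: any single letter, either case
print(re.findall(r"[A-Za-z]", text))      # ['a', 'b', 'C', 'x']
# [0-9a-fA-F]: one hexadecimal digit
print(re.findall(r"[0-9a-fA-F]", text))   # ['a', '1', 'b', '2', 'C', '3', '9']
```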
Character groups:
Metacharacters:
Quantifiers:
. ^ $
* + ? {}
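Below is a small sketch that runs the common metacharacters (., \d, \w, \s, ^, $, [^...]) and quantifiers (*, +, ?, {n}, {n,m}) through re.findall. The sample strings are made up for illustration.

```python
import re

s = "Tel: 010-12345678, ok"

print(re.findall(r"\d", "a1b22"))              # \d: one digit -> ['1', '2', '2']
print(re.findall(r"\w+", s))                   # \w: letter, digit or _ -> ['Tel', '010', '12345678', 'ok']
print(re.findall(r"\s", s))                    # \s: whitespace -> [' ', ' ']
print(re.findall(r"^Tel", s))                  # ^: start of string -> ['Tel']
print(re.findall(r"ok$", s))                   # $: end of string -> ['ok']
print(re.findall(r"a.c", "abc a-c ac"))        # .: any char except newline -> ['abc', 'a-c']
print(re.findall(r"[^\d\s]", "a1 b2"))         # [^...]: any char NOT listed -> ['a', 'b']
print(re.findall(r"ab*", "a ab abbb"))         # *: zero or more -> ['a', 'ab', 'abbb']
print(re.findall(r"\d+", s))                   # +: one or more -> ['010', '12345678']
print(re.findall(r"colou?r", "color colour"))  # ?: zero or one -> ['color', 'colour']
print(re.findall(r"\d{3}", s))                 # {n}: exactly n -> ['010', '123', '456']
print(re.findall(r"\d{2,3}", "1 12 1234"))     # {n,m}: n to m times -> ['12', '123']
```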
Note: the quantifiers *, +, ? and {} above are all greedy, meaning they match as much as possible. Adding ? after them makes them non-greedy, that is, lazy matches that match as little as possible.
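Greedy and lazy matching:
Common lazy (non-greedy) quantifiers:
*?: repeat any number of times, but as few times as possible
+?: repeat one or more times, but as few times as possible
??: repeat zero or one time, but as few times as possible
{n,m}?: repeat n to m times, but as few times as possible
{n,}?: repeat n or more times, but as few times as possible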
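This sketch puts each greedy quantifier next to its lazy counterpart from the list above. The HTML-like string is made up for illustration.

```python
import re

html = "<b>bold</b><i>italic</i>"

# Greedy: .* grabs as much as possible, so one match spans the whole string
print(re.findall(r"<.*>", html))         # ['<b>bold</b><i>italic</i>']
# Lazy: .*? stops at the first '>' it can, giving each tag separately
print(re.findall(r"<.*?>", html))        # ['<b>', '</b>', '<i>', '</i>']

# The same idea with other quantifiers
print(re.findall(r"\d+", "12345"))       # ['12345']
print(re.findall(r"\d+?", "12345"))      # ['1', '2', '3', '4', '5']
print(re.findall(r"\d{2,4}", "12345"))   # ['1234']
print(re.findall(r"\d{2,4}?", "12345"))  # ['12', '34']
```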
.*? usage:
. matches any character, * repeats it from 0 to infinitely many times, and ? switches to non-greedy mode. Together they mean "take as few arbitrary characters as possible". This is rarely used on its own. It usually appears as .*?x, which means "take characters of any length until the first x appears".
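Here is a short sketch of the .*?x idiom described above. The string "abcxdefxghi" is made up for illustration.

```python
import re

s = "abcxdefxghi"

# .*?x: take as few characters as possible, up to and including the first 'x'
print(re.search(r".*?x", s).group())   # abcx
# Without ?, .*x runs to the LAST 'x'
print(re.search(r".*x", s).group())    # abcxdefx
# findall repeats the lazy match across the string
print(re.findall(r".*?x", s))          # ['abcx', 'defx']
```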
Character:
Groups (), alternation | and negated character groups [^]:
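(1) ^[1-9]\d{13,16}[0-9x]$
# ^ marks the start: the first character must be a digit from 1 to 9
# \d{13,16} repeats a digit 13 to 16 times
# [0-9x] the last character is a digit or the letter x
# $ marks the end
The expression above matches strings shaped like an ID card number. However, its total length can be anywhere from 15 to 18 characters, so it also accepts 16- and 17-character strings, which are not valid ID lengths.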
(2) ^[1-9]\d{14}(\d{2}[0-9x])?$
# ? repeats the group 0 or 1 times: 0 times gives a 15-digit number, 1 time gives an 18-character number
(3) ^([1-9]\d{16}[0-9x]|[1-9]\d{14})$
# First tries to match [1-9]\d{16}[0-9x]; if that fails, tries [1-9]\d{14}
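The following sketch runs the three ID-number patterns above against strings of different lengths. The samples are made-up digit strings chosen only for their length, not real ID numbers. Only pattern (1) accepts the 16-character string.

```python
import re

# The three ID-number patterns from the text
ID_PATTERNS = {
    "(1)": r"^[1-9]\d{13,16}[0-9x]$",
    "(2)": r"^[1-9]\d{14}(\d{2}[0-9x])?$",
    "(3)": r"^([1-9]\d{16}[0-9x]|[1-9]\d{14})$",
}

# Hypothetical sample strings (not real ID numbers), chosen only by length
samples = {
    "15 chars": "110105491231002",
    "16 chars": "1101054912310021",
    "18 chars": "11010519491231002x",
}

def matches(pattern, text):
    """Return True if the pattern matches the text."""
    return re.match(pattern, text) is not None

for name, pattern in ID_PATTERNS.items():
    results = {label: matches(pattern, s) for label, s in samples.items()}
    print(name, results)
```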
As an example of grouping: an HTML page contains a <title>xxx</title> tag. With what we know so far, we can only rely on <title> and </title> being fixed. So to get the page title (xxx), the best we can write is something like <title>.*</title>. That matches the complete tag <title>xxx</title>, not just the title text xxx. Solving this requires assertions, and it helps to understand grouping before learning assertions.

Groups are written with () in regular expressions. As the author understands it, groups serve two purposes:
1. Treat a repeating pattern as a group and repeat the group as a whole, which can give surprisingly compact expressions.
2. After grouping, simplify the expression with a backreference.

For the first purpose, take IP address matching. The naive pattern is \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}. Looking closely, there is a pattern: \.\d{1,3} appears three times, so we can treat it as a group and repeat that group three times: \d{1,3}(\.\d{1,3}){3}. That is more concise.

For the second purpose, take matching the <title>xxx</title> tag. The naive regex is <title>.*</title>, in which "title" appears twice, identically. Grouping can shorten it to <(title)>.*</\1>, which is a practical use of a backreference. When numbering groups, the entire expression always counts as group 0 (here, <(title)>.*</\1>). Groups are then numbered from left to right, so (title) is group 1.
The \1 syntax refers to the text captured by a group, and \1 specifically refers to the text of group 1. This simplifies the regular expression: you write title only once, put it in a group, and refer to it later. Inspired by this, can we simplify the IP address expression from before? The original expression is \d{1,3}(\.\d{1,3}){3}, in which \d{1,3} appears twice. Simplifying it with a backreference gives (\d{1,3})(\.\1){3}. Briefly: \d{1,3} is placed in a group, written (\d{1,3}), which is group 1; (\.\1) is group 2, and inside it \1 refers back to the text of group 1. If you actually test it, you will find it is wrong. Why? As emphasized above, a backreference refers only to text content, not to the regular expression. Once a group matches successfully, a reference to it means the text that was matched, that is, the result, not the expression. Therefore (\d{1,3})(\.\1){3} only matches IP addresses whose four numbers are identical, for example 123.123.123.123.
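This sketch checks the grouping claims above. It also shows the assertion approach the text mentions, using a lookbehind (?<=...) and a lookahead (?=...) to pull out only xxx. The IP pattern checks only the shape of the string, not whether each number is within 0 to 255.

```python
import re

html = "<title>xxx</title>"

# Without groups: the match is the whole tag, not just the title text
whole = re.search(r"<title>.*</title>", html).group()
print(whole)                     # <title>xxx</title>

# Group 1 captures "title"; \1 must then repeat that same text
m = re.search(r"<(title)>.*</\1>", html)
print(m.group(0), m.group(1))    # <title>xxx</title> title

# Assertions (lookbehind/lookahead) check the tags without including them
title_text = re.search(r"(?<=<title>).*(?=</title>)", html).group()
print(title_text)                # xxx

# Grouping + repetition for a dotted IP-like string
IP = r"\d{1,3}(\.\d{1,3}){3}"
print(bool(re.fullmatch(IP, "192.168.1.10")))       # True

# Backreference version: \1 repeats the TEXT of group 1, not its pattern
SAME = r"(\d{1,3})(\.\1){3}"
print(bool(re.fullmatch(SAME, "192.168.1.10")))     # False
print(bool(re.fullmatch(SAME, "123.123.123.123")))  # True
```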
Group naming: the syntax is (?P<name>...). Note that the name comes first, then the regular expression.
import re
ret = re.search('<(\w+)>\w+<(/\w+)>', '
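Here is a minimal sketch of named groups. The input string "<h1>hello</h1>" and the group name tag_name are hypothetical examples. (?P=tag_name) refers back to the text of the named group, just as \1 does for a numbered group.

```python
import re

# Hypothetical sample input
html = "<h1>hello</h1>"

# (?P<tag_name>\w+) names the group; (?P=tag_name) repeats its matched text
ret = re.search(r"<(?P<tag_name>\w+)>\w+</(?P=tag_name)>", html)
print(ret.group("tag_name"))   # h1
print(ret.group())             # <h1>hello</h1>

# A mismatched closing tag fails, because (?P=tag_name) must repeat "h1"
print(re.search(r"<(?P<tag_name>\w+)>\w+</(?P=tag_name)>", "<h1>hello</h2>"))  # None
```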
Escape character:
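The sketch below shows how backslashes are escaped in regular expressions written in Python. The path string is made up for illustration. A raw string (r"...") keeps Python from consuming backslashes before the re module sees them. re.escape() adds backslashes before metacharacters.

```python
import re

path = r"C:\new"   # raw string: a backslash followed by 'n', not a newline
print(len(path))                      # 6

# To match one literal backslash, the regex itself needs \\ ...
print(re.findall(r"\\n", path))       # ['\\n']
# ... and without a raw string each of those backslashes must be doubled again
print(re.findall("\\\\n", path))      # ['\\n']

# re.escape() adds backslashes before regex metacharacters such as '.'
pattern = re.escape("www.baidu.com")
print(pattern)                                        # www\.baidu\.com
print(bool(re.fullmatch(pattern, "www.baidu.com")))   # True
print(bool(re.fullmatch(pattern, "wwwxbaiduxcom")))   # False
```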
IV. The re module
Methods of the re module
# Common methods of the re module
import re

# 1. findall: returns all matches in a list
ret = re.findall('a', 'eva ang egons')
print(ret)  # ['a', 'a']

# 2. search: scans the string for the pattern and returns only the first match,
# as a match object; call group() on it to get the matched string.
# If nothing matches, search() returns None, and calling group() on None raises an error.
ret = re.search('s', 'eva ang egons')  # finds the first 's'
print(ret.group())  # s

# 3. match: like search, but only matches at the start of the string; use group() to get the result
print(re.match('a', 'abc').group())  # a

# 4. split: first split on 'a' to get '' and 'bcd', then split '' and 'bcd' on 'b'
print(re.split('[ab]', 'abcd'))  # ['', '', 'cd']

# 5. sub: replace digits with 'H'; the argument 1 means replace only the first one
print(re.sub(r'\d', 'H', 'eva3sdf4ahi4asd45', 1))  # evaHsdf4ahi4asd45

# 6. subn: replace all digits with 'H' and return a tuple (new string, number of replacements)
print(re.subn(r'\d', 'H', 'eva3sdf4ahi4asd45'))  # ('evaHsdfHahiHasdHH', 5)

# 7. compile: compile the pattern into a regular expression object; the rule here is three digits
obj = re.compile(r'\d{3}')
print(obj)
ret = obj.search('abc12345eeeee')  # the pattern object calls search with the string to match
print(ret.group())  # 123

# 8. finditer: returns an iterator of match objects
ret = re.finditer(r'\d', 'dsf546sfsc')
# print(ret)  # <callable_iterator object at 0x...>
print(next(ret).group())  # first result: 5
print(next(ret).group())  # second result: 4
print([i.group() for i in ret])  # remaining results: ['6']
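Because search() and match() return None when nothing matches, calling group() directly can fail. Below is a small sketch of the difference between search and match, with a None check. first_match is a hypothetical helper name.

```python
import re

text = "eva ang egons"

# search() scans the whole string; match() only tries position 0
print(re.search("n", text).group())   # n
print(re.match("n", text))            # None, because text does not start with 'n'

def first_match(pattern, s):
    """Return the first matched text, or None if there is no match (hypothetical helper)."""
    m = re.search(pattern, s)
    return m.group() if m else None

print(first_match(r"\d+", "abc123def"))   # 123
print(first_match(r"\d+", "no digits"))   # None
```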
Group priority in findall
import re

ret = re.findall(r'www\.(baidu|oldboy)\.com', 'www.oldboy.com')
print(ret)  # ['oldboy']
# findall returns the contents of the groups in the match first;
# to get the whole match instead, cancel this with (?:...)
ret = re.findall(r'www\.(?:baidu|oldboy)\.com', 'www.oldboy.com')
print(ret)  # ['www.oldboy.com']
Group priority in split
import re

ret = re.split(r'\d+', 'eva123dasda9dg')  # split on runs of digits
print(ret)  # ['eva', 'dasda', 'dg']
ret = re.split(r'(\d+)', 'eva123dasda9dg')
print(ret)  # ['eva', '123', 'dasda', '9', 'dg']
# Adding () around the pattern changes the result: without parentheses the matched
# separators are dropped, with parentheses they are kept. This matters whenever
# later processing needs the matched parts.
V. The relationship between the re module and regular expressions
Strictly speaking, the re module and regular expressions are two separate things. Their relationship is like the one between the time module and time itself. Before you learned Python, you did not know the time module existed, but you already knew about time: 12:30 means half past twelve. Time has its own format of year, month, day, hour, minute and second, which has become a convention you already carry in your head. The time module is just a tool Python gives us to make working with time convenient. In the same way, regular expressions are a set of string-matching rules that exist independently of Python, and the re module is Python's tool for using them.
VI. The collections module
On top of the built-in data types (dict, list, set, tuple), the collections module provides several additional data types:
1. namedtuple: creates a tuple whose elements can be accessed by name
2. deque: a double-ended queue that can quickly append and pop elements at both ends (access in the middle is slower)
3. Counter: a counter, mainly used for counting
4. OrderedDict: an ordered dictionary
5. defaultdict: a dictionary with default values
namedtuple:
We know that a tuple can represent an immutable collection. For example, the two-dimensional coordinates of a point can be written as p = (1, 2).
However, from (1, 2) alone it is hard to tell that this tuple represents a coordinate.
This is where namedtuple comes in.
namedtuple('name', [list of field names])
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])
p = Point(1, 2)
print(p.x, p.y)  # 1 2

Circle = namedtuple('Circle', ['x', 'y', 'r'])  # represent a circle by its center coordinates and radius
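This sketch shows that a namedtuple is still a tuple. Indexing and unpacking work as usual, and the fields cannot be reassigned. The sample values are made up for illustration.

```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])
Circle = namedtuple("Circle", ["x", "y", "r"])  # center coordinates plus radius

p = Point(1, 2)
c = Circle(0, 0, 5)

print(p.x, p.y)        # access by name: 1 2
print(p[0], p[1])      # still a tuple, so indexing works: 1 2
x, y = p               # and so does unpacking
print(Point._fields)   # ('x', 'y')
print(c)               # Circle(x=0, y=0, r=5)

try:
    p.x = 10           # namedtuple instances are immutable, like tuples
except AttributeError:
    print("cannot assign to a namedtuple field")
```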
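deque
One-way (FIFO) queue, for comparison:
import queue  # the queue module

q = queue.Queue()
q.put(10)
q.put(20)
q.put(30)
# items come out in insertion order: 10 20 30
print(q.get())
print(q.get())
print(q.get())
# print(q.get())  # a fourth get() would block forever, because the queue is now empty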
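deque is a double-ended queue that makes insertion and deletion at either end efficient, so it is suitable for implementing queues and stacks.
from collections import deque

q = deque(['a', 'b', 'c'])
q.append('ee')       # add an element on the right
q.append('ff')
q.append('qq')
print(q)             # deque(['a', 'b', 'c', 'ee', 'ff', 'qq'])
q.appendleft('www')  # add an element on the left
print(q)             # deque(['www', 'a', 'b', 'c', 'ee', 'ff', 'qq'])
q.pop()              # remove an element from the right
q.popleft()          # remove an element from the left
print(q)             # deque(['a', 'b', 'c', 'ee', 'ff'])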
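Here is a short sketch of deque used as a queue and as a stack. It also shows two features not covered in the text: the maxlen argument, which discards items from the opposite end once the deque is full, and rotate().

```python
from collections import deque

# As a queue (FIFO): add on the right, remove from the left
queue = deque()
queue.append(1)
queue.append(2)
queue.append(3)
print(queue.popleft())   # 1

# As a stack (LIFO): add and remove on the same end
stack = deque()
stack.append("a")
stack.append("b")
print(stack.pop())       # b

# maxlen keeps only the most recent items; old ones fall off the other end
recent = deque(maxlen=3)
for i in range(5):
    recent.append(i)
print(recent)            # deque([2, 3, 4], maxlen=3)

# rotate(n) shifts elements n steps to the right
d = deque([1, 2, 3, 4])
d.rotate(1)
print(d)                 # deque([4, 1, 2, 3])
```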
OrderedDict
In Python versions before 3.7, the keys of a plain dict are unordered, so you cannot rely on the key order when iterating over a dictionary. If you need to keep keys in order, you can use OrderedDict. (Since Python 3.7, plain dicts also preserve insertion order.)
from collections import OrderedDict

d = {'z': 'qww', 'x': 'asd', 'y': 'asd', 'name': 'alex'}
print(d.keys())  # before Python 3.7, the key order of a plain dict was not guaranteed

od = OrderedDict([('a', 1), ('b', 2), ('c', 3)])
print(od)  # OrderedDict keys are kept in order: a, b, c
Note: OrderedDict keys are ordered by insertion order, not by the keys themselves:
od = OrderedDict()
od['z'] = 1
od['y'] = 2
od['x'] = 3
print(od.keys())  # returns the keys in insertion order: z, y, x
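Now that plain dicts keep insertion order, OrderedDict mainly stands out for its order-aware methods, which this sketch shows. move_to_end and popitem(last=False) reorder or pop at either end, and equality between two OrderedDicts takes order into account.

```python
from collections import OrderedDict

od = OrderedDict()
od["z"] = 1
od["y"] = 2
od["x"] = 3

od.move_to_end("z")              # move an existing key to the end
print(list(od))                  # ['y', 'x', 'z']
od.move_to_end("x", last=False)  # or to the front
print(list(od))                  # ['x', 'y', 'z']

print(od.popitem(last=False))    # pop from the front: ('x', 3)

# Equality between OrderedDicts is order-sensitive; plain dicts ignore order
a = OrderedDict([("a", 1), ("b", 2)])
b = OrderedDict([("b", 2), ("a", 1)])
print(a == b, dict(a) == dict(b))   # False True
```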
defaultdict
# Put values greater than 66 under 'k1' and all other values (66 and below) under 'k2'
from collections import defaultdict

values = [11, 22, 33, 44, 55, 66, 77, 88, 99]
my_dict = defaultdict(list)  # a missing key starts out as an empty list
for v in values:
    if v > 66:
        my_dict['k1'].append(v)
    else:
        my_dict['k2'].append(v)
print(my_dict)
from collections import defaultdict

dd = defaultdict(lambda: 'N/A')
dd['key1'] = 'abc'
print(dd['key1'])  # key1 exists: abc
print(dd['key2'])  # key2 does not exist, so the default value N/A is returned
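A common use of defaultdict is counting with int as the factory. This sketch also shows a side effect that is easy to miss: reading a missing key inserts it with the default value. The word list is made up for illustration.

```python
from collections import defaultdict

words = "the cat and the hat and the bat".split()

counts = defaultdict(int)   # int() returns 0, so every new key starts at 0
for w in words:
    counts[w] += 1          # no KeyError on the first occurrence
print(dict(counts))         # {'the': 3, 'cat': 1, 'and': 2, 'hat': 1, 'bat': 1}

# Note: merely reading a missing key inserts it with the default value
print("dog" in counts)      # False
_ = counts["dog"]
print("dog" in counts)      # True
```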
Counter
The purpose of the Counter class is to track how many times values occur. It is a container (a dict subclass) that stores each element as a dictionary key and its count as the value. Counts can be any integer, including 0 and negative numbers. Counter is similar to bags or multisets in other languages.
from collections import Counter

c = Counter('abcdeabcdabcaba')
print(c)
# Output: Counter({'a': 5, 'b': 4, 'c': 3, 'd': 2, 'e': 1})
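This sketch shows a few more Counter operations on the same string. Missing elements count as zero. most_common() returns the top entries. Counters can be added and subtracted, and subtraction drops results that are not positive.

```python
from collections import Counter

c = Counter("abcdeabcdabcaba")

print(c["a"])              # 5
print(c["z"])              # 0: missing elements count as zero, no KeyError
print(c.most_common(2))    # [('a', 5), ('b', 4)]

c.update("aaz")            # add counts from another iterable
print(c["a"], c["z"])      # 7 1

# Counters support arithmetic; subtraction keeps only positive counts
print(Counter(a=3, b=1) - Counter(a=1, b=2))   # Counter({'a': 2})
print(Counter(a=3, b=1) + Counter(a=1, b=2))   # Counter({'a': 4, 'b': 3})
```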
For more details, see http://www.cnblogs.com/Eva-J/articles/7291842.html
Python full-stack development, part 9: common Python modules, part one (mainly re regular expressions and collections)