python10-Module 2

Source: Internet
Author: User

First, JSON serialization

The process of turning an object (variable) in memory into a form that can be stored or transmitted.

   

Deserialization

Reading the serialized content back into memory as a variable.

A simple approach is to write the data out as a string and turn it back into an object with eval:

dic = "{'name': 'alex'}"
f = open("hello", "w")
f.write(dic)
f.close()

f_read = open("hello", "r")
data = f_read.read()  # this is a string
print(type(data))
data = eval(data)  # many languages support eval, but it has limitations: it cannot save functions or classes
print(data["name"])
f_read.close()
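eval() evaluates any expression it is given, so a string read back from a file can run arbitrary code. The original text does not use it, but the standard library's ast.literal_eval does the same round trip more safely, because it accepts only Python literals such as dicts, lists, strings and numbers. A minimal sketch:

```python
import ast

text = "{'name': 'alex'}"          # the kind of string written to the file "hello"
data = ast.literal_eval(text)      # parses literals only; never executes code
print(type(data), data["name"])

try:
    ast.literal_eval("__import__('os').getcwd()")   # eval() would run this call
except ValueError as exc:
    print("rejected:", exc)
```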

JSON

    What is JSON

To pass objects between different programming languages, we have to serialize them into a standard format such as XML. JSON is a better choice. It is a string that every language can read, so it is easy to store on disk or send over a network. JSON is not only a standard format but also faster than XML, and it can be read directly in a web page, which is very convenient.

    Note: JSON cannot serialize functions or classes. Serializing them would be of little use anyway unless the receiving side has the same definitions.

    

    ### Serializing data with json ###

import json

dic = {'name': 'alex'}

f = open("test", "w")
data = json.dumps(dic)
f.write(data)  # json.dump(dic, f) combines dumps() and f.write() but only works with file objects, so it is not used here
f.close()

JSON serialization turns single-quoted strings into double-quoted ones and converts data (numbers, strings, lists, dictionaries, tuples and so on) into a single string. After serialization the data above is just a string, which other languages can use:

# dic = {'name': 'alex'}  #----> {"name": "alex"} -----> '{"name": "alex"}'
# i = 8                   #----> '8'
# s = 'hello'             #----> "hello" ------> '"hello"'
# l = [11, 22]            #----> "[11, 22]"
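The short script below runs json.dumps on the same four values plus a tuple, so you can check these conversions yourself. Python's default output puts a space after each comma and colon, so it prints [11, 22] rather than [11,22]. A tuple comes back from loads as a list, because JSON has only one array type.

```python
import json

samples = [{'name': 'alex'}, 8, 'hello', [11, 22], (11, 22)]

for value in samples:
    text = json.dumps(value)        # always returns a str
    print(repr(value), "->", repr(text))

# JSON has a single array type, so a tuple round-trips as a list
print(json.loads(json.dumps((11, 22))))   # [11, 22]
```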

    ############################

    ### Deserializing data with json ###

import json

f = open("test", "r")
data = json.loads(f.read())  # data = json.load(f) combines f.read() and loads() in one call
print(data)
print(type(data))

f.close()
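The comments above mention json.dump() and json.load(), which work directly on file objects. Here is a minimal sketch of both. To avoid leaving files behind, it writes into a temporary directory (our addition); the file name "test" follows the example above. The with statement closes each file automatically, so the explicit f.close() calls are no longer needed.

```python
import json
import os
import tempfile

dic = {'name': 'alex'}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test")           # same file name as the example above
    with open(path, "w", encoding="utf-8") as f:
        json.dump(dic, f)                      # dumps() + f.write() in one call
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)                    # f.read() + loads() in one call

print(data, type(data))
```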

Note that data does not have to be produced by dumps() to be read with loads(). Any string that conforms to the JSON specification can be parsed by loads().
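For example, a JSON string typed by hand, never produced by dumps(), loads fine. A string using Python-style single quotes is rejected, because JSON requires double quotes. The sample values here are our own:

```python
import json

# Written by hand, never produced by json.dumps()
text = '{"name": "alex", "age": 18, "tags": [1, 2]}'
data = json.loads(text)
print(data["name"], data["tags"])

# Single quotes are Python syntax, not JSON, so loads() refuses them
try:
    json.loads("{'name': 'alex'}")
except json.JSONDecodeError as exc:
    print("invalid JSON:", exc.msg)
```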

    #############################

Second, pickle

pickle is used the same way as json, but it can only exchange data between Python programs, not across languages. It offers the same two pairs of methods: dumps (dump) and loads (load).

    

    ### Serializing data with pickle ###

import pickle

dic = {'name': 'alvin', 'age': 23, 'sex': 'male'}

f = open("ptest", "wb")
data = pickle.dumps(dic)  # the result is bytes, so the file is opened in "wb" mode
print(type(data))
f.write(data)
f.close()

    ##############################

    ### Deserializing data with pickle ###

import pickle
f = open("ptest", "rb")
data = pickle.loads(f.read())  # the file holds bytes, so it is opened in "rb" mode
print(data["name"])
print(type(data))
f.close()
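The practical difference from json shows up with Python-specific types. In this sketch a set, which JSON has no type for, round-trips through pickle unchanged, while json.dumps() raises TypeError. One caution we add: pickle.loads() can execute code while rebuilding objects, so only unpickle data from sources you trust.

```python
import json
import pickle

data = {"name": "alvin", "scores": {90, 85}}   # a set has no JSON equivalent

blob = pickle.dumps(data)       # bytes, as noted above
restored = pickle.loads(blob)
print(type(blob), restored == data)

try:
    json.dumps(data)
except TypeError as exc:
    print("json refuses it:", exc)
```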

Third, XML

XML is a format for exchanging data between different languages or programs. Its role is similar to JSON's, but it is more complex: all data is expressed with tags arranged in a document tree. Java developers must learn it, and it is indispensable in the financial sector.

    The structure of an XML file looks like the following example:

<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year updated="yes">2010</year>
        <gdppc>141100</gdppc>
        <neighbor direction="E" name="Austria"/>
        <neighbor direction="W" name="Switzerland"/>
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year updated="yes">2013</year>
        <gdppc>59900</gdppc>
        <neighbor direction="N" name="Malaysia"/>
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year updated="yes">2013</year>
        <gdppc>13600</gdppc>
        <neighbor direction="W" name="Costa Rica"/>
        <neighbor direction="E" name="Colombia"/>
    </country>
</data>

    ### The xml module ###

import xml.etree.ElementTree as ET

tree = ET.parse("xml_lesson")  # get the whole tree structure
root = tree.getroot()          # get the root node
print(root.tag)                # print the root node's tag: data

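    ### Traversing the XML ###
import xml.etree.ElementTree as ET

tree = ET.parse("xml_lesson")  # get the whole tree structure
root = tree.getroot()          # get the root node

for child in root:                  # the second-level nodes under the root
    print(child.tag, child.attrib)  # tag and attributes of each second-level node
    for i in child:                 # the child nodes of each second-level node
        print(i.tag, i.text)        # tag and text content of each child node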

    ### Traversing only the year nodes ###
for node in root.iter('year'):  # iter() on a single country only finds that country's year, so call it on the root node
    print(node.tag, node.text)
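The examples above read the file xml_lesson from disk. To try the same calls without that file, ET.fromstring() parses XML held in a string and returns the root element directly. The sketch below uses a trimmed copy of the sample document. It contrasts looping over the root, which visits direct children only, with root.iter(), which finds matching elements at any depth.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the sample document, kept in a string instead of the file xml_lesson
XML_TEXT = """<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year updated="yes">2010</year>
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year updated="yes">2013</year>
    </country>
</data>"""

root = ET.fromstring(XML_TEXT)   # returns the root Element itself, no getroot() needed
print(root.tag)

# Looping over root visits direct children only
names = [country.get("name") for country in root]
# root.iter() walks the whole subtree, so it reaches year elements nested in each country
years = [node.text for node in root.iter("year")]
print(names, years)
```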

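    ### Modifying ###

for node in root.iter('year'):
    new_year = int(node.text) + 1  # increase the year by one
    node.text = str(new_year)      # text must be a string
    node.set("updated", "yes")     # add an attribute (an existing one is overwritten)

tree.write("xml_lesson")  # write to a different file name to save the result as a new file instead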

    ### Deleting nodes ###

for country in root.findall('country'):  # findall() returns all matching country elements at once, so no extra loop is needed
    rank = int(country.find('rank').text)
    if rank > 50:
        root.remove(country)

tree.write('output.xml')
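Modifying and deleting can be checked in memory the same way. This sketch works on a shortened inline copy of two countries, with attributes omitted. It bumps each year, marks it updated, removes countries ranked above 50, and prints the result with ET.tostring() instead of writing a file.

Collecting the countries with findall() first matters. findall() returns a list, so removing elements from root while looping over that list does not skip any of them. Iterating over root itself while modifying it can skip elements.

```python
import xml.etree.ElementTree as ET

# A shortened inline copy of two countries from the sample data
XML_TEXT = """<data>
    <country name="Singapore"><rank>5</rank><year>2013</year></country>
    <country name="Panama"><rank>69</rank><year>2013</year></country>
</data>"""

root = ET.fromstring(XML_TEXT)

# Modify: bump every year by one and mark it as updated
for node in root.iter("year"):
    node.text = str(int(node.text) + 1)
    node.set("updated", "yes")

# Delete: findall() returns a list, so removing from root while looping over it is safe
for country in root.findall("country"):
    if int(country.find("rank").text) > 50:
        root.remove(country)

print(ET.tostring(root, encoding="unicode"))
```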

    ### Generating an XML document ###

import xml.etree.ElementTree as ET

new_xml = ET.Element("namelist")  # create the root node, i.e. <namelist>...</namelist>

name = ET.SubElement(new_xml, "name", attrib={"enrolled": "yes"})  # insert a name tag with an attribute
age = ET.SubElement(name, "age", attrib={"checked": "no"})
sex = ET.SubElement(name, "sex")
sex.text = '33'
name2 = ET.SubElement(new_xml, "name", attrib={"enrolled": "no"})
age = ET.SubElement(name2, "age")
age.text = '19'

et = ET.ElementTree(new_xml)  # generate the document object
et.write("test.xml", encoding="utf-8", xml_declaration=True)

ET.dump(new_xml)  # print the generated XML
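If you need the generated XML as a string rather than a file, ET.tostring() with encoding="unicode" returns a str. Here is a small sketch that builds part of the namelist above and parses the string back to check it:

```python
import xml.etree.ElementTree as ET

new_xml = ET.Element("namelist")
name = ET.SubElement(new_xml, "name", attrib={"enrolled": "yes"})
ET.SubElement(name, "age", attrib={"checked": "no"})
sex = ET.SubElement(name, "sex")
sex.text = "33"

text = ET.tostring(new_xml, encoding="unicode")   # a str, not bytes
print(text)

# Parse it back to confirm the structure survived
parsed = ET.fromstring(text)
print(parsed.find("name").get("enrolled"), parsed.find("name/sex").text)   # yes 33
```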

Fourth, the re module

A regular expression (RE) is a small, highly specialized programming language embedded in Python and made available through the re module. A regular expression pattern is compiled into a sequence of bytecodes, which is then executed by a matching engine written in C.

Regular expressions are mainly used for fuzzy matching. A real-life example: from the ID card numbers below, find the people "born in 1970 in Guangdong Province".
372324199206034957 male Shandong Province ***
623021198705154958 male Gansu Province ***
429004194306184959 male Hubei Province
530111197101224956 male Yunnan Province Kunming ***
440582199706174955 male Guangdong Province ***
210711198901144954 male Liaoning Province Jinzhou ***
450521198612214953 male Guangxi ***
321119195110314952 male Jiangsu Province ***
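Here is a sketch of how such a query could look. It assumes the standard 18-character layout of these ID numbers: a 6-digit region code, an 8-digit birth date YYYYMMDD, a 3-digit sequence number, and a check character that may be X. It also assumes that Guangdong region codes begin with 44.

None of the sample records is a Guangdong resident born in 1970, so the first query returns an empty list. The other two queries pull out birth years.

```python
import re

records = """372324199206034957 male Shandong Province ***
623021198705154958 male Gansu Province ***
429004194306184959 male Hubei Province
530111197101224956 male Yunnan Province Kunming ***
440582199706174955 male Guangdong Province ***
210711198901144954 male Liaoning Province Jinzhou ***
450521198612214953 male Guangxi ***
321119195110314952 male Jiangsu Province ***"""

# Assumed layout: 6-digit region code, YYYYMMDD birth date, 3-digit sequence, check digit or X
guangdong_1970 = re.findall(r"\b44\d{4}1970\d{7}[\dX]\b", records)
print(guangdong_1970)      # [] for this sample

# Birth years of the Guangdong records only (region code starting with 44)
guangdong_years = re.findall(r"\b44\d{4}(\d{4})\d{7}[\dX]\b", records)
print(guangdong_years)     # ['1997']

# Birth year of every record
birth_years = re.findall(r"\b\d{6}(\d{4})\d{7}[\dX]\b", records)
print(birth_years)
```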

Character matching (ordinary characters and metacharacters)

1. Ordinary characters: most letters and characters simply match themselves.

>>> import re
>>> re.findall("ljy", "dsafjaljydfjeiw")
['ljy']

2. Metacharacters: . ^ $ * + ? { } [ ] | ( ) \

The metacharacters . ^ $ * + ? { }

>>> a = re.findall("l..y", "dfjljuyaiq")  # the dot is a wildcard: each dot matches any one character except \n
>>> print(a)
['ljuy']

>>> b = re.findall("^l..y", "dfjljuyailabyq")  # ^ anchors the match to the start of the string, so l must be the first character
>>> print(b)
[]
>>> b = re.findall("^l..y", "ljuyailabyq")
>>> print(b)
['ljuy']

>>> ret = re.findall("a...j$", "dsfkjsdakfyj")  # $ anchors the match to the end of the string, so j must be the last character
>>> print(ret)
['akfyj']

>>> ret = re.findall("a...j$", "dsfkjsdakfyj$")
>>> print(ret)
[]

>>> ret = re.findall("abc*", "dsfkjabccccsdafyj")  # * means 0 to infinitely many times (greedy): in abc*, c may appear 0, 1 or any number of times, and the longest possible match is returned
>>> print(ret)
['abcccc']

>>> ret = re.findall("abc+", "dsfkjabccccsdafyj")  # + means 1 to infinitely many times (greedy): in abc+, c may appear 1, 2 or any number of times, and the longest possible match is returned
>>> print(ret)
['abcccc']
>>> ret = re.findall("abc+", "dsfkjabsdafyj")  # but with +, c cannot appear 0 times
>>> print(ret)
[]

>>> ret = re.findall("abc?", "dsfkjabccccsdafyj")  # ? means 0 or 1 time (greedy): in abc?, c may appear 0 or 1 times, and the longer match is returned
>>> print(ret)
['abc']

>>> ret = re.findall("abc{1,3}", "dsfkjabccccsdafyj")  # {n,m} means the preceding character appears n to m times (greedy): in abc{1,3}, c may appear 1 to 3 times, and the longest possible match is returned
>>> print(ret)
['abccc']

>>> ret = re.findall("abc{2}", "dsfkjabccccsdafyj")  # {n} means exactly n times
>>> print(ret)
['abcc']

Note: *, +, ? and {} are all greedy, i.e. they match as much as possible. Adding a ? after them turns them into lazy matches, which match as little as possible:

>>> ret = re.findall("abc*?", "dsfkjabccccsdafyj")
>>> print(ret)
['ab']
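The difference is easiest to see with a pattern that can stop early. In this sketch the greedy .* runs to the last >, while the lazy .*? stops at the first one. The sample HTML string is our own.

```python
import re

html = "<b>bold</b><i>italic</i>"
print(re.findall(r"<.*>", html))    # greedy: one match spanning the whole string
print(re.findall(r"<.*?>", html))   # lazy: each tag separately
print(re.findall(r"abc+?", "dsfkjabccccsdafyj"))  # lazy + keeps only one c
```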


The character set metacharacter []:

>>> ret = re.findall("a[bc]d", "dsfkjacdsdafyj")  # [bc] matches a single character, either b or c
>>> print(ret)
['acd']

>>> ret = re.findall("[a-z]", "dsfkjacdsdafyj")  # matches every lowercase letter
>>> print(ret)
['d', 's', 'f', 'k', 'j', 'a', 'c', 'd', 's', 'd', 'a', 'f', 'y', 'j']

>>> ret = re.findall("[.*+]", "a.*+")  # inside [], the metacharacters . * + lose their special meaning and become plain characters
>>> print(ret)
['.', '*', '+']

Characters that keep a special function inside a character set: \ ^ -

>>> ret = re.findall("[0-9]", "45bdha8")  # - defines a range: digits from 0 to 9
>>> print(ret)
['4', '5', '8']

>>> ret = re.findall(r"[\d]", "45bdha8")  # \ still works inside a set: \d matches decimal digits
>>> print(ret)
['4', '5', '8']

>>> ret = re.findall("[^a]", "45bdha8")  # ^ at the start of a set negates it: match anything except a
>>> print(ret)
['4', '5', 'b', 'd', 'h', '8']
>>> ret = re.findall("[^abcde]", "4e5bddhca8")  # match anything except a, b, c, d and e
>>> print(ret)
['4', '5', 'h', '8']

The escape metacharacter \

A backslash followed by a metacharacter removes its special function; for example, \. matches a literal dot.

A backslash followed by certain ordinary characters gives them a special function:

\d matches any decimal digit; it is equivalent to the class [0-9].
\D matches any non-digit character; it is equivalent to the class [^0-9].
\s matches any whitespace character; it is equivalent to the class [ \t\n\r\f\v].
\S matches any non-whitespace character; it is equivalent to the class [^ \t\n\r\f\v].
\w matches any alphanumeric character or underscore; it is equivalent to the class [a-zA-Z0-9_].
\W matches any non-alphanumeric character; it is equivalent to the class [^a-zA-Z0-9_].
\b matches a word boundary: the position between a word character and a non-word character such as a space, & or #.

(In Python 3, \d, \s and \w also match non-ASCII digits, whitespace and letters in str patterns; the bracketed classes are exact equivalents only with the re.ASCII flag.)
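A quick sketch of these classes in action. The sample strings are our own.

```python
import re

text = "order 66 shipped on 2024-05-01, ref#A_7"
print(re.findall(r"\d+", text))        # runs of digits
print(re.findall(r"\s", "a b\tc"))     # whitespace characters
print(re.findall(r"\w+", "ref#A_7"))   # word characters include the underscore
print(re.findall(r"\W", "ref#A_7"))    # everything that is not a word character
print(re.findall(r"\bon\b", "on one upon on."))  # 'on' only as a whole word
```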

      

>>> ret = re.findall("I\b", "I am noone")
>>> print(ret)
[]

>>> ret = re.findall("I\\b", "I am noone")
>>> print(ret)
['I']

>>> ret = re.findall(r"I\b", "I am noone")
>>> print(ret)
['I']

Question: \b already has a special meaning in regular expressions, so why do we need the r prefix or an extra \?


The statements above pass through the Python interpreter before the re module sees the pattern. re itself understands \b, but when the Python interpreter reads \b in an ordinary string literal, it first translates it into the backspace character (\x08). What Python then hands to re is no longer the \b that re knows.
r stands for raw. Prefixing the pattern with r tells the Python interpreter not to translate anything, so \b reaches re exactly as written.
Adding an extra \ in front works for the same reason. The first \ cancels the special meaning of the second, so Python treats it as an ordinary \ and does not translate it. The ordinary \ and the b are passed to re, which recognizes \b, so the match succeeds.


>>> re.findall("c\\l", r"abc\lerwt")  # re receives c\l: older Python versions returned [], current Python 3 raises re.error (bad escape \l)
>>> re.findall("c\\\\l", r"abc\lerwt")
['c\\l']

      

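re layer: to match the text c\l, re needs the pattern c\\l, with the backslash escaped once. So the Python interpreter must pass c\\l to re.
Python interpreter layer: to pass c\\l to re, each of those two backslashes must be escaped again in an ordinary string literal, which gives "c\\\\l".
Why is the result shown as 'c\\l'? Python turns "c\\\\l" into c\\l and passes it to re. re reads c\\l as the literal text c\l and finds it in the string. The match, the three characters c\l, is returned to Python, and the interactive interpreter displays it with repr(), which writes a single backslash as \\.

Python -------> re -------> matched object
"c\\\\l"        c\\l         c\l
Python <------- re <------- matched object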
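One way to check the two layers is to compare string lengths. In this sketch the normal literal "c\\\\l" and the raw literal r"c\\l" are the same four-character pattern, and the matched text is three characters long. re.escape(), which escapes special characters including the backslash, builds the same pattern for you.

```python
import re

plain = "c\\\\l"       # normal string literal: Python turns each \\ into one \
raw = r"c\\l"          # raw string: Python leaves the backslashes alone
print(plain == raw, len(raw))            # True 4

text = r"abc\lerwt"    # contains one real backslash
found = re.findall(raw, text)
print(found, len(found[0]))              # ['c\\l'] 3
print(re.escape(r"c\l") == raw)          # True
```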

The grouping metacharacter ()

>>> re.findall("(ab)", "abdafb")
['ab']
>>> re.findall("(ab)+", "abdabfb")
['ab', 'ab']
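When a group is repeated, findall() reports only the group's content. With search(), group() (or group(0)) gives the whole match, and group(1) gives the last text captured by the first group, as this sketch shows:

```python
import re

m = re.search(r"(ab)+", "ababab")
print(m.group())    # ababab  (whole match)
print(m.group(1))   # ab      (last repetition captured by group 1)
print(re.findall(r"(ab)+", "ababab"))   # ['ab']
```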

Naming groups

>>> res = re.search(r'(?P<id>\d{2})/(?P<name>\w{3})', '23/com')
>>> print(res)
<re.Match object; span=(0, 6), match='23/com'>
>>> res.group()
'23/com'
>>> res.group('id')
'23'
>>> res.group('name')
'com'
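Named groups can also be collected all at once with groupdict(), and they remain addressable by number. A short sketch on the same input:

```python
import re

res = re.search(r"(?P<id>\d{2})/(?P<name>\w{3})", "23/com")
print(res.groupdict())             # {'id': '23', 'name': 'com'}
print(res.group(1), res.group(2))  # 23 com
```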

The metacharacter |

>>> res = re.findall('(rab)|8', 'rabhdg8sd')
>>> print(res)
['rab', '']
>>> res = re.search('(rab)|8', 'rabhdg8sd')
>>> res.group()
'rab'
# search() stops at the first match. findall() returns the group's text for every match; when 8 matches, group (rab) takes no part, so findall() reports ''
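Comparing the same pattern with and without the group makes the empty string easier to understand. Without a group, or when using finditer() with group(), you get the whole matches instead:

```python
import re

text = "rabhdg8sd"
print(re.findall(r"(rab)|8", text))                        # ['rab', '']
print(re.findall(r"rab|8", text))                          # ['rab', '8']
print([m.group() for m in re.finditer(r"(rab)|8", text)])  # ['rab', '8']
```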

    

3. Common methods of the re module

>>> re.findall('a', "alvin yuan")
['a', 'a']  # returns every match, collected in a list

>>> re.search('a', "alvin yuan").group()
'a'  # finds only the first match; re.search().group() returns the matched text, which is useful when writing a calculator

>>> re.match('a', "lavin yuan")  # returns None, so nothing is printed
>>> re.match('a', "alvin yuan")
<re.Match object; span=(0, 1), match='a'>
>>> re.match('a', "alvin yuan").group()
'a'  # like search() with an implicit ^: the match must start at the first character of the string
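The difference between search() and match() shows up when the first character does not fit. Here is a small sketch on "lavin yuan". Match objects also report where they matched via span().

```python
import re

text = "lavin yuan"
m = re.search("a", text)
print(m.group(), m.span())          # a (1, 2)
print(re.match("a", text))          # None: match() only tries position 0
print(re.match("l", text).span())   # (0, 1)
```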

re.split() splits a string wherever the pattern matches: re.split("[ab]", "asdabcd")
>>> re.split("[ab]", "asdabcd")
['', 'sd', '', 'cd']

Process analysis:
First split on a: the left side is empty and the right side is sdabcd, giving ['', 'sdabcd'].
sdabcd still contains an a, so split on a again, giving ['', 'sd', 'bcd'].
No a is left, so split bcd on b: the left side is empty and the right side is cd, giving ['', 'sd', '', 'cd'].
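Two variations follow from the same rules. A capturing group in the pattern makes split() keep the separators in the result, and the maxsplit argument limits how many splits happen. A sketch on the same string:

```python
import re

print(re.split("([ab])", "asdabcd"))            # ['', 'a', 'sd', 'a', '', 'b', 'cd']
print(re.split("[ab]", "asdabcd", maxsplit=1))  # ['', 'sdabcd']
```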

     

re.sub("pattern", "new", "old string")
>>> ret = re.sub(r"\d", "abc", "alvin5yuan6")
>>> print(ret)
alvinabcyuanabc
>>> ret = re.sub(r"\d", "abc", "alvin5yuan6", count=1)  # count limits the number of replacements
>>> print(ret)
alvinabcyuan6

re.subn("pattern", "new", "old string") returns a tuple containing the result and the number of replacements
>>> ret = re.subn(r"\d", "abc", "alvin5yuan6")
>>> print(ret)
('alvinabcyuanabc', 2)
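The replacement does not have to be a fixed string. re.sub() also accepts a function, which receives each match object and returns the text to put in its place. In this sketch every digit is doubled; the doubling rule is just an illustration.

```python
import re

def double(match):
    # match.group() is the matched digit as a string
    return str(int(match.group()) * 2)

print(re.sub(r"\d", double, "alvin5yuan6"))    # alvin10yuan12
print(re.subn(r"\d", double, "alvin5yuan6"))   # ('alvin10yuan12', 2)
```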

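>>> rul = re.compile(r"\d+")  # compile the rule once and keep it in a variable
>>> rul.findall("324dafkjadsnf324jff")
['324', '324']

The result is the same as re.findall(r"\d+", "324dafkjadsnf324jff"). The difference appears when a rule is used many times. The module-level function has to look up or compile the pattern on every call, while the compiled object is compiled once and reused. (re keeps a small cache of recently compiled patterns, so for a handful of calls the gain is mostly readability.)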
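A compiled pattern pays off when one rule is applied to many strings. In this sketch the sample lines are our own. The same compiled object also provides search(), sub() and split().

```python
import re

digits = re.compile(r"\d+")
lines = ["324dafkjadsnf324jff", "no digits here", "a1b22"]
results = [digits.findall(line) for line in lines]
print(results)                   # [['324', '324'], [], ['1', '22']]
print(digits.sub("#", "a1b22"))  # a#b#
```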

      

a = re.finditer(r'\d', 'ds3sy4784a') matches iteratively. It returns an iterator of match objects instead of a list, which helps when the data is very large. Use next(a).group() to take out each result.
>>> a = re.finditer(r"\d", "ds3sy4784a")
>>> print(a)
<callable_iterator object at 0x...>
>>> next(a).group()
'3'
>>> next(a).group()
'4'
>>> next(a).group()
'7'
>>> next(a).group()
'8'
>>> next(a).group()
'4'
>>> next(a).group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
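In practice the iterator is usually consumed with a for loop, which stops cleanly instead of raising StopIteration. Each item is a match object, so its position is available too. This sketch uses \d+ so that consecutive digits form one match:

```python
import re

for m in re.finditer(r"\d+", "ds3sy4784a"):
    print(m.group(), m.span())
# 3 (2, 3)
# 4784 (5, 9)
```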

    

One thing to note about findall():

>>> res = re.findall("www.(baidu|163).com", "www.baidu.com")
>>> print(res)
['baidu']

We expected "www.baidu.com", but when the pattern contains a group (), findall() returns what the group matched rather than the whole match. Adding ?: after the opening parenthesis makes the group non-capturing, which removes this priority:

>>> res = re.findall("www.(?:baidu|163).com", "www.baidu.com")
>>> print(res)
['www.baidu.com']
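The same rule extends to several groups, where findall() returns tuples. This sketch escapes the dots so they match only a literal ".", and compares the options on a string containing two addresses:

```python
import re

text = "www.baidu.com www.163.com"
print(re.findall(r"www\.(baidu|163)\.com", text))     # ['baidu', '163']
print(re.findall(r"www\.(?:baidu|163)\.com", text))   # ['www.baidu.com', 'www.163.com']
print(re.findall(r"(www)\.(baidu|163)\.com", text))   # [('www', 'baidu'), ('www', '163')]
print([m.group() for m in re.finditer(r"www\.(baidu|163)\.com", text)])
```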
