1. xml module
XML was a widely used data file format before JSON appeared. For example:
<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year updated="yes">2010</year>
        <gdppc>141100</gdppc>
        <neighbor direction="E" name="Austria"/>
        <neighbor direction="W" name="Switzerland"/>
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year updated="yes">2013</year>
        <gdppc>59900</gdppc>
        <neighbor direction="N" name="Malaysia"/>
    </country>
</data>
This is the general shape of XML: <data> is the root of a tree, and every element is opened by a start tag and closed by a matching end tag.
tree.getroot() -- get the root element
First, traverse the XML file:
import xml.etree.ElementTree as ET
tree = ET.parse('xml_lesson')
root = tree.getroot()
print(root.tag) then prints data.
for i in root:  # iterate over root; each item is a child element of data
    print(i.tag)  # print the element's tag
    print(i.attrib)  # print the element's attributes
    for j in i:
        print(j.tag)  # print the tag of each nested element
        print(j.attrib)  # print its attributes
Attributes are used to describe a tag.
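The traversal above can be tried without a file on disk. The following is a minimal sketch that assumes a trimmed copy of the sample document held in a string (the name XML_TEXT is only for illustration) and parses it with ET.fromstring() instead of ET.parse():

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document, kept inline so the sketch is self-contained.
XML_TEXT = """
<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year updated="yes">2010</year>
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year updated="yes">2013</year>
    </country>
</data>
"""

root = ET.fromstring(XML_TEXT)  # parse from a string; returns the root element

# (tag, attributes) for every direct child of the root
countries = [(c.tag, c.attrib) for c in root]
# (tag, attributes, text) for every grandchild
children = [(child.tag, child.attrib, child.text) for c in root for child in c]

print(root.tag)
for entry in countries:
    print(entry)
for entry in children:
    print(entry)
```

Iterating over an element yields only its direct children, which is why the nested loop is needed to reach rank and year.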
1.1 Modifying and deleting in XML files
for i in root.findall('country'):  # iterate over every tag named country
    rank = int(i.find('rank').text)  # find the rank tag inside country
    if rank > 50:
        root.remove(i)  # delete the country if its rank is greater than 50
tree.write('xmltree.xml')  # write the file
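Both countries in the sample have a rank below 50, so nothing would actually be removed there. As a rough sketch of the same deletion with made-up data (the country names A, B, C and the rank 68 are hypothetical), parsed from a string so no file is needed:

```python
import xml.etree.ElementTree as ET

# Hypothetical data: only country B has a rank above 50.
root = ET.fromstring(
    "<data>"
    "<country name='A'><rank>2</rank></country>"
    "<country name='B'><rank>68</rank></country>"
    "<country name='C'><rank>5</rank></country>"
    "</data>"
)

# findall() returns a list, so removing from root while looping over it is safe.
for country in root.findall('country'):
    rank = int(country.find('rank').text)  # .text is a string, so convert it
    if rank > 50:
        root.remove(country)

remaining = [c.get('name') for c in root.findall('country')]
print(remaining)
```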
1.2 Creating a new XML file and writing data
new_xml = ET.Element('yehaibin')  # create a new XML tree whose root is yehaibin
name = ET.SubElement(new_xml, 'name', attrib={'enrolled': 'yes'})  # create a name tag under the root and give it an attribute
age = ET.SubElement(new_xml, 'age', attrib={'hahah': 'no'})  # add another tag
sex = ET.SubElement(new_xml, 'sex')
sex.text = '-'  # .text is the text content inside a tag
et = ET.ElementTree(new_xml)  # build the document object; its write() method saves it to a file
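To check what such a tree looks like once serialized, ET.tostring() can be used instead of writing a file. This is a small sketch; the tag name person and the text values are placeholders, not taken from the notes above:

```python
import xml.etree.ElementTree as ET

root = ET.Element('person')  # hypothetical root tag
name = ET.SubElement(root, 'name', attrib={'enrolled': 'yes'})
name.text = 'example'        # placeholder text content
age = ET.SubElement(root, 'age')
age.text = '30'              # text must be a string, even for numbers

# encoding='unicode' makes tostring() return str instead of bytes
xml_text = ET.tostring(root, encoding='unicode')
print(xml_text)
# <person><name enrolled="yes">example</name><age>30</age></person>
```

To save the result, wrap the root in ET.ElementTree(root) and call its write() method with a file name; the encoding and xml_declaration arguments of write() control the header line.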
2. re module: regular expressions
Regular expressions perform fuzzy matching on strings. They have many uses in searching data and in web applications.
Brief introduction
2.1 Wildcard character .
The dot matches any single character except the newline \n, and is used for fuzzy matching.
2.2 ^ and $
^ anchors the match to the start of the string, and $ anchors it to the end.
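A short sketch of the dot from 2.1 and the two anchors, using made-up test strings:

```python
import re

# . matches any character except a newline, so 'a\nc' is skipped
dot_hits = re.findall(r'a.c', 'abc a-c a\nc')

# ^ only matches at the start, $ only at the end
starts = re.findall(r'^ab', 'abcab')
ends = re.findall(r'ab$', 'abcab')

print(dot_hits)  # ['abc', 'a-c']
print(starts)    # ['ab']
print(ends)      # ['ab']
```

The string 'abcab' contains ab twice, but each anchored pattern finds it only once.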
2.3 Repetition
2.3.1 *
Repeats the preceding element 0 or more times
2.3.2 +
Repeats the preceding element 1 or more times
2.3.3 ?
Repeats the preceding element 0 or 1 times
2.3.4 {1,8}
Repeats the preceding element 1 to 8 times
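The four quantifiers can be compared side by side on one string. In this sketch the test text is made up, and {1,2} is used rather than {1,8} so the upper bound is visible:

```python
import re

text = 'a ab abb abbb'

star = re.findall(r'ab*', text)         # b repeated 0 or more times
plus = re.findall(r'ab+', text)         # b repeated 1 or more times
optional = re.findall(r'ab?', text)     # b repeated 0 or 1 times
bounded = re.findall(r'ab{1,2}', text)  # b repeated 1 to 2 times

print(star)      # ['a', 'ab', 'abb', 'abbb']
print(plus)      # ['ab', 'abb', 'abbb']
print(optional)  # ['a', 'ab', 'ab', 'ab']
print(bounded)   # ['ab', 'abb', 'abb']
```

Each quantifier is greedy by default: it takes as many repetitions as its bounds allow.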
2.4 []
Inside square brackets, special symbols lose their special meaning, except -, ^ and \
A ^ at the start of the brackets means "not". A \ inside the brackets can strip the meaning from a special character, or give a special meaning to an ordinary one,
for example:
\d -- matches any decimal digit
\D -- the opposite of \d
\s -- matches any whitespace character
\S -- the opposite of \s
\w -- matches any letter, digit or underscore
\W -- the opposite of \w
When using symbols such as \w, remember that re is itself a small language. When we write something like \b in an ordinary Python string, the Python interpreter translates the escape first and only then hands the result to re. By then \b has lost its regex meaning (Python has turned it into a backspace character), so write \\b, or use a raw string such as r'\b', so that re receives \b.
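The following sketch tries a few of the escapes and shows the \b problem in practice. The test string is made up for illustration:

```python
import re

text = 'id_7 = 42 ok'

digits = re.findall(r'\d', text)      # single decimal digits
non_space = re.findall(r'\S+', text)  # runs of non-whitespace
words = re.findall(r'\w+', text)      # runs of letters, digits, underscore

# '\b' in an ordinary string is a backspace character, so nothing matches.
plain = re.findall('\bok\b', text)
# '\\b' and r'\b' both hand the two characters \b to re (word boundary).
escaped = re.findall('\\bok\\b', text)
raw = re.findall(r'\bok\b', text)

print(digits)     # ['7', '4', '2']
print(non_space)  # ['id_7', '=', '42', 'ok']
print(words)      # ['id_7', '42', 'ok']
print(plain, escaped, raw)  # [] ['ok'] ['ok']
```

Raw strings are the usual choice for patterns because they avoid doubling every backslash.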
A [] matches exactly one character taken from the set.
The characters inside it are alternatives: any one of them may match.
a = re.findall('x[xyu]', 'xy0')
print(a)  # ['xy']
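A few more character-set cases, sketched with made-up strings, covering ranges with -, negation with ^, and symbols that become literal inside the brackets:

```python
import re

one_of = re.findall(r'x[xyu]', 'xy0 xu xz')  # second character must be x, y or u
ranged = re.findall(r'[a-c]', 'abcdef')      # - builds a range
negated = re.findall(r'[^a-c]', 'abcdef')    # a leading ^ negates the set
literal = re.findall(r'[.*]', 'a.b*c')       # . and * are literal inside []

print(one_of)   # ['xy', 'xu']
print(ranged)   # ['a', 'b', 'c']
print(negated)  # ['d', 'e', 'f']
print(literal)  # ['.', '*']
```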
2.5 |
The pipe character means "or".
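A small sketch of alternation with made-up words. Without brackets the pipe splits the whole pattern, so a group is needed to limit it to part of the pattern; (?:...) is a group that does not capture:

```python
import re

# The pipe separates two complete alternatives.
either = re.findall(r'cat|dog', 'cat dog bird')

# Limiting the alternation to one position needs a group.
grouped = re.findall(r'gr(?:a|e)y', 'gray grey groy')

print(either)   # ['cat', 'dog']
print(grouped)  # ['gray', 'grey']
```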
2.6 ()
Grouping: combines several elements so that they are matched as one unit.
a = re.findall('(bc)+', 'askfjhbcbcjkhfdksa')
print(a)  # ['bc']
2.6.1 Named groups (fixed syntax)
a = re.search(r'(?P<yehaibin>\d+)', '546kfdgjldfk').group()
print(a)  # 546
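When a pattern contains a group, findall() returns the group's text rather than the whole match, which can be surprising. The sketch below contrasts the variants; the group names number and rest are chosen only for illustration:

```python
import re

text = 'askfjhbcbcjkhfdksa'

captured = re.findall(r'(bc)+', text)         # text of the group only
non_capturing = re.findall(r'(?:bc)+', text)  # whole match
whole = re.search(r'(bc)+', text)             # match object keeps both

# Named groups can be read back by name.
m = re.search(r'(?P<number>\d+)(?P<rest>[a-z]+)', '546kfdgjldfk')

print(captured)        # ['bc']
print(non_capturing)   # ['bcbc']
print(whole.group(), whole.group(1))  # bcbc bc
print(m.group('number'), m.group('rest'))  # 546 kfdgjldfk
print(m.groupdict())
```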
2.7 findall()
Finds every match and returns them in a list
2.8 search()
Finds only the first match and returns a match object for it (None if nothing matches)
2.9 match()
b = re.match(r'\d+', 'dsjhfdi5456')
print(b)  # None
Matches only from the start of the string and returns None if the string does not begin with the pattern
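The three functions can be compared on one string. This is a sketch with a made-up input that extends the example above:

```python
import re

text = 'dsjhfdi5456 and 12'

all_numbers = re.findall(r'\d+', text)    # every match, as a list of strings
first = re.search(r'\d+', text)           # first match anywhere in the string
at_start = re.match(r'\d+', text)         # the string starts with letters
letters = re.match(r'[a-z]+', text)       # this pattern does fit at the start

print(all_numbers)                  # ['5456', '12']
print(first.group(), first.span())  # 5456 (7, 11)
print(at_start)                     # None
print(letters.group())              # dsjhfdi
```

Because search() and match() may return None, check the result before calling .group() on it.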
2.10 split()
a = re.split('ab', 'abc')
print(a)
--
['', 'c']
The string is split at ab. There is nothing to the left of ab, so the first item is an empty string. What remains on the right is c, which contains no further match, so it is returned as 'c'.
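A few variations on split(), sketched with made-up input: a pattern as separator, a capturing group that keeps the separators, and the maxsplit argument:

```python
import re

leading = re.split(r'ab', 'abc')                  # empty string before the match
by_digits = re.split(r'\d+', 'a1b22c')            # split on runs of digits
keep_sep = re.split(r'(\d+)', 'a1b22c')           # a group keeps the separators
limited = re.split(r'\d+', 'a1b22c', maxsplit=1)  # stop after one split

print(leading)    # ['', 'c']
print(by_digits)  # ['a', 'b', 'c']
print(keep_sep)   # ['a', '1', 'b', '22', 'c']
print(limited)    # ['a', 'b22c']
```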
2.11 sub() replacement
a = re.sub(r'\d', 'abc', 'fhsdk3fhsdk')
print(a)  # fhsdkabcfhsdk
It takes three arguments: the first is the pattern to match, the second is the replacement text, and the third is the string to search and replace in.
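Beyond the three arguments described above, sub() also accepts an optional count, and the replacement may be a function. A brief sketch with made-up strings, including subn(), which also reports how many replacements were made:

```python
import re

replaced = re.sub(r'\d', 'abc', 'fhsdk3fhsdk')
once = re.sub(r'\d', '#', 'a1b2c3', count=1)  # replace only the first match

# A function receives each match object and returns the replacement text.
doubled = re.sub(r'\d', lambda m: str(int(m.group()) * 2), 'a1b2c3')

new_text, n = re.subn(r'\d', '#', 'a1b2c3')   # (new string, number of replacements)

print(replaced)     # fhsdkabcfhsdk
print(once)         # a#b2c3
print(doubled)      # a2b4c6
print(new_text, n)  # a#b#c# 3
```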
2.12 compile()
Define the pattern first, then reuse it.
com = re.compile(r'\d+')
res = com.findall('sfhkfds2kjdsl')
This lets you search many strings without rewriting the pattern each time
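A sketch of reusing one compiled pattern over several strings. The list of inputs is made up, apart from the first entry:

```python
import re

number = re.compile(r'\d+')  # compile once

lines = ['sfhkfds2kjdsl', 'no digits', 'x10y200']
results = [number.findall(line) for line in lines]  # reuse for every string

print(results)  # [['2'], [], ['10', '200']]
```

The compiled object offers the same methods as the module (findall, search, match, split, sub, finditer), just without the pattern argument.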
2.13 finditer()
Puts the matches into an iterator
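Unlike findall(), the iterator yields match objects one at a time, so positions are available and nothing is built up front. A minimal sketch with a made-up string:

```python
import re

it = re.finditer(r'\d+', 'a1b22c333')
first = next(it)  # matches are produced lazily, one per next()

# Collect (text, start position) for every match, using a fresh iterator.
matches = [(m.group(), m.start()) for m in re.finditer(r'\d+', 'a1b22c333')]

print(first.group())  # 1
print(matches)        # [('1', 1), ('22', 3), ('333', 6)]
```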
2018-06-27-python full stack Development day22-part2-xml module and RE module-Introduction to Regular expressions