2018-06-27 Python full-stack development, day 22 part 2: xml module and re module (introduction to regular expressions)

Source: Internet
Author: User

1. xml module

XML was a commonly used file format for exchanging data before JSON appeared. Python's xml module reads and writes it. A sample file:


<data>
    <country name="Liechtenstein">
        <rank updated="Yes">2</rank>
        <year updated="Yes">2010</year>
        <gdppc>141100</gdppc>
        <neighbor direction="E" name="Austria"/>
        <neighbor direction="W" name="Switzerland"/>
    </country>
    <country name="Singapore">
        <rank updated="Yes">5</rank>
        <year updated="Yes">2013</year>
        <gdppc>59900</gdppc>
        <neighbor direction="N" name="Malaysia"/>
    </country>
</data>

This is what XML looks like. The document is a tree: <data> is the root tag, and every opening tag is closed by a matching end tag such as </data>.

tree.getroot(): gets the root element

First, traverse the XML file:


import xml.etree.ElementTree as ET
tree = ET.parse('xml_lesson')
root = tree.getroot()

After print(root.tag), the result is data.


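for i in root:  # iterate over root; each item is a direct child of data
    print(i.tag)  # print the tag name
    print(i.attrib)  # print the tag's attributes
    for j in i:
        print(j.tag)  # print the tag name of each nested element
        print(j.attrib)  # print its attributes

An attribute describes the tag it belongs to.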
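The traversal can be tried without a file on disk. The following is a minimal sketch that assumes a trimmed copy of the sample data held in a string and parsed with ET.fromstring instead of ET.parse; the names XML_TEXT and visited are invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed copy of the sample data (not the full file).
XML_TEXT = """
<data>
    <country name="Liechtenstein">
        <rank updated="Yes">2</rank>
        <year updated="Yes">2010</year>
    </country>
    <country name="Singapore">
        <rank updated="Yes">5</rank>
        <year updated="Yes">2013</year>
    </country>
</data>
"""

root = ET.fromstring(XML_TEXT)  # parses a string and returns the root element

# Record (tag, attributes) for every child and grandchild of the root.
visited = []
for country in root:
    visited.append((country.tag, country.attrib))
    for child in country:
        visited.append((child.tag, child.attrib))

for tag, attrib in visited:
    print(tag, attrib)
```

Iterating over an element yields only its direct children, which is why the nested loop is needed to reach rank and year.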

1.1 Adding and deleting in XML files


for i in root.findall('country'):  # iterate over every country tag
    rank = int(i.find('rank').text)  # find the rank element inside country
    if rank > 50:
        root.remove(i)  # delete the country if its rank is greater than 50
tree.write('xmltree.xml')  # write the result to a file
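With the sample data no rank exceeds 50, so nothing would be removed. The sketch below uses made-up data (countries A, B, C and a hypothetical checked attribute) so that both an addition and a deletion are visible; it works in memory and does not write a file.

```python
import xml.etree.ElementTree as ET

# Made-up data for illustration only.
root = ET.fromstring(
    "<data>"
    "<country name='A'><rank>2</rank></country>"
    "<country name='B'><rank>68</rank></country>"
    "<country name='C'><rank>5</rank></country>"
    "</data>"
)

# findall() returns a list, so removing children while looping is safe.
for country in root.findall('country'):
    rank = int(country.find('rank').text)
    if rank > 50:
        root.remove(country)           # delete the whole country element
    else:
        country.set('checked', 'yes')  # add an attribute to the ones we keep

remaining = [country.get('name') for country in root]
print(remaining)
print(ET.tostring(root, encoding='unicode'))
```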

1.2 Creating a new XML file and writing data


new_xml = ET.Element('yehaibin')  # create a new XML tree whose root is yehaibin
name = ET.SubElement(new_xml, 'name', attrib={'enrolled': 'Yes'})  # create a name tag under the root and give it an attribute
age = ET.SubElement(new_xml, 'age', attrib={'Hahah': 'No'})  # add another tag
sex = ET.SubElement(new_xml, 'sex')
sex.text = ' -'  # .text is the content inside the tag
et = ET.ElementTree(new_xml)  # wrap the root in a tree object; et.write() then writes it to a file
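To see what such a tree looks like as text, it can be serialized to a string instead of a file. This is a small sketch with invented tag names and values (person, example, 30), using ET.tostring:

```python
import xml.etree.ElementTree as ET

# Build a tiny tree; all names and values here are illustrative.
root = ET.Element('person')
name = ET.SubElement(root, 'name', attrib={'enrolled': 'yes'})
name.text = 'example'
age = ET.SubElement(root, 'age')
age.text = '30'

# encoding='unicode' makes tostring() return a str rather than bytes.
xml_text = ET.tostring(root, encoding='unicode')
print(xml_text)
```

The same root could instead be wrapped in ET.ElementTree(root) and saved with its write() method when a file is wanted.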

2. re module: regular expressions

Regular expressions perform fuzzy matching on strings. They have many applications in searching data and in web development.

Brief introduction

2.1 Wildcard: .

The dot matches any single character except the newline \n, which makes it useful for fuzzy matching.

2.2 ^ and $

^ requires the match to begin at the start of the string, and $ requires it to end at the end of the string.
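A short sketch of the dot and the two anchors, using made-up test strings:

```python
import re

# The dot stands for any one character, but not for a newline.
dot = re.findall(r'a.c', 'abc a-c a\nc')
print(dot)      # ['abc', 'a-c']

# ^ ties the pattern to the start of the string, $ to the end.
starts = re.findall(r'^hello', 'hello world')
ends = re.findall(r'world$', 'hello world')
neither = re.findall(r'^world', 'hello world')
print(starts)   # ['hello']
print(ends)     # ['world']
print(neither)  # []
```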

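2.3 Repetition

2.3.1 *

Repeats the preceding element 0 or more times, with no upper limit.

2.3.2 +

Repeats the preceding element 1 or more times, with no upper limit.

2.3.3 ?

Repeats the preceding element 0 or 1 times.

2.3.4 {1,8}

Repeats the preceding element 1 to 8 times.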
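The four repetition forms can be compared on one string. In this sketch the test string is invented, and {1,2} is used in place of {1,8} only to keep the output short:

```python
import re

text = 'a ab abb abbb'

star = re.findall(r'ab*', text)         # b repeated 0 or more times
plus = re.findall(r'ab+', text)         # b repeated 1 or more times
optional = re.findall(r'ab?', text)     # b repeated 0 or 1 times
bounded = re.findall(r'ab{1,2}', text)  # b repeated 1 to 2 times

print(star)      # ['a', 'ab', 'abb', 'abbb']
print(plus)      # ['ab', 'abb', 'abbb']
print(optional)  # ['a', 'ab', 'ab', 'ab']
print(bounded)   # ['ab', 'abb', 'abb']
```

All four are greedy: each takes as many repetitions as its limit allows.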

2.4 []

Inside square brackets, special symbols lose their special meaning, except for -, ^ and \.

Inside the brackets, a leading ^ means negation, - marks a range, and \ can either take away the special meaning of a character or give an ordinary character a special meaning.


\d: matches any decimal digit

\D: the opposite of \d

\s: matches any whitespace character

\S: the opposite of \s

\w: matches any letter or digit (and the underscore)

\W: the opposite of \w

When using escapes such as \w, keep in mind that re is a miniature language of its own. A pattern such as '\b' goes through the Python interpreter first, and Python translates \b into a backspace character before handing the string to re, so the sequence has lost its original meaning by the time re sees it. Write '\\b' instead (or use a raw string, r'\b'), so that re receives the backslash followed by b that it recognizes.
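The escapes and the backslash problem can be checked with a small sketch. The test strings are invented; the three \b patterns differ only in how the backslash is written:

```python
import re

text = 'room 42, floor_3'
digits = re.findall(r'\d', text)      # single decimal digits
words = re.findall(r'\w+', text)      # runs of letters, digits, underscore
chunks = re.findall(r'\S+', text)     # runs of non-whitespace
print(digits)  # ['4', '2', '3']
print(words)   # ['room', '42', 'floor_3']
print(chunks)  # ['room', '42,', 'floor_3']

# '\b' in an ordinary string literal is a backspace character,
# so re never sees a word-boundary escape.
plain = re.findall('\bcat\b', 'cat concat')
escaped = re.findall('\\bcat\\b', 'cat concat')  # doubled backslashes
raw = re.findall(r'\bcat\b', 'cat concat')       # raw string
print(plain)    # []
print(escaped)  # ['cat']
print(raw)      # ['cat']
```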


A [] matches exactly one character chosen from the set.

The characters inside it are alternatives: any one of them may match.


a = re.findall('x[xyu]', 'xy0')
print(a)  # ['xy']
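A few more bracket cases, sketched with invented strings, show the three symbols that stay special (-, ^, \) and one that does not (*):

```python
import re

one_of = re.findall(r'x[xyu]', 'xy xu xz')   # x followed by x, y or u
in_range = re.findall(r'[a-c]', 'abcdef')    # - marks a range
negated = re.findall(r'[^0-9]', 'a1b2')      # leading ^ negates the set
literal = re.findall(r'[*+]', '2*3+4')       # * and + are plain characters here

print(one_of)    # ['xy', 'xu']
print(in_range)  # ['a', 'b', 'c']
print(negated)   # ['a', 'b']
print(literal)   # ['*', '+']
```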

2.5 |

The pipe character means "or": the pattern on either side of it may match.
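A brief sketch of the pipe with invented strings. The second pattern uses (?:...), a non-capturing group, which these notes do not cover; it is included only to limit the "or" to part of the pattern:

```python
import re

# Either whole alternative may match.
either = re.findall(r'cat|dog', 'cat, dog, bird')
print(either)   # ['cat', 'dog']

# Without parentheses, | splits the entire pattern; grouping narrows it.
grouped = re.findall(r'gr(?:a|e)y', 'gray grey groy')
print(grouped)  # ['gray', 'grey']
```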

2.6 ()

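Grouping: parentheses combine several elements so that they are matched as a unit.


a = re.findall('(bc)+', 'askfjhbcbcjkhfdksa')
print(a)  # ['bc']: with a group in the pattern, findall returns the group, not the whole match 'bcbc'

2.6.1 Named groups (fixed syntax)


a = re.search(r'(?P<yehaibin>\d+)', '546kfdgjldfk').group()
print(a)  # 546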
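The point of naming a group is that the captured text can be fetched by name. A minimal sketch, with group names year and month and a test string that are invented for this example:

```python
import re

m = re.search(r'(?P<year>\d{4})-(?P<month>\d{2})', 'released 2018-06')

whole = m.group()         # the entire match
year = m.group('year')    # one group, looked up by name
parts = m.groupdict()     # all named groups as a dict

print(whole)  # 2018-06
print(year)   # 2018
print(parts)  # {'year': '2018', 'month': '06'}
```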

2.7 findall()

Finds every match and returns them in a list.

2.8 search()

Finds only the first match and returns it as a match object, or None when nothing matches.

2.9 match()


b = re.match(r'\d+', 'dsjhfdi5456')
print(b)  # None

match() only looks at the beginning of the string and returns None if the string does not start with the pattern.
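Running findall(), search() and match() against the same string makes the difference visible. A sketch with an invented test string:

```python
import re

text = 'dsjhfdi5456 and 77'

all_numbers = re.findall(r'\d+', text)  # every match, as a list of strings
first = re.search(r'\d+', text)         # first match anywhere, as a match object
at_start = re.match(r'\d+', text)       # only tries position 0
letters = re.match(r'[a-z]+', text)     # succeeds: the string starts with letters

print(all_numbers)                  # ['5456', '77']
print(first.group(), first.span())  # 5456 (7, 11)
print(at_start)                     # None
print(letters.group())              # dsjhfdi
```

Because search() and match() may return None, real code usually checks the result before calling group().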

2.10 split()


a = re.split('ab', 'abc')
print(a)
['', 'c']
The string is split on ab. ab matches at the very start, so nothing is left on its left side and the first piece is the empty string; the rest is 'c'. The pattern is then tried against c, nothing matches, and c is output as is.
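Since the separator is a pattern, split() can cut on several delimiters at once. A sketch with invented strings:

```python
import re

leading = re.split(r'ab', 'abc')           # separator at the start gives ''
mixed = re.split(r'[,;]\s*', 'a, b;c')     # comma or semicolon, optional spaces
by_digit = re.split(r'\d', 'x1y2z')        # any digit is a separator

print(leading)   # ['', 'c']
print(mixed)     # ['a', 'b', 'c']
print(by_digit)  # ['x', 'y', 'z']
```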

2.11 sub(): replace


a = re.sub(r'\d', 'abc', 'fhsdk3fhsdk')
print(a)  # fhsdkabcfhsdk

sub() takes three arguments: the pattern to match, the replacement text, and the string to search in.
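Beyond the three arguments described here, sub() also accepts an optional count, and the related subn() reports how many replacements were made. A sketch with an invented string:

```python
import re

text = 'a1b2c3'

every = re.sub(r'\d', '#', text)               # replace all digits
first_only = re.sub(r'\d', '#', text, count=1) # replace at most one
result, n = re.subn(r'\d', '#', text)          # also return the number replaced

print(every)       # a#b#c#
print(first_only)  # a#b2c3
print(result, n)   # a#b#c# 3
```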

2.12 compile()

Define the pattern first; it can then be reused.

com = re.compile(r'\d+')

res = com.findall('sfhkfds2kjdsl')

The compiled pattern can be applied to many strings without writing the matching rule again each time.
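Reuse across several strings might look like the following sketch; the name number and the list of inputs are invented for illustration:

```python
import re

number = re.compile(r'\d+')  # compile once

inputs = ['sfhkfds2kjdsl', 'a10b20', 'none']
results = [number.findall(s) for s in inputs]  # reuse for every string

print(results)  # [['2'], ['10', '20'], []]
```

The compiled object offers the same methods as the module (findall, search, match, split, sub, finditer), minus the pattern argument.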

2.13 finditer()

Returns the matches as an iterator of match objects instead of building a list.
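Each item the iterator yields is a match object, so positions are available as well as the text. A sketch with an invented string:

```python
import re

matches = re.finditer(r'\d+', 'a1bb22ccc333')

# Consume the iterator, keeping the matched text and where it starts.
found = [(m.group(), m.start()) for m in matches]
print(found)  # [('1', 1), ('22', 4), ('333', 9)]
```

An iterator produces matches one at a time, which helps when the input is large and the full list from findall() is not needed.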
