Python operations on xml files

Last Update:2014-06-10 Source: Internet

Author: User


There are many articles about reading XML with Python, but most of them simply post an XML file followed by the code that processes it, which is not helpful for beginners. I hope this article is easier to follow and teaches you how to use Python to read XML files.

1. What is XML?

XML can be used to tag data and define data types. It is a metalanguage that allows you to define your own markup language.

abc.xml
The code is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<catalog>
    <maxid>4</maxid>
    <login username="pytest" passwd='000000'>
        <caption>Python</caption>
        <item id="4">
            <caption>test</caption>
        </item>
    </login>
    <item id="2">
        <caption>Zope</caption>
    </item>
</catalog>

In terms of structure, XML is similar to the familiar HTML (HyperText Markup Language). However, the two are designed for different purposes. HTML is designed to display data, and its focus is on the appearance of the data. XML is designed to transmit and store data, and its focus is on the content of the data.

It has the following features:

It is composed of tag pairs: <aa></aa>

A tag can have attributes: <aa id='000000'></aa>

A tag pair can enclose data: <aa>abc</aa>

A tag can contain child tags (forming a hierarchy): <aa><bb></bb></aa>
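The four features can be seen together in one tiny document. The following is a minimal sketch, not part of the original article: the SAMPLE string and the <bb> child tag are made up for illustration, and it uses the standard xml.dom.minidom module that the next section introduces.

```python
import xml.dom.minidom

# A tiny hypothetical document that shows all four features at once.
SAMPLE = '<aa id="000000"><bb>abc</bb></aa>'

doc = xml.dom.minidom.parseString(SAMPLE)
outer = doc.documentElement        # the <aa> ... </aa> tag pair
inner = outer.firstChild           # the nested <bb> child tag

print(outer.tagName)               # name of the outer tag
print(outer.getAttribute("id"))    # attribute carried by the tag
print(inner.tagName)               # name of the child tag
print(inner.firstChild.data)       # data enclosed by the tag pair
```

Because SAMPLE contains no whitespace between tags, the first child of <aa> is the <bb> element itself; in an indented file the first child would usually be a whitespace text node.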

2. Obtain node properties

The following describes how to use Python to read files of this type.
The code is as follows:
# coding=utf-8
import xml.dom.minidom

# Open the XML document
dom = xml.dom.minidom.parse('abc.xml')

# Get the document element object
root = dom.documentElement
print(root.nodeName)
print(root.nodeValue)
print(root.nodeType)
print(root.ELEMENT_NODE)

The xml.dom.minidom module is used to process XML files.

xml.dom.minidom.parse() opens an XML file and returns a document object, which is assigned to the variable dom.

documentElement returns the document (root) element of the dom object, which is assigned to the variable root.

Each node has nodeName, nodeValue, and nodeType properties.

nodeName is the name of the node.

nodeValue is the value of the node. It is only meaningful for nodes such as text nodes; for an element node it is None.

nodeType is the type of the node. catalog is of type ELEMENT_NODE.

The following node types are currently defined:

'ATTRIBUTE_NODE'
'CDATA_SECTION_NODE'
'COMMENT_NODE'
'DOCUMENT_FRAGMENT_NODE'
'DOCUMENT_NODE'
'DOCUMENT_TYPE_NODE'
'ELEMENT_NODE'
'ENTITY_NODE'
'ENTITY_REFERENCE_NODE'
'NOTATION_NODE'
'PROCESSING_INSTRUCTION_NODE'
'TEXT_NODE'
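Since nodeType is printed as a plain integer, it can help to map the number back to one of the constant names listed above. The sketch below is one possible way to do that, assuming the constants are read from the standard xml.dom.Node class; NODE_TYPE_NAMES and describe are hypothetical names chosen for this example, and the XML string is a shortened stand-in for abc.xml.

```python
import xml.dom.minidom
from xml.dom import Node

# Build a lookup from numeric nodeType values to their constant names.
NODE_TYPE_NAMES = {
    getattr(Node, name): name
    for name in dir(Node)
    if name.endswith("_NODE")
}

doc = xml.dom.minidom.parseString("<catalog><maxid>4</maxid></catalog>")
root = doc.documentElement
text = root.firstChild.firstChild   # the text node "4" inside <maxid>


def describe(node):
    """Return (nodeName, nodeValue, type name) for a DOM node."""
    return node.nodeName, node.nodeValue, NODE_TYPE_NAMES[node.nodeType]


for node in (doc, root, text):
    print(describe(node))
```

The element node reports None as its nodeValue, while the text node inside <maxid> carries the actual data.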

3. Obtain sub-tags

Now we want to obtain the names of the sub-tags of catalog.
The code is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<catalog>
    <maxid>4</maxid>
    <login username="pytest" passwd='000000'>
        <caption>Python</caption>
        <item id="4">
            <caption>test</caption>
        </item>
    </login>
    <item id="2">
        <caption>Zope</caption>
    </item>
</catalog>

When you know the name of an element, you can use the getElementsByTagName method to obtain it:
The code is as follows:
# coding=utf-8
import xml.dom.minidom

# Open the XML document
dom = xml.dom.minidom.parse('abc.xml')

# Get the document element object
root = dom.documentElement

bb = root.getElementsByTagName('maxid')
b = bb[0]
print(b.nodeName)

bb = root.getElementsByTagName('login')
b = bb[0]
print(b.nodeName)

How to differentiate tags that have the same tag name:
The code is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<catalog>
    <maxid>4</maxid>
    <login username="pytest" passwd='000000'>
        <caption>Python</caption>
        <item id="4">
            <caption>test</caption>
        </item>
    </login>
    <item id="2">
        <caption>Zope</caption>
    </item>
</catalog>

There are several <caption> tags and several <item> tags. How do we tell them apart?

The code is as follows:
# coding=utf-8
import xml.dom.minidom

# Open the XML document
dom = xml.dom.minidom.parse('abc.xml')

# Get the document element object
root = dom.documentElement

bb = root.getElementsByTagName('caption')
b = bb[2]
print(b.nodeName)

bb = root.getElementsByTagName('item')
b = bb[1]
print(b.nodeName)

root.getElementsByTagName('caption') returns the whole group of caption tags, in document order. bb[0] is the first tag in the group, and bb[2] is the third.
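Picking tags by index works, but it depends on remembering the document order. As an alternative sketch (not from the original article), same-named tags can also be told apart by their parent node. The direct_children helper below is hypothetical, and ABC_XML is an inline copy of abc.xml without the XML declaration so the example runs without a file.

```python
import xml.dom.minidom

# Inline copy of abc.xml so the sketch is self-contained.
ABC_XML = """<catalog>
    <maxid>4</maxid>
    <login username="pytest" passwd="000000">
        <caption>Python</caption>
        <item id="4">
            <caption>test</caption>
        </item>
    </login>
    <item id="2">
        <caption>Zope</caption>
    </item>
</catalog>"""

root = xml.dom.minidom.parseString(ABC_XML).documentElement


def direct_children(parent, tag_name):
    """Hypothetical helper: elements named tag_name directly under parent."""
    return [
        node
        for node in parent.getElementsByTagName(tag_name)
        if node.parentNode is parent
    ]


# getElementsByTagName searches all descendants, not only direct children.
all_items = root.getElementsByTagName("item")
top_items = direct_children(root, "item")

# The parent name tells us which <caption> we are looking at.
parents = [c.parentNode.nodeName for c in root.getElementsByTagName("caption")]

print(len(all_items), len(top_items))
print(parents)
```

Here all_items holds both <item> tags, while top_items holds only the one that sits directly under <catalog>.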

4. Obtain tag attribute values

The code is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<catalog>
    <maxid>4</maxid>
    <login username="pytest" passwd='000000'>
        <caption>Python</caption>
        <item id="4">
            <caption>test</caption>
        </item>
    </login>
    <item id="2">
        <caption>Zope</caption>
    </item>
</catalog>

The <login> and <item> tags have attributes. How can we obtain them?

The code is as follows:
# coding=utf-8
import xml.dom.minidom

# Open the XML document
dom = xml.dom.minidom.parse('abc.xml')

# Get the document element object
root = dom.documentElement

itemlist = root.getElementsByTagName('login')
item = itemlist[0]
un = item.getAttribute("username")
print(un)
pd = item.getAttribute("passwd")
print(pd)

ii = root.getElementsByTagName('item')
i1 = ii[0]
i = i1.getAttribute("id")
print(i)

i2 = ii[1]
i = i2.getAttribute("id")
print(i)

The getAttribute method returns the value of the named attribute of an element.
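One detail worth knowing is that minidom's getAttribute returns an empty string, rather than raising an error, when the attribute does not exist. The sketch below illustrates this with an inline copy of abc.xml; the "email" attribute is a made-up name that is deliberately absent from the file.

```python
import xml.dom.minidom

# Inline copy of abc.xml so the sketch is self-contained.
ABC_XML = """<catalog>
    <maxid>4</maxid>
    <login username="pytest" passwd="000000">
        <caption>Python</caption>
        <item id="4">
            <caption>test</caption>
        </item>
    </login>
    <item id="2">
        <caption>Zope</caption>
    </item>
</catalog>"""

root = xml.dom.minidom.parseString(ABC_XML).documentElement
login = root.getElementsByTagName("login")[0]

# All attributes of <login> as a plain dictionary.
login_attrs = dict(login.attributes.items())

# "email" is a hypothetical attribute that abc.xml does not define.
missing = login.getAttribute("email")      # empty string, no exception
has_email = login.hasAttribute("email")    # explicit existence check

print(login_attrs)
print(repr(missing), has_email)
```

When an empty value and a missing attribute need to be distinguished, hasAttribute is the safer check.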

5. Obtain data between tag pairs

The code is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<catalog>
    <maxid>4</maxid>
    <login username="pytest" passwd='000000'>
        <caption>Python</caption>
        <item id="4">
            <caption>test</caption>
        </item>
    </login>
    <item id="2">
        <caption>Zope</caption>
    </item>
</catalog>

There is data between the <caption> tag pairs. How can we obtain it?

There are several ways to obtain the data between tag pairs.

Method 1:

The code is as follows:
# coding=utf-8
import xml.dom.minidom

# Open the XML document
dom = xml.dom.minidom.parse('abc.xml')

# Get the document element object
root = dom.documentElement

cc = dom.getElementsByTagName('caption')
c1 = cc[0]
print(c1.firstChild.data)

c2 = cc[1]
print(c2.firstChild.data)

c3 = cc[2]
print(c3.firstChild.data)

The firstChild property returns the first child node of the selected node, and .data returns the data held by that node (here, the text between the tags).
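Reading firstChild.data assumes the tag actually has a text node as its first child; for an empty tag, firstChild is None and the access fails. A slightly more defensive sketch is shown below. The get_text helper is a hypothetical name, and the XML string is a made-up variant that includes an empty <caption/> tag to show the difference.

```python
import xml.dom.minidom

# Made-up sample: the second <caption/> is empty on purpose.
doc = xml.dom.minidom.parseString(
    "<catalog><caption>Python</caption><caption/></catalog>"
)


def get_text(element):
    """Join the data of all text nodes directly inside element."""
    return "".join(
        child.data
        for child in element.childNodes
        if child.nodeType == child.TEXT_NODE
    )


captions = doc.getElementsByTagName("caption")
texts = [get_text(c) for c in captions]
print(texts)
```

For the empty tag the helper returns an empty string instead of raising an AttributeError.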

Method 2:

The code is as follows:
# coding=utf-8
from xml.etree import ElementTree as ET

per = ET.parse('abc.xml')
p = per.findall('./login/item')

for oneper in p:
    for child in oneper:
        print(child.tag, ':', child.text)

p = per.findall('./item')

for oneper in p:
    for child in oneper:
        print(child.tag, ':', child.text)

Method 2 is a bit more complicated, and it uses a different module from the previous examples. findall specifies the level of tag from which to start traversing.

Iterating over an element yields all of its child tags in document order, and the loop prints each tag name (child.tag) and tag data (child.text). Older code often calls oneper.getchildren() for this; that method was removed in Python 3.9, and iterating over the element directly is the equivalent.

Actually, this is not really what method 2 is for. Its core function is to traverse all sub-tags under a given level of tag.
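To make the traversal idea concrete, here is a small sketch of walking the children of one tag and of collecting same-named tags at any depth with ElementTree. It is an illustration rather than the article's own code: child_pairs is a hypothetical helper, and ABC_XML is an inline copy of abc.xml without the XML declaration.

```python
import xml.etree.ElementTree as ET

# Inline copy of abc.xml so the sketch is self-contained.
ABC_XML = """<catalog>
    <maxid>4</maxid>
    <login username="pytest" passwd="000000">
        <caption>Python</caption>
        <item id="4">
            <caption>test</caption>
        </item>
    </login>
    <item id="2">
        <caption>Zope</caption>
    </item>
</catalog>"""

root = ET.fromstring(ABC_XML)


def child_pairs(element):
    """Hypothetical helper: (tag, stripped text) for each direct child."""
    return [(child.tag, (child.text or "").strip()) for child in element]


# Direct children of <login>, in document order.
login = root.find("./login")
pairs = child_pairs(login)

# iter() walks every descendant with the given tag, at any depth.
captions = [c.text for c in root.iter("caption")]
ids = [item.get("id") for item in root.iter("item")]

print(pairs)
print(captions, ids)
```

The <item> child of <login> contains only whitespace and a nested tag, so its stripped text is empty; the nested caption is reached by iter() instead.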

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
