Python operations on xml files

Source: Internet
Author: User

There are many articles about reading XML from Python, but most of them simply post an XML file and then post the code that processes it, which does not help beginners learn. This article aims to be easier to follow and to show, step by step, how to read XML files with Python.

1. What is XML?

XML (Extensible Markup Language) is used to mark up data and define data types. It is a meta-language that allows users to define their own markup languages.

abc.xml

<?xml version="1.0" encoding="UTF-8"?>
<catalog>
    <maxid>4</maxid>
    <login username="pytest" passwd='000000'>
        <caption>Python</caption>
        <item id="4">
            <caption>test</caption>
        </item>
    </login>
    <item id="2">
        <caption>Zope</caption>
    </item>
</catalog>

In terms of structure, XML looks much like the familiar HTML (HyperText Markup Language). However, the two are designed for different purposes. HTML is designed to display data, and its focus is on how the data looks. XML is designed to transmit and store data, and its focus is on the data content.

It has the following features:

It is composed of tag pairs: <aa></aa>

A tag can have attributes: <aa id='000000'></aa>

A tag pair can enclose data: <aa>abc</aa>

Tags can contain sub-tags, forming a hierarchy: <aa><bb></bb></aa>

2. Obtain tag attributes

The following shows how to use Python to read a file of this kind.

# coding=utf-8
import xml.dom.minidom

# Open the XML document
dom = xml.dom.minidom.parse('abc.xml')

# Get the document element (the root element)
root = dom.documentElement
print(root.nodeName)
print(root.nodeValue)
print(root.nodeType)
print(root.ELEMENT_NODE)

The xml.dom.minidom module is used to process XML files.

xml.dom.minidom.parse() opens an XML file and returns a document object, which is assigned to the variable dom.

documentElement returns the document element (the root element) of the dom object, which is assigned to the variable root.
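To try these calls without creating abc.xml on disk, the same document can be parsed from a string. The following is a minimal sketch: the XML_TEXT constant is an assumption introduced here for illustration (a copy of abc.xml without the XML declaration), and parseString is the string counterpart of parse in xml.dom.minidom.

```python
import xml.dom.minidom

# Assumed inline copy of abc.xml so the example runs without a file on disk.
XML_TEXT = """<catalog>
    <maxid>4</maxid>
    <login username="pytest" passwd="000000">
        <caption>Python</caption>
        <item id="4">
            <caption>test</caption>
        </item>
    </login>
    <item id="2">
        <caption>Zope</caption>
    </item>
</catalog>"""

# parseString builds the same kind of document object that parse() returns.
dom = xml.dom.minidom.parseString(XML_TEXT)
root = dom.documentElement

print(root.nodeName)                        # name of the root element
print(root.nodeValue)                       # element nodes carry no value
print(root.nodeType == root.ELEMENT_NODE)   # compare against the constant
```

Comparing nodeType against a named constant such as ELEMENT_NODE tends to read better than comparing against the bare integer.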

Each node has its nodeName, nodeValue, and nodeType attributes.

nodeName is the node name.

nodeValue is the value of a node. It is meaningful mainly for text nodes; for an element node it is None.

nodeType is the node type. catalog is of the ELEMENT_NODE type.

The following node types are defined:

'ATTRIBUTE_NODE'
'CDATA_SECTION_NODE'
'COMMENT_NODE'
'DOCUMENT_FRAGMENT_NODE'
'DOCUMENT_NODE'
'DOCUMENT_TYPE_NODE'
'ELEMENT_NODE'
'ENTITY_NODE'
'ENTITY_REFERENCE_NODE'
'NOTATION_NODE'
'PROCESSING_INSTRUCTION_NODE'
'TEXT_NODE'
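Because nodeType is stored as an integer, it can help to map the number back to its constant name while exploring a document. The sketch below is one way to do that; the TYPE_NAMES dictionary and the small inline XML_TEXT document are assumptions made up for this illustration, while the constants themselves come from xml.dom.Node.

```python
import xml.dom.minidom
from xml.dom import Node

# Small illustrative document (an assumption, not abc.xml itself).
XML_TEXT = "<catalog><!-- note --><maxid>4</maxid></catalog>"

# Map each numeric node type code back to its constant name.
TYPE_NAMES = {
    getattr(Node, name): name
    for name in dir(Node)
    if name.endswith("_NODE")
}

dom = xml.dom.minidom.parseString(XML_TEXT)
root = dom.documentElement
maxid = root.getElementsByTagName("maxid")[0]

# Inspect a document, an element, a comment and a text node.
for node in (dom, root, root.firstChild, maxid.firstChild):
    print(node.nodeName, TYPE_NAMES[node.nodeType], repr(node.nodeValue))
```

The loop shows that only some node kinds, such as the comment and the text node, carry a nodeValue; the document and element nodes report None.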


3. Obtain sub-tags

Now we want to obtain the names of the sub-tags of catalog:

<?xml version="1.0" encoding="UTF-8"?>
<catalog>
    <maxid>4</maxid>
    <login username="pytest" passwd='000000'>
        <caption>Python</caption>
        <item id="4">
            <caption>test</caption>
        </item>
    </login>
    <item id="2">
        <caption>Zope</caption>
    </item>
</catalog>

If you know the name of an element, you can use the getElementsByTagName method to obtain the matching sub-elements:

# coding=utf-8
import xml.dom.minidom

# Open the XML document
dom = xml.dom.minidom.parse('abc.xml')

# Get the document element (the root element)
root = dom.documentElement

# getElementsByTagName returns a list; take the first match
bb = root.getElementsByTagName('maxid')
b = bb[0]
print(b.nodeName)

bb = root.getElementsByTagName('login')
b = bb[0]
print(b.nodeName)
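Note that getElementsByTagName searches all descendants of the element, not only its direct children. The following sketch contrasts the two; XML_TEXT is an assumed inline copy of abc.xml, and the list comprehension over childNodes is just one possible way to restrict the result to direct child elements.

```python
import xml.dom.minidom

# Assumed inline copy of abc.xml so the example runs without a file on disk.
XML_TEXT = """<catalog>
    <maxid>4</maxid>
    <login username="pytest" passwd="000000">
        <caption>Python</caption>
        <item id="4">
            <caption>test</caption>
        </item>
    </login>
    <item id="2">
        <caption>Zope</caption>
    </item>
</catalog>"""

root = xml.dom.minidom.parseString(XML_TEXT).documentElement

# Finds both <item> tags, including the one nested inside <login>.
all_items = root.getElementsByTagName("item")
print(len(all_items))

# Direct children only: skip the whitespace text nodes between the tags.
direct_children = [
    node.nodeName
    for node in root.childNodes
    if node.nodeType == node.ELEMENT_NODE
]
print(direct_children)
```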

How to differentiate tags with the same tag name:

<?xml version="1.0" encoding="UTF-8"?>
<catalog>
    <maxid>4</maxid>
    <login username="pytest" passwd='000000'>
        <caption>Python</caption>
        <item id="4">
            <caption>test</caption>
        </item>
    </login>
    <item id="2">
        <caption>Zope</caption>
    </item>
</catalog>

There is more than one <caption> tag and more than one <item> tag. How do we tell them apart?

# coding=utf-8
import xml.dom.minidom

# Open the XML document
dom = xml.dom.minidom.parse('abc.xml')

# Get the document element (the root element)
root = dom.documentElement

# Pick the third <caption> tag in document order
bb = root.getElementsByTagName('caption')
b = bb[2]
print(b.nodeName)

# Pick the second <item> tag in document order
bb = root.getElementsByTagName('item')
b = bb[1]
print(b.nodeName)

root.getElementsByTagName('caption') returns a group of caption tags. bb[0] is the first tag in the group, and bb[2] is the third.
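Since every tag in the group has the same nodeName, the index alone says little about which tag was picked. One way to see the difference is to look at each tag's parent, as in this sketch; XML_TEXT is an assumed inline copy of abc.xml, and the summary list is a name introduced only for this illustration.

```python
import xml.dom.minidom

# Assumed inline copy of abc.xml so the example runs without a file on disk.
XML_TEXT = """<catalog>
    <maxid>4</maxid>
    <login username="pytest" passwd="000000">
        <caption>Python</caption>
        <item id="4">
            <caption>test</caption>
        </item>
    </login>
    <item id="2">
        <caption>Zope</caption>
    </item>
</catalog>"""

root = xml.dom.minidom.parseString(XML_TEXT).documentElement

# Record the index of each <caption>, its parent's name and the parent's id.
summary = []
for index, caption in enumerate(root.getElementsByTagName("caption")):
    parent = caption.parentNode
    summary.append((index, parent.nodeName, parent.getAttribute("id")))

for row in summary:
    print(row)
```

The first caption belongs to login, which has no id attribute, so getAttribute returns an empty string for it; the other two belong to the item tags with ids 4 and 2.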

4. Obtain tag attribute values

<?xml version="1.0" encoding="UTF-8"?>
<catalog>
    <maxid>4</maxid>
    <login username="pytest" passwd='000000'>
        <caption>Python</caption>
        <item id="4">
            <caption>test</caption>
        </item>
    </login>
    <item id="2">
        <caption>Zope</caption>
    </item>
</catalog>

The <login> and <item> tags have attributes. How can their values be obtained?

# coding=utf-8
import xml.dom.minidom

# Open the XML document
dom = xml.dom.minidom.parse('abc.xml')

# Get the document element (the root element)
root = dom.documentElement

# Read both attributes of the <login> tag
itemlist = root.getElementsByTagName('login')
item = itemlist[0]
un = item.getAttribute("username")
print(un)
pd = item.getAttribute("passwd")
print(pd)

# Read the id attribute of each <item> tag
ii = root.getElementsByTagName('item')
i1 = ii[0]
i = i1.getAttribute("id")
print(i)

i2 = ii[1]
i = i2.getAttribute("id")
print(i)

The getAttribute method can obtain the values corresponding to the attributes of an element.
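When an attribute is absent, getAttribute in minidom returns an empty string rather than raising an error, so hasAttribute is useful for telling "missing" apart from "empty". The sketch below illustrates this; XML_TEXT is an assumed inline fragment of abc.xml, and the "email" attribute is a hypothetical name that deliberately does not exist in the document.

```python
import xml.dom.minidom

# Assumed inline fragment of abc.xml; only the <login> tag is needed here.
XML_TEXT = '<catalog><login username="pytest" passwd="000000"/></catalog>'

root = xml.dom.minidom.parseString(XML_TEXT).documentElement
login = root.getElementsByTagName("login")[0]

username = login.getAttribute("username")
missing = login.getAttribute("email")      # hypothetical, not in the document
has_email = login.hasAttribute("email")

# All attributes of the tag at once, as a plain dictionary.
attrs = dict(login.attributes.items())

print(username)
print(repr(missing), has_email)
print(attrs)
```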

5. Obtain data between tag pairs

<?xml version="1.0" encoding="UTF-8"?>
<catalog>
    <maxid>4</maxid>
    <login username="pytest" passwd='000000'>
        <caption>Python</caption>
        <item id="4">
            <caption>test</caption>
        </item>
    </login>
    <item id="2">
        <caption>Zope</caption>
    </item>
</catalog>

There is data between each pair of <caption> tags. How can we obtain it?

There are several ways to obtain the data between tag pairs.

Method 1:

# coding=utf-8
import xml.dom.minidom

# Open the XML document
dom = xml.dom.minidom.parse('abc.xml')

# Get the document element (the root element)
root = dom.documentElement

# The text inside a tag is stored in a child text node
cc = dom.getElementsByTagName('caption')
c1 = cc[0]
print(c1.firstChild.data)

c2 = cc[1]
print(c2.firstChild.data)

c3 = cc[2]
print(c3.firstChild.data)

The firstChild attribute returns the first child node of the selected node. Its data attribute returns the text data of that node.
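Reading firstChild.data assumes that the tag actually contains text as its first child; for an empty tag, firstChild is None and the access fails. A slightly more defensive variant is sketched below. The get_text helper and the small XML_TEXT document (which includes an empty caption on purpose) are assumptions introduced for this illustration, not part of abc.xml.

```python
import xml.dom.minidom

# Illustrative document with one empty <caption> (an assumption).
XML_TEXT = (
    "<catalog>"
    "<caption>Python</caption>"
    "<caption/>"
    "<caption> test </caption>"
    "</catalog>"
)


def get_text(element):
    """Concatenate the data of all direct text children of an element."""
    return "".join(
        child.data
        for child in element.childNodes
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE)
    )


dom = xml.dom.minidom.parseString(XML_TEXT)
captions = dom.getElementsByTagName("caption")

# The empty tag has no child node at all.
print(captions[1].firstChild)

# strip() removes the surrounding whitespace kept by the parser.
texts = [get_text(caption).strip() for caption in captions]
print(texts)
```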

Method 2:

# coding=utf-8
from xml.etree import ElementTree as ET

per = ET.parse('abc.xml')

# <item> tags nested inside <login>
p = per.findall('./login/item')

for oneper in p:
    # Iterating over an element yields its direct child tags
    for child in oneper:
        print(child.tag, ':', child.text)

# <item> tags directly under the root
p = per.findall('./item')

for oneper in p:
    for child in oneper:
        print(child.tag, ':', child.text)

Method 2 is a bit more complicated, and it uses a different module from the previous examples. findall specifies the level of tag at which the traversal starts.

Iterating over an element yields all of its child tags in document order (older code used the getchildren method for this, which was removed in Python 3.9). The loop prints the tag name (child.tag) and the tag data (child.text) of each child.

Strictly speaking, obtaining tag data is not what method 2 is mainly for. Its core function is to traverse all sub-tags under a given level of tag.
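For comparison with the minidom examples, here is a short sketch of how the same data could be read with ElementTree. XML_TEXT is an assumed inline copy of abc.xml, and fromstring is used in place of parse only so that the example runs without a file; the variable names are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Assumed inline copy of abc.xml so the example runs without a file on disk.
XML_TEXT = """<catalog>
    <maxid>4</maxid>
    <login username="pytest" passwd="000000">
        <caption>Python</caption>
        <item id="4">
            <caption>test</caption>
        </item>
    </login>
    <item id="2">
        <caption>Zope</caption>
    </item>
</catalog>"""

root = ET.fromstring(XML_TEXT)

# iter() walks all descendants with the given tag, in document order.
captions = [element.text for element in root.iter("caption")]
item_ids = [element.get("id") for element in root.iter("item")]

# A path selects a specific level: the <caption> of a top-level <item>.
top_level_caption = root.findtext("./item/caption")

# Attributes of a single tag are available as a dictionary.
login_attrs = root.find("login").attrib

print(captions)
print(item_ids)
print(top_level_caption)
print(login_attrs)
```

In ElementTree the text of a tag is read directly from element.text, so no separate text node has to be reached through firstChild.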
