Working with XML Files in Python: A Detailed Guide

Source: Internet
Author: User

There are many articles about reading XML with Python, but most of them simply paste an XML file followed by the code that processes it. That is not much help to beginners, so this article tries to explain how to read XML files with Python in a way that is easier to follow.

1. What is XML?

XML (Extensible Markup Language) is a markup language that can be used to mark up data and define data types. It is a source language that allows users to define their own markup languages.

The example file abc.xml is as follows:

<?xml version="1.0" encoding="utf-8"?>
<catalog>
    <maxid>4</maxid>
    <login username="pytest" passwd='123456'>
        <caption>Python</caption>
        <item id="4">
            <caption>Testing</caption>
        </item>
    </login>
    <item id="2">
        <caption>Zope</caption>
    </item>
</catalog>

Structurally, XML looks a lot like the familiar HTML (Hypertext Markup Language), but the two were designed for different purposes. HTML is designed to display data, and its focus is the appearance of the data. XML is designed to transmit and store data, and its focus is the content of the data.

XML has the following characteristics:

It is made up of tags: <aa></aa>

Tags can have attributes: <aa id='123'></aa>

Tag pairs can contain data: <aa>abc</aa>

Tags can contain child tags (forming a hierarchy): <aa><bb></bb></aa>
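
The four characteristics above can be seen together in a short sketch. The inline document and the variable names here are made up for illustration; the calls themselves come from the standard xml.dom.minidom module.

```python
import xml.dom.minidom

# A made-up document: one tag pair with an attribute, text data, and a child tag.
SAMPLE = "<aa id='123'>abc<bb>child</bb></aa>"

doc = xml.dom.minidom.parseString(SAMPLE)
aa = doc.documentElement

tag_name = aa.tagName                                   # the tag itself
attr_id = aa.getAttribute("id")                         # attribute on the tag
text = aa.firstChild.data                               # data inside the tag pair
child_name = aa.getElementsByTagName("bb")[0].tagName   # nested child tag

print(tag_name, attr_id, text, child_name)  # aa 123 abc bb
```

Each of these calls is covered in more detail in the sections that follow.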

2. Getting node properties

Here is how to read this kind of file in Python.

The code is as follows:

# coding=utf-8
import xml.dom.minidom

# Open the XML document
dom = xml.dom.minidom.parse('abc.xml')

# Get the document element (root) object
root = dom.documentElement
print(root.nodeName)
print(root.nodeValue)
print(root.nodeType)
print(root.ELEMENT_NODE)

The xml.dom.minidom module is used to process XML files, so it is imported first.

xml.dom.minidom.parse() opens an XML file and stores the resulting document object in the dom variable.

documentElement gets the document element (the root element) of the dom object and assigns it to root.

Every node has nodeName, nodeValue, and nodeType attributes.

nodeName is the name of the node.

nodeValue is the value of the node. It is only meaningful for nodes such as text nodes; for an element it is None.

nodeType is the type of the node. catalog is of type ELEMENT_NODE.

The node type constants are the following:

'ATTRIBUTE_NODE'
'CDATA_SECTION_NODE'
'COMMENT_NODE'
'DOCUMENT_FRAGMENT_NODE'
'DOCUMENT_NODE'
'DOCUMENT_TYPE_NODE'
'ELEMENT_NODE'
'ENTITY_NODE'
'ENTITY_REFERENCE_NODE'
'NOTATION_NODE'
'PROCESSING_INSTRUCTION_NODE'
'TEXT_NODE'
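
To make the three node properties and the type constants concrete, here is a minimal sketch. It assumes a small inline document (not abc.xml) so that it runs on its own, and it compares nodeType against the named constants from xml.dom.Node instead of bare integers.

```python
import xml.dom.minidom
from xml.dom import Node

# Hypothetical inline sample so the sketch does not depend on abc.xml.
doc = xml.dom.minidom.parseString(
    "<catalog><!-- note --><maxid>4</maxid></catalog>"
)
root = doc.documentElement
comment, maxid = root.childNodes   # a comment node and an element node
text_node = maxid.firstChild       # the text node holding "4"

# Print name, numeric type, and value for several kinds of node.
for node in (doc, root, comment, maxid, text_node):
    print(node.nodeName, node.nodeType, repr(node.nodeValue))

# Compare against the named constants rather than bare integers.
root_is_element = root.nodeType == Node.ELEMENT_NODE
text_is_text = text_node.nodeType == Node.TEXT_NODE
```

Only the comment and the text node carry a nodeValue here; the document and the elements report None.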


3. Getting child elements

Now suppose we want to get the tag names of catalog's child elements.

When you already know the tag name of a child element, you can use the getElementsByTagName method to obtain it:

The code is as follows:

# coding=utf-8
import xml.dom.minidom

# Open the XML document
dom = xml.dom.minidom.parse('abc.xml')

# Get the document element (root) object
root = dom.documentElement

# getElementsByTagName returns a list; take its first entry
bb = root.getElementsByTagName('maxid')
b = bb[0]
print(b.nodeName)

bb = root.getElementsByTagName('login')
b = bb[0]
print(b.nodeName)
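
When the tag names are not known in advance, one possible approach (not shown in the original article) is to walk childNodes and keep only the element nodes. The sketch below assumes the sample document is embedded as a string named CATALOG_XML, a name chosen here for illustration.

```python
import xml.dom.minidom
from xml.dom import Node

# The article's sample document, embedded so the sketch is self-contained.
CATALOG_XML = """<catalog>
    <maxid>4</maxid>
    <login username="pytest" passwd="123456">
        <caption>Python</caption>
        <item id="4">
            <caption>Testing</caption>
        </item>
    </login>
    <item id="2">
        <caption>Zope</caption>
    </item>
</catalog>"""

root = xml.dom.minidom.parseString(CATALOG_XML).documentElement

# childNodes also contains whitespace-only text nodes, so filter by type.
child_names = [
    node.nodeName
    for node in root.childNodes
    if node.nodeType == Node.ELEMENT_NODE
]
print(child_names)  # ['maxid', 'login', 'item']
```

Unlike getElementsByTagName, this only looks at the direct children of catalog.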

How to distinguish elements that share the same tag name

The file contains more than one <caption> element and more than one <item> element. How do we tell them apart?

The code is as follows:

# coding=utf-8
import xml.dom.minidom

# Open the XML document
dom = xml.dom.minidom.parse('abc.xml')

# Get the document element (root) object
root = dom.documentElement

# Third <caption> in document order
bb = root.getElementsByTagName('caption')
b = bb[2]
print(b.nodeName)

# Second <item> in document order
bb = root.getElementsByTagName('item')
b = bb[1]
print(b.nodeName)

root.getElementsByTagName('caption') returns the list of all elements whose tag name is caption, in document order and at any depth below root. bb[0] is the first element in that list, and bb[2] is the third.
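
Because the index alone says little about which caption is which, it can help to look at each match's parent. The following is a sketch under the assumption that the sample document is embedded as a string; CATALOG_XML and located are names introduced here for illustration.

```python
import xml.dom.minidom

# The article's sample document, embedded so the sketch is self-contained.
CATALOG_XML = """<catalog>
    <maxid>4</maxid>
    <login username="pytest" passwd="123456">
        <caption>Python</caption>
        <item id="4">
            <caption>Testing</caption>
        </item>
    </login>
    <item id="2">
        <caption>Zope</caption>
    </item>
</catalog>"""

root = xml.dom.minidom.parseString(CATALOG_XML).documentElement
captions = root.getElementsByTagName("caption")

# For each match record: parent tag name, parent's id attribute, caption text.
located = [
    (c.parentNode.nodeName, c.parentNode.getAttribute("id"), c.firstChild.data)
    for c in captions
]
for index, entry in enumerate(located):
    print(index, entry)
# 0 ('login', '', 'Python')
# 1 ('item', '4', 'Testing')
# 2 ('item', '2', 'Zope')
```

The nested caption under the item with id 4 is found even though it is not a direct child of catalog.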

4. Getting attribute values

The <login> and <item> elements have attributes. How do we get their values?

The code is as follows:

# coding=utf-8
import xml.dom.minidom

# Open the XML document
dom = xml.dom.minidom.parse('abc.xml')

# Get the document element (root) object
root = dom.documentElement

# Attributes of the <login> element
itemlist = root.getElementsByTagName('login')
item = itemlist[0]
un = item.getAttribute("username")
print(un)
pd = item.getAttribute("passwd")
print(pd)

# id attribute of the first and second <item> elements
ii = root.getElementsByTagName('item')
i1 = ii[0]
i = i1.getAttribute("id")
print(i)

i2 = ii[1]
i = i2.getAttribute("id")
print(i)

The getAttribute method returns the value of the named attribute of an element.
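
One detail worth knowing is what happens when the attribute does not exist. The sketch below uses a one-line inline document, and the attribute name "email" is a hypothetical example that does not appear in abc.xml. In minidom, getAttribute returns an empty string for a missing attribute, so hasAttribute is the way to tell "missing" from "empty".

```python
import xml.dom.minidom

doc = xml.dom.minidom.parseString(
    '<login username="pytest" passwd="123456"/>'
)
login = doc.documentElement

username = login.getAttribute("username")
missing = login.getAttribute("email")      # hypothetical attribute, not present
has_email = login.hasAttribute("email")

# All attributes at once, as a plain dict of name -> value.
all_attrs = dict(login.attributes.items())

print(username, repr(missing), has_email, all_attrs)
```

Here missing is the empty string and has_email is False.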

5. Getting the data between a tag pair

There is data between each <caption> tag pair. How do we obtain it?

There are several ways to get the data between a tag pair.

Method One:

The code is as follows:

# coding=utf-8
import xml.dom.minidom

# Open the XML document
dom = xml.dom.minidom.parse('abc.xml')

# Get the document element (root) object
root = dom.documentElement

# Print the text of each <caption> element in turn
cc = dom.getElementsByTagName('caption')
c1 = cc[0]
print(c1.firstChild.data)

c2 = cc[1]
print(c2.firstChild.data)

c3 = cc[2]
print(c3.firstChild.data)

The firstChild property returns the first child node of the selected node, and .data returns that node's data.
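
firstChild.data works when the element has a text node as its first child, but firstChild is None for an empty element. A slightly more defensive variant might look like the following sketch; get_text is a hypothetical helper written for this example, not part of minidom, and the inline document is made up.

```python
import xml.dom.minidom
from xml.dom import Node


def get_text(element):
    """Concatenate the data of all direct text children of an element."""
    return "".join(
        node.data
        for node in element.childNodes
        if node.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)
    )


doc = xml.dom.minidom.parseString(
    "<item><caption> Testing </caption><note/></item>"
)
caption = doc.getElementsByTagName("caption")[0]
note = doc.getElementsByTagName("note")[0]

caption_text = get_text(caption).strip()   # surrounding spaces removed
note_text = get_text(note)                 # empty element gives ""

print(repr(caption_text), repr(note_text), note.firstChild)
```

For the empty note element, note.firstChild.data would raise an AttributeError, while the helper simply returns an empty string.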

Method Two:

The code is as follows:

# coding=utf-8
from xml.etree import ElementTree as ET

per = ET.parse('abc.xml')

# <item> elements nested inside <login>
p = per.findall('./login/item')

for oneper in p:
    # Iterating over an element yields its children in document order
    # (getchildren() was removed in Python 3.9)
    for child in oneper:
        print(child.tag, ':', child.text)

# <item> elements directly under the root
p = per.findall('./item')

for oneper in p:
    for child in oneper:
        print(child.tag, ':', child.text)

Method Two is a little more involved, and it uses a different module from the previous examples. findall specifies the level of elements from which to begin traversing.

The getchildren method returns all child elements in document order (it was removed in Python 3.9; iterating over the element directly gives the same result). The loop then prints each child's tag name (child.tag) and text (child.text).

Strictly speaking, Method Two is not meant only for this purpose. Its core function is to traverse all the child elements under the elements at a given level.
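
As a self-contained variation on Method Two, the sketch below parses the sample document from a string with ET.fromstring and collects the results into lists. The names CATALOG_XML, nested, top_level, and all_captions are introduced here for illustration; findall, get, and iter are standard ElementTree calls.

```python
import xml.etree.ElementTree as ET

# The article's sample document, embedded so the sketch is self-contained.
CATALOG_XML = """<catalog>
    <maxid>4</maxid>
    <login username="pytest" passwd="123456">
        <caption>Python</caption>
        <item id="4">
            <caption>Testing</caption>
        </item>
    </login>
    <item id="2">
        <caption>Zope</caption>
    </item>
</catalog>"""

root = ET.fromstring(CATALOG_XML)


def children_of(path):
    """Return (item id, child tag, child text) for every child of each match."""
    return [
        (item.get("id"), child.tag, child.text)
        for item in root.findall(path)
        for child in item
    ]


nested = children_of("./login/item")   # items inside <login>
top_level = children_of("./item")      # items directly under <catalog>

# iter() walks the whole tree, so it finds captions at every depth.
all_captions = [c.text for c in root.iter("caption")]

print(nested, top_level, all_captions)
```

The path passed to findall decides which level is traversed, while iter ignores levels and visits every matching descendant.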
