Python File Processing: Parsing .xml Files


Last Update: 2018-07-30 Source: Internet

Author: User



XML (eXtensible Markup Language) is a markup language designed to transport and store data. It underpins many current technologies and is applied in many different fields. It emerged naturally once web development reached a certain stage: it keeps the core features of SGML, adopts the simplicity of HTML, and adds features of its own, such as a clear, well-formed structure.

When a dataset for object detection is annotated by hand, the location of each annotated target is usually written to an .xml file. During data inspection and cleaning, however, it is hard to judge whether the annotations are accurate just by looking at a pile of coordinates, so it is often necessary to parse the XML file and read out the stored targets and their information.
The XML file to be parsed looks like this:

<annotation>
    <folder>Images</folder>
    <filename>00001</filename>
    <path>e:\images\00001.jpg</path>
    <source>
        <database>Unknown</database>
    </source>
    <size>
        <width>432</width>
Structurally, it looks much like the familiar HTML (HyperText Markup Language), but the two are designed for different purposes. HTML is designed to display data, so it focuses on how the data looks; XML is designed to transport and store data, so it focuses on what the data is.

XML has the following characteristics:

It is made up of tag pairs:
<aa></aa>
Tags can have attributes: <aa id='123'></aa>
Tag pairs can enclose data: <aa>abc</aa>
Tags can contain child tags (giving a hierarchy):

<aa>
     <bb></bb>
</aa> 
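The three features above map directly onto minidom calls. A minimal sketch, using the toy tags aa and bb from the examples (they are illustrative only, not part of the annotation format): getAttribute reads an attribute, getElementsByTagName finds nested tags, and the text enclosed by a tag lives in a child text node.

```python
import xml.dom.minidom

# Toy document combining an attribute, a nested tag, and enclosed text
doc = xml.dom.minidom.parseString("<aa id='123'><bb>abc</bb></aa>")

aa = doc.documentElement               # the outermost tag, <aa>
print(aa.getAttribute('id'))           # attribute values come back as strings

bb = aa.getElementsByTagName('bb')[0]  # first nested <bb> tag
print(bb.childNodes[0].data)           # the text "abc" is stored in a child text node
```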
An annotation file like the one above stores, for each target, its name (name) and four bounding-box coordinates (xmin, ymin, xmax, ymax). Python can parse it as follows:

# coding=utf-8
import xml.dom.minidom

# Open the XML document
dom = xml.dom.minidom.parse('abc.xml')

# Get the document element (root) object
root = dom.documentElement
print(root.nodeName)
print(root.nodeValue)
print(root.nodeType)
print(root.ELEMENT_NODE)
The xml.dom.minidom module is used to process XML files, so it is imported first.

xml.dom.minidom.parse() opens an XML file and returns a document object, stored here in the variable dom.

documentElement returns the document (root) element of the DOM object, which is assigned to root.
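parse() accepts either a file name, as with 'abc.xml' above, or an open file object; when the XML is already in memory, parseString() does the same job. A small sketch, assuming a trimmed, hypothetical in-memory stand-in for abc.xml:

```python
import io
import xml.dom.minidom

# Hypothetical, trimmed version of the annotation file
xml_bytes = b"<annotation><folder>Images</folder><filename>00001</filename></annotation>"

# parse() works on a file name or any file-like object, e.g. an in-memory buffer
dom = xml.dom.minidom.parse(io.BytesIO(xml_bytes))
# parseString() takes the XML text directly
dom2 = xml.dom.minidom.parseString(xml_bytes)

root = dom.documentElement  # the <annotation> element
print(root.tagName, dom2.documentElement.tagName)
```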

Each node has nodeName, nodeValue, and nodeType attributes.

nodeName is the name of the node.

nodeValue is the value of the node. It is used mainly by text nodes, where it holds the text; for element nodes it is None.

nodeType is the type of the node. The root element here, annotation, is of type ELEMENT_NODE.
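To see how these three attributes differ between an element and the text it encloses, the sketch below inspects a <filename> element and its text child. The XML string is a trimmed copy of the sample annotation, used here only for illustration.

```python
import xml.dom.minidom

dom = xml.dom.minidom.parseString("<annotation><filename>00001</filename></annotation>")
root = dom.documentElement

filename_el = root.getElementsByTagName('filename')[0]
text_node = filename_el.firstChild  # the text "00001" is a separate child node

# Element node: named after its tag, nodeValue is None
print(filename_el.nodeName, filename_el.nodeValue, filename_el.nodeType == filename_el.ELEMENT_NODE)
# Text node: generic name '#text', nodeValue holds the text
print(text_node.nodeName, text_node.nodeValue, text_node.nodeType == text_node.TEXT_NODE)
```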

The possible node types are the known NodeType constants:
ATTRIBUTE_NODE
CDATA_SECTION_NODE
COMMENT_NODE
DOCUMENT_FRAGMENT_NODE
DOCUMENT_NODE
DOCUMENT_TYPE_NODE
ELEMENT_NODE
ENTITY_NODE
ENTITY_REFERENCE_NODE
NOTATION_NODE
PROCESSING_INSTRUCTION_NODE
TEXT_NODE
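Since nodeType itself is just an integer, a small lookup table makes it readable. A minimal sketch, assuming the table is built from the constants defined on xml.dom.Node; the helper name node_type_name and the sample XML are hypothetical:

```python
import xml.dom
import xml.dom.minidom

# Map each integer node type to its constant name, e.g. 1 -> 'ELEMENT_NODE'
NODE_TYPE_NAMES = {
    value: name
    for name, value in vars(xml.dom.Node).items()
    if name.endswith('_NODE') and isinstance(value, int)
}

def node_type_name(node):
    """Hypothetical helper: return the constant name for node.nodeType."""
    return NODE_TYPE_NAMES[node.nodeType]

dom = xml.dom.minidom.parseString("<annotation><!-- note --><folder>Images</folder></annotation>")
for child in dom.documentElement.childNodes:
    print(node_type_name(child))  # the comment, then the <folder> element
```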

For the XML file shown above, the author uses the following script. Readers can use it as a reference and adapt it to their own needs.

#!/usr/bin/env python
# coding=utf-8
import os
import xml.dom.minidom

from PIL import Image, ImageDraw, ImageFont

xml_path = '/Users/Lee/Desktop/check_xml/experiments/xml/'
img_path = '/Users/Lee/Desktop/check_xml/experiments/images/'
save_path = '/Users/Lee/Desktop/check_xml/experiments/check_img/'

# Placeholders: the original values were lost from the source text
TEXT_OFFSET = 20   # assumed: pixels between a label and the top edge of its box
PRINT_EVERY = 100  # assumed: print progress every PRINT_EVERY images

# Keep only .xml files rather than popping the first directory entry
# (presumably a hidden file such as .DS_Store; listing order is not guaranteed)
filelists = [f for f in os.listdir(xml_path) if f.endswith('.xml')]
font1 = ImageFont.truetype('Times_new_roman.ttf')

num = 0
for file in filelists:
    xml_file_path = os.path.join(xml_path, file)
    domtree = xml.dom.minidom.parse(xml_file_path)
    data = domtree.documentElement

    # Image name (without extension) stored in <filename>
    img_name = data.getElementsByTagName('filename')[0].childNodes[0].data

    # Collect the name and box coordinates of every <object>
    objects = data.getElementsByTagName('object')
    xmin_1, ymin_1, xmax_1, ymax_1, name_1 = [], [], [], [], []
    for obj in objects:
        name = obj.getElementsByTagName('name')[0]
        name_1.append(name.childNodes[0].data)
        xmin = obj.getElementsByTagName('xmin')[0]
        xmin_1.append(int(xmin.childNodes[0].data))
        ymin = obj.getElementsByTagName('ymin')[0]
        ymin_1.append(int(ymin.childNodes[0].data))
        xmax = obj.getElementsByTagName('xmax')[0]
        xmax_1.append(int(xmax.childNodes[0].data))
        ymax = obj.getElementsByTagName('ymax')[0]
        ymax_1.append(int(ymax.childNodes[0].data))

    # Draw every box and its name in red on the original image
    img = Image.open(os.path.join(img_path, img_name + '.jpg'))
    draw = ImageDraw.Draw(img)
    num_object = len(name_1)
    for i in range(num_object):
        # Place the label 15% of the way across the box, just above its top edge
        x_text = (xmax_1[i] - xmin_1[i]) * 0.15 + xmin_1[i]
        y_text = ymin_1[i] - TEXT_OFFSET
        draw.text((x_text, y_text), name_1[i], fill=(255, 0, 0), font=font1)
        draw.rectangle([(xmin_1[i], ymin_1[i]), (xmax_1[i], ymax_1[i])], outline=(255, 0, 0))
    img.save(os.path.join(save_path, img_name + '.jpg'))

    num = num + 1
    if num % PRINT_EVERY == 0:
        print(num)
The script above draws the names and bounding boxes stored in each XML file onto the original image, so the accuracy of hand-annotated data can be checked visually.
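The parsing half of the script can also be pulled out into a function that takes XML text and returns one record per object, which makes it testable without any image files. A minimal sketch, assuming the layout used by the script (object elements containing name and the four coordinate tags, possibly nested in a bndbox element); the function names parse_annotation and _tag_text and the 'car' sample are hypothetical:

```python
import xml.dom.minidom

def _tag_text(parent, tag):
    """Text of the first <tag> element anywhere below parent."""
    return parent.getElementsByTagName(tag)[0].firstChild.data.strip()

def parse_annotation(xml_text):
    """Hypothetical helper: return (filename, [(name, xmin, ymin, xmax, ymax), ...])."""
    root = xml.dom.minidom.parseString(xml_text).documentElement
    filename = _tag_text(root, 'filename')
    boxes = []
    for obj in root.getElementsByTagName('object'):
        # getElementsByTagName searches all descendants, so a <bndbox> wrapper is fine
        coords = tuple(int(_tag_text(obj, t)) for t in ('xmin', 'ymin', 'xmax', 'ymax'))
        boxes.append((_tag_text(obj, 'name'),) + coords)
    return filename, boxes

sample = """<annotation>
  <filename>00001</filename>
  <object>
    <name>car</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>80</ymax></bndbox>
  </object>
</annotation>"""
print(parse_annotation(sample))  # ('00001', [('car', 10, 20, 110, 80)])
```

The drawing loop can then iterate over the returned tuples instead of five parallel lists.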

This article is an English version of an article originally written in Chinese on aliyun.com.
