Python file processing: parsing .xml files

Source: Internet
Author: User

XML (Extensible Markup Language) is designed to transmit and store data. It underlies many current technologies and is applied in many different fields. It emerged at a particular stage of web development, combining the core features of SGML with the simplicity of HTML, and it adds features of its own, such as a clear, well-defined structure.

In object detection, datasets are usually annotated by hand, and the location of each annotated target is typically written to an .xml file. During data inspection and cleaning, however, it is hard to judge the accuracy of an annotation from a pile of raw coordinates, so it is often necessary to parse the XML file and read out the stored information about each target.
The XML file to be parsed looks like this:

<annotation>
    <folder>Images</folder>
    <filename>00001</filename>
    <path>e:\images\00001.jpg</path>
    <source>
        <database>Unknown</database>
    </source>
    <size>
        <width>432</width>
        ...

Structurally, it looks much like the familiar HTML (HyperText Markup Language), but the two are designed for different purposes. HTML is designed to display data and focuses on its appearance; XML is designed to transmit and store data and focuses on its content.

XML has the following characteristics:

It is made up of tag pairs:
<aa></aa>
Tags can have attributes: <aa id='123'></aa>
Tag pairs can enclose data: <aa>abc</aa>
Tags can contain child tags (forming a hierarchy):

<aa>
     <bb></bb>
</aa>
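
The characteristics above can be seen with a few lines of Python. This is only a minimal sketch: the sample string and the variable names are made up for illustration, and it uses parseString so that no file is needed.

```python
from xml.dom.minidom import parseString

# Illustrative sample (not from the annotation file): a tag pair with an
# attribute, containing a child tag that encloses some data.
SAMPLE = "<aa id='123'><bb>abc</bb></aa>"

doc = parseString(SAMPLE)
aa = doc.documentElement                  # the outer tag pair <aa>...</aa>
tag_id = aa.getAttribute("id")            # value of the id attribute
bb = aa.getElementsByTagName("bb")[0]     # the nested child tag
text = bb.firstChild.data                 # the data enclosed by <bb>...</bb>

print(aa.tagName, tag_id, bb.tagName, text)
```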

The XML file above stores each target's name (name) and four coordinates (xmin, ymin, xmax, ymax). The following uses Python to parse it.

# coding=utf-8
import xml.dom.minidom

# Open the XML document
dom = xml.dom.minidom.parse('abc.xml')

# Get the document element object
root = dom.documentElement
print(root.nodeName)
print(root.nodeValue)
print(root.nodeType)
print(root.ELEMENT_NODE)

The xml.dom.minidom module is used to process XML files, so it is imported first.

xml.dom.minidom.parse() opens an XML file and returns a document object, which is stored in the variable dom.

documentElement returns the document element (the root element) of the DOM object, which is assigned to root.
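
To try parse() and documentElement without an existing abc.xml, one option is to write a small sample to a temporary file first. The sketch below does that; the sample content and the temporary file are assumptions for illustration. It also shows parseString, which builds the same kind of document directly from a string.

```python
import os
import tempfile
import xml.dom.minidom

# Illustrative sample content, not the full annotation file.
xml_text = "<annotation><filename>00001</filename></annotation>"

# Write the sample to a temporary file so that parse() has a file to open.
with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False) as fh:
    fh.write(xml_text)
    tmp_path = fh.name

try:
    dom = xml.dom.minidom.parse(tmp_path)   # parse from a file path
    root = dom.documentElement              # root element of the document
    root_name = root.nodeName
finally:
    os.remove(tmp_path)                     # clean up the temporary file

# parseString() produces an equivalent document from an in-memory string.
root_from_string = xml.dom.minidom.parseString(xml_text).documentElement

print(root_name, root_from_string.nodeName)
```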

Each node has nodeName, nodeValue, and nodeType attributes.

nodeName is the name of the node.

nodeValue is the value of the node; it is None for element nodes and holds the content for nodes such as text nodes.

nodeType is the type of the node. The root element here is of type ELEMENT_NODE.
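
The difference between an element node and the text node inside it can be checked directly. A minimal sketch, assuming a one-element sample string chosen for illustration:

```python
from xml.dom.minidom import parseString

# Illustrative one-element document.
doc = parseString("<filename>00001</filename>")
element = doc.documentElement     # the <filename> element node
text_node = element.firstChild    # the text node holding "00001"

# For an element, nodeName is the tag name and nodeValue is None.
element_info = (element.nodeName, element.nodeValue,
                element.nodeType == element.ELEMENT_NODE)

# For a text node, nodeName is "#text" and nodeValue is the text itself.
text_info = (text_node.nodeName, text_node.nodeValue,
             text_node.nodeType == text_node.TEXT_NODE)

print(element_info)
print(text_info)
```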

The following node types are defined:
'ATTRIBUTE_NODE'
'CDATA_SECTION_NODE'
'COMMENT_NODE'
'DOCUMENT_FRAGMENT_NODE'
'DOCUMENT_NODE'
'DOCUMENT_TYPE_NODE'
'ELEMENT_NODE'
'ENTITY_NODE'
'ENTITY_REFERENCE_NODE'
'NOTATION_NODE'
'PROCESSING_INSTRUCTION_NODE'
'TEXT_NODE'

NodeTypes - named constants
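
Since nodeType is printed as a number, it can help to map it back to a constant name. The sketch below reads the constants from the standard xml.dom.Node class; the dictionary NODE_TYPES and the helper node_type_name are hypothetical names introduced here for illustration.

```python
from xml.dom import Node

# Collect every *_NODE constant defined on the standard Node class.
NODE_TYPES = {name: value for name, value in vars(Node).items()
              if name.endswith("_NODE")}


def node_type_name(node_type):
    """Return the constant name for a numeric nodeType, or None if unknown."""
    for name, value in NODE_TYPES.items():
        if value == node_type:
            return name
    return None


# List the constants in numeric order.
for name, value in sorted(NODE_TYPES.items(), key=lambda item: item[1]):
    print(value, name)
```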

For the XML file shown above, the author uses the following script. It is provided for reference and can be modified to suit your own needs.

#!/usr/bin/env python
# coding=utf-8
import os
import xml.dom.minidom

from PIL import Image, ImageDraw, ImageFont

# NOTE: the two values below are missing from the source text (lost in
# extraction). They are placeholders, not the author's values; adjust as needed.
TEXT_OFFSET = 20    # pixels by which the label is raised above the box
PRINT_EVERY = 100   # print the running count after this many images

xml_path = '/Users/Lee/Desktop/check_xml/experiments/xml/'
filelists = os.listdir(xml_path)
# Drop the first entry (presumably a hidden system file such as .DS_Store)
filelists.pop(0)
img_path = '/Users/Lee/Desktop/check_xml/experiments/images/'
save_path = '/Users/Lee/Desktop/check_xml/experiments/check_img/'

num = 0
for xml_name in filelists:
    xml_file_path = os.path.join(xml_path, xml_name)
    domtree = xml.dom.minidom.parse(xml_file_path)
    data = domtree.documentElement

    # Image name (without extension) stored in the annotation
    img_name_tmp = data.getElementsByTagName("filename")
    img_name = img_name_tmp[0].childNodes[0].data

    # Collect the name and the four coordinates of every annotated object
    objects = data.getElementsByTagName("object")
    xmin_1 = []
    ymin_1 = []
    xmax_1 = []
    ymax_1 = []
    name_1 = []
    for obj in objects:
        name = obj.getElementsByTagName('name')[0]
        name_1.append(name.childNodes[0].data)
        xmin = obj.getElementsByTagName('xmin')[0]
        xmin_1.append(int(xmin.childNodes[0].data))
        ymin = obj.getElementsByTagName('ymin')[0]
        ymin_1.append(int(ymin.childNodes[0].data))
        xmax = obj.getElementsByTagName('xmax')[0]
        xmax_1.append(int(xmax.childNodes[0].data))
        ymax = obj.getElementsByTagName('ymax')[0]
        ymax_1.append(int(ymax.childNodes[0].data))

    # Draw each name and bounding box onto the original image
    img = Image.open(os.path.join(img_path, img_name + '.jpg'))
    num_object = len(name_1)
    for i in range(num_object):
        x_text = (xmax_1[i] - xmin_1[i]) * 0.15 + xmin_1[i]
        y_text = ymin_1[i] - TEXT_OFFSET
        draw_name = ImageDraw.Draw(img)
        font1 = ImageFont.truetype("Times_new_roman.ttf")
        draw_name.text((x_text, y_text), name_1[i], fill=(255, 0, 0), font=font1)
        draw_rect = ImageDraw.Draw(img)
        draw_rect.rectangle([(xmin_1[i], ymin_1[i]), (xmax_1[i], ymax_1[i])],
                            outline=(255, 0, 0))
    img.save(os.path.join(save_path, img_name + '.jpg'))

    num = num + 1
    if num % PRINT_EVERY == 0:
        print(num)

The script above draws the names and locations stored in the XML onto the original images, making it possible to check the accuracy of the hand-annotated data.
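
Visual checks can be complemented by a simple automatic check on the coordinates themselves. The following is a rough sketch, not the author's method: the sample annotation, the bndbox wrapper, the height value, and the helpers text_of, read_boxes, and suspicious are all assumptions made for illustration. Only the tag names object, name, xmin, ymin, xmax, ymax, and width come from the article.

```python
from xml.dom.minidom import parseString

# Hypothetical annotation; the layout inside <object> is assumed.
SAMPLE_ANNOTATION = """<annotation>
  <filename>00001</filename>
  <size><width>432</width><height>300</height></size>
  <object>
    <name>cat</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax></bndbox>
  </object>
  <object>
    <name>dog</name>
    <bndbox><xmin>400</xmin><ymin>50</ymin><xmax>500</xmax><ymax>90</ymax></bndbox>
  </object>
</annotation>"""


def text_of(parent, tag):
    """Return the text inside the first <tag> element found under parent."""
    return parent.getElementsByTagName(tag)[0].firstChild.data.strip()


def read_boxes(xml_text):
    """Return the image width and a list of (name, (xmin, ymin, xmax, ymax))."""
    root = parseString(xml_text).documentElement
    width = int(text_of(root, "width"))
    boxes = []
    for obj in root.getElementsByTagName("object"):
        coords = tuple(int(text_of(obj, tag))
                       for tag in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((text_of(obj, "name"), coords))
    return width, boxes


def suspicious(width, boxes):
    """Return names whose box is inverted, negative, or wider than the image."""
    bad = []
    for name, (xmin, ymin, xmax, ymax) in boxes:
        if xmin < 0 or ymin < 0 or xmin >= xmax or ymin >= ymax or xmax > width:
            bad.append(name)
    return bad


width, boxes = read_boxes(SAMPLE_ANNOTATION)
print(boxes)
print(suspicious(width, boxes))
```

In this sample the second box extends past the image width of 432, so it is the one flagged for a closer look.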
