Python learning notes: XSD validation for large XML files

Source: Internet
Author: User

Let's start with the requirement. We are working with a customer who requires us to deliver data in a specified XML format, defined by an XML Schema (XSD). Validating XML is normally no problem, but this XML file is large (about MB). How do you determine whether the generated XML conforms to the definitions in the XSD file? Checking it by eye is out of the question: the file holds several hundred thousand records, which only an automated XML validation mechanism can handle.
There are only a few ways to validate XML. If the file is only a few MB, tools such as XMLSpy and XMLPad can validate it, but a file this large cannot even be opened in them.
The customer works on Linux, where validation can be done with xmllint --schema *.xsd *.xml > /dev/null. Annoyingly, I use Windows, so I had to write the code myself.
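Where xmllint is available (it is part of libxml2), the shell check above can also be driven from Python. The following is a minimal sketch assuming Python 3.7+ and xmllint on the PATH; xmllint_command and validate_with_xmllint are hypothetical helper names, not part of any library. The --noout flag stops xmllint from echoing the parsed document, which is what the > /dev/null redirection achieves in the shell version.

```python
import subprocess


def xmllint_command(schema_path, xml_path):
    """Build the argument list that validates xml_path against schema_path.

    --noout suppresses the echo of the parsed document, replacing the
    `> /dev/null` redirection used in the shell version.
    """
    return ["xmllint", "--noout", "--schema", schema_path, xml_path]


def validate_with_xmllint(schema_path, xml_path):
    """Run xmllint and return (is_valid, messages).

    Assumes xmllint is installed and on PATH. A zero exit status means the
    document is valid; xmllint writes its validation messages to stderr.
    """
    result = subprocess.run(
        xmllint_command(schema_path, xml_path),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0, result.stderr
```

A call would look like ok, messages = validate_with_xmllint("schema.xsd", "data.xml"), with your own file names substituted.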

I found that nothing in the Python 2.5 standard library supports XML Schema, so I had to look for a third-party library. In the end I settled on lxml (it took two days of hard work to get it going).
1. lxml: http://codespeak.net/lxml/
2. Here is the validation code:
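#!/usr/bin/env python
# -*- coding: gb2312 -*-
from lxml import etree
import timeit

def checkxml():
    # Load the XSD file and compile it into a schema object
    xmlschema_doc = etree.parse("local_feed.xsd.xml")
    xmlschema = etree.XMLSchema(xmlschema_doc)
    # Parse the large XML document to be checked
    doc = etree.parse("google-local0.xml")
    # Prints True if the document conforms to the schema, otherwise False
    print xmlschema.validate(doc)
    # Redirect the collected validation errors into a log file
    log = open("mongolog.txt", "w")
    print >> log, xmlschema.error_log
    log.close()

if __name__ == '__main__':
    print 'start...'
    # Time a single call of checkxml() (1 repetition of 1 run)
    t = timeit.Timer('checkxml()', 'from __main__ import checkxml')
    print t.repeat(1, 1)
    print 'end. Press Enter to exit...'
    # In Python 2, raw_input() just waits for Enter; input() would eval the text
    raw_input()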
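The script above calls etree.parse on the whole file before validating, so the entire document has to fit in memory as a tree. For files this large, one alternative worth trying is to validate while parsing: lxml's iterparse accepts a schema argument, and discarding elements once their end tag has been seen keeps the in-memory tree small for files made of many sibling records. This is a sketch under stated assumptions (Python 3, a reasonably recent lxml), not the author's method, and no claim is made about its speed. validate_streaming and the tiny XSD are hypothetical stand-ins, since local_feed.xsd.xml itself is not shown.

```python
from io import BytesIO

from lxml import etree

# Tiny hypothetical schema standing in for local_feed.xsd.xml:
# a <feed> root holding one or more <item> string elements.
XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="feed">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="item" type="xs:string" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

SCHEMA = etree.XMLSchema(etree.fromstring(XSD))


def validate_streaming(source, schema):
    """Validate `source` (a path or binary file object) against `schema`
    while parsing it incrementally. Returns (is_valid, error_message)."""
    try:
        for _event, elem in etree.iterparse(source, events=("end",), schema=schema):
            # Drop the finished element's children and text...
            elem.clear()
            # ...and any earlier siblings, so processed records do not pile up
            parent = elem.getparent()
            if parent is not None:
                while elem.getprevious() is not None:
                    del parent[0]
        return True, None
    except (etree.XMLSyntaxError, etree.DocumentInvalid) as exc:
        # Parse-time schema violations are reported as XMLSyntaxError
        return False, str(exc)
```

To check the real files, pass a path and a compiled schema, for example validate_streaming("google-local0.xml", etree.XMLSchema(etree.parse("local_feed.xsd.xml"))).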

TIPS:

    1. timeit is part of Python's legendary "batteries included" standard library, but I find parts of it not very useful. More on that next time.
    2. print >> redirects output, which makes it easy to save the errors straight to a file.
    3. Then there is the lxml module itself. It looks powerful, and the downloads on the official site also cover the latest Python version. I will study it more later.
    4. I wrote the same check in C# and found that it takes under 60 s, while the lxml version needs 133 s. I have no precise figures for Linux, but it is also slow. I am only reporting what I observed here, not analyzing the causes.
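Tips 1, 2 and 4 all come down to timing one run and writing the errors to a file. Since the script above is Python 2 (where print >> f, x is valid syntax), here is a hedged Python 3 sketch of the same two ideas. time_once and write_log are hypothetical helper names, and the workload in the usage block is a placeholder rather than the real validation.

```python
import timeit


def time_once(func):
    """Call func exactly once and return the elapsed time in seconds.

    Equivalent to t.repeat(1, 1) in the script above: one round
    (repeat=1) containing one call (number=1).
    """
    return timeit.Timer(func).repeat(repeat=1, number=1)[0]


def write_log(path, lines):
    """Write one item per line to path.

    print(..., file=f) is the Python 3 replacement for `print >> f, ...`.
    """
    with open(path, "w") as f:
        for line in lines:
            print(line, file=f)


if __name__ == "__main__":
    # Placeholder workload standing in for checkxml()
    elapsed = time_once(lambda: sum(range(1_000_000)))
    print("elapsed: %.3f s" % elapsed)
    write_log("errors.txt", ["example error 1", "example error 2"])
```

Passing a callable to timeit.Timer avoids the 'from __main__ import checkxml' setup string the original needs.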
