Let's start with the requirement. We are working with a customer who requires us to deliver data in a specified XML format (defined by an XML Schema). Validating XML is not a problem in itself, but the generated file is large. How do you determine whether the generated XML conforms to the XSD file? Checking by eye is out of the question: with several hundred thousand records, only an automated XML validation mechanism can handle it.
There are only a few ways to validate XML against a schema. If the file is only a few MB, tools such as XMLSpy and XMLPad can validate it, but they cannot open a file this large.
The customer works on Linux, where xmllint --schema *.xsd *.xml >/dev/null can be used for validation. The frustrating part is that I use Windows, so I had to write the code myself.
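Where xmllint is available, the command above can also be driven from a script. The following is only a sketch, assuming Python 3.7 or later; the helper names build_xmllint_command and validate_with_xmllint are made up for illustration. It uses xmllint's --noout option instead of redirecting to /dev/null.

```python
import shutil
import subprocess


def build_xmllint_command(xsd_path, xml_path):
    """Return the argument list for validating xml_path against xsd_path."""
    # --noout suppresses the echoed document, so no redirect is needed.
    return ["xmllint", "--noout", "--schema", xsd_path, xml_path]


def validate_with_xmllint(xsd_path, xml_path):
    """Run xmllint; return (is_valid, diagnostics), or None if it is not installed."""
    if shutil.which("xmllint") is None:
        return None
    result = subprocess.run(
        build_xmllint_command(xsd_path, xml_path),
        capture_output=True,
        text=True,
    )
    # xmllint exits with 0 when the document validates; messages go to stderr.
    return result.returncode == 0, result.stderr


if __name__ == "__main__":
    print(build_xmllint_command("local_feed.xsd.xml", "google-local0.xml"))
```

Returning None when the tool is missing keeps the sketch usable on a Windows machine that has no xmllint installed.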
It turned out that nothing in the Python 2.5 standard library supports XML Schema, so I had to look for a third-party library. In the end I found lxml (it was hard work; it took two days to get it going).
1: http://codespeak.net/lxml/
2: The validation code:
#!/usr/bin/env python
# coding=gb2312
from lxml import etree
import timeit

def checkxml():
    # Load the XSD and compile it into a schema validator.
    xmlschema_doc = etree.parse("local_feed.xsd.xml")
    xmlschema = etree.XMLSchema(xmlschema_doc)
    # Parse the generated feed and validate it against the schema.
    doc = etree.parse("google-local0.xml")
    print xmlschema.validate(doc)
    # Save the validation errors to a log file.
    print >>open("mongolog.txt", "w"), xmlschema.error_log

if __name__ == '__main__':
    print 'start...'
    # Time a single run of checkxml().
    t = timeit.Timer('checkxml()', 'from __main__ import checkxml')
    print t.repeat(1, 1)
    print 'end. Any key exit...'
    raw_input()
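The error log that the script writes to a file can also be inspected entry by entry. Here is a minimal sketch, assuming Python 3 and a current lxml release rather than the versions used in this post; the inline schema and the validate_bytes helper are made up for illustration.

```python
def validate_bytes(xsd_bytes, xml_bytes):
    """Validate an XML document against an XSD; return (is_valid, errors)."""
    # lxml is third-party; it is imported here so this module still loads
    # on machines where lxml is not installed.
    from lxml import etree

    schema = etree.XMLSchema(etree.fromstring(xsd_bytes))
    doc = etree.fromstring(xml_bytes)
    is_valid = schema.validate(doc)
    # Each log entry carries a line number and a human-readable message.
    errors = [(entry.line, entry.message) for entry in schema.error_log]
    return is_valid, errors


# Hypothetical one-element schema: <count> must contain an integer.
XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="count" type="xs:integer"/>
</xs:schema>"""

if __name__ == "__main__":
    print(validate_bytes(XSD, b"<count>42</count>"))
    print(validate_bytes(XSD, b"<count>many</count>"))
```

Collecting (line, message) pairs makes it easier to locate the offending records in a large feed than reading the raw log dump.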
TIPS:
- timeit is one of Python's legendary "batteries included" modules. I find some parts of it not very useful, though; more on that next time.
- print >> redirects output to a file, which makes it convenient to save the errors directly.
- Then there is the lxml module itself. It looks powerful, and it also appears to be among the most downloaded packages on the official Python site. I will learn more about it later.
- I wrote the same check in C# and it finished in under 60 s, whereas the lxml version needed 133 s. I have no precise figures for Linux, but it was slow there too. I am only reporting what I observed here, not discussing the reasons.
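The timeit and output-redirection tips can be tried out without lxml. This is a sketch only, assuming Python 3, where timeit.Timer accepts a callable and print(..., file=...) replaces print >>. The check function is a hypothetical stand-in for checkxml, and the helper names are made up.

```python
import os
import tempfile
import timeit


def check():
    """Stand-in for the validation call; it does no real validation."""
    return sum(range(1000))


def time_once(func):
    """Run func a single time and return the elapsed seconds."""
    # repeat=1, number=1 mirrors the repeat(1, 1) call in the script.
    return timeit.Timer(func).repeat(repeat=1, number=1)[0]


def write_log(path, message):
    """Write message to path, the Python 3 counterpart of print >>f, message."""
    with open(path, "w") as log:
        print(message, file=log)


if __name__ == "__main__":
    elapsed = time_once(check)
    log_path = os.path.join(tempfile.gettempdir(), "timing_log.txt")
    write_log(log_path, "elapsed: %.6f s" % elapsed)
    print("wrote", log_path)
```

Passing the function object to Timer avoids the 'from __main__ import ...' setup string, and the with block closes the log file as soon as the message is written.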