This time to bring you a combination of Python and XML practical tutorial, Python and XML combination of practical considerations, the following is the actual case, take a look.
The name of this project is not as good as XML called "omnipotent" is called auto-build site, according to an XML file, generate the corresponding directory structure of the site, but only HTML is too simple, if you can create CSS that is more powerful. This need to follow up research and development, first to study how the HTML site structure. Since the Web site is built from an XML structure, everything should be done by this XML file. First look at this XML file, Website.xml:
<website> <page name= "index" title= "Home page" >
With this file, here's how to build a Web site from this file.
First of all we want to parse this XML file, Python parsing xml and in Java, there are two ways, sax and DOM, two ways to deal with the difference in speed and range, the former is about efficiency, each time only a small part of the document, quickly and effectively use memory, The latter is the opposite of the process, the first load all the documents into memory, and then processing, slower, also more memory consumption, the only advantage is that the entire document can be manipulated.
Using sax in Python to process XML first introduces the parse function in Xml.sax and the ContentHandler in Xml.sax.handler, which is used in conjunction with the parse function. Use the following: Parse (' Xxx.xml ', Xxxhandler), which xxxhandler to inherit the ContentHandler above, but as long as inheritance on the line, do not need to make a difference. The parse function then processes the XML file, invoking the Startelement function and the EndElement function in the Xxxhandler to start and end the label in the XML. The middle procedure uses a function called characters to handle all strings inside the label.
With these realizations, we already know how to deal with the XML file, and then we look at the source of the evil Website.xml file, analysis of its structure, only two nodes: page and directory, it is obvious that page represents a pages, directory represents a directory.
So the idea of dealing with this XML file becomes clearer. Read each node of the XML file, and then determine if it is page or directory, if it is a page, create an HTML page and write the contents of the node to the file. If you encounter directory, create a folder and then process its internal page node (if one exists).
The following part of the code, the book's implementation is more complex, more flexible. Look first, then in the analysis.
From Xml.sax.handler import contenthandlerfrom xml.sax import parseimport Osclass dispatcher:def dispatch (self, prefix , name, attrs=none): mname = prefix + name.capitalize () dname = ' default ' + prefix.capitalize () metho D = getattr (self, mname, None) if callable (method): args = () Else:method = GetAttr (self, dname, None) args = name, if prefix = = ' start ': args + = Attrs, if callable (method): Method (*args) de F startelement (self, Name, attrs): Self.dispatch ("Start", Name, Attrs) def endElement (self, name): SELF.D Ispatch (' End ', name) class Websiteconstructor (Dispatcher, contenthandler): passthrough = False def init (self, directo RY): Self.directory = [directory] self.ensuredirectory () def ensuredirectory (self): path = Os.path . Join (*self.directory) Print path print '----' If not Os.path.isdir (path): Os.makedirs (PATH) def c Haracters (self, chars): If Self.passthrough:self.out.write (chars) def defaultstart (self, Name, attrs): if Self.passthrough: Self.out.write (' < ' + name) for key, Val in Attrs.items (): Self.out.write ('%s= "%s" '% ( Key, Val)) Self.out.write (' > ') def defaultend (self, name): if Self.passthrough:self.ou T.write (' </%s> '% name) def startdirectory (self, attrs): Self.directory.append (attrs[' name ') "Self." Ensuredirectory () def enddirectory (self): print ' Enddirectory ' Self.directory.pop () def startpage (self , attrs): print ' startpage ' filename = os.path.join (*self.directory + [attrs[' name ']+ '. html ']) SELF.O UT = open (filename, ' W ') Self.writeheader (attrs[' title ') Self.passthrough = True def endpage (self): print ' endpage ' Self.passthrough = False self.writefooter () self.out.close () def writeheader (SE LF, title): Self.Out.write ('
It seems that the program above the analysis of a few complex, but the great man Fluffy said that any complex procedures are paper tigers. Let's analyze this program again.
First see this program is there are two classes, in fact, can be considered as a class, because of the inheritance.
And then see what more it, in addition to our analysis of the startelement and endelement and characters, more out of the startpage,endpage;startdirectory,enddirectory; Defaultstart,defaultend;ensuredirectory;writeheader,writefooter; and dispatch, these functions. In addition to dispatch, the preceding functions are well understood, each pair of functions is purely processing the corresponding HTML tags and XML nodes. The complexity of dispatch is that it is used to dynamically flatten functions and perform them.
Dispatch's approach is to determine whether a corresponding function, such as StartPage, is performed based on the passed parameter (that is, the name of the operation and the name of the node), and if it does not exist, execute the default+ operation name: such as Defaultstart.
A function after a function is clear, you know what the whole process is. First, create a public_html file, store the entire Web site, and then read the XML nodes, Startelement and EndElement call dispatch for processing. Then it is dispatch how to invoke the specific processing function. So far, the project is finished.
One of the main things to master is that Python uses sax to process XML, and the other is the use of functions in Python, such as getattr, asterisks when passing parameters ...
Believe that you have read the case of this article you have mastered the method, more exciting please pay attention to the PHP Chinese network other related articles!
Recommended reading:
How Python writes data in-frame data to the database
How the object life cycle is used in Python replication