When using python for development, because Python's open-source ecosystem is very powerful, many class libraries are often used to implement the same function, developers are also faced with how to choose the best class library as a tool to assist development. This article will record the class libraries I tested when using python to process XML format data. Some Class Libraries cannot support some features due to inherent limitations. The class libraries or modules involved include XML (Python built-in), libxml2, lxml, and XPath. Note: The structure of the data processed in XML format in this document is as follows: Python code Input_xml_string = """ <Root> <Item> <Data version = "1.0" url = "http: // ***"/> <Data version = "2.0" url = "http: // ***"/> </Item> <Other> <Data version = "1.0" url = "http: // ***"/> <Data version = "2.0" url = "http: // ***"/> </Other> </Root> """ XML processing module of Python You can use the "getelementsbytagname" interface provided by this module to find the desired node. The instance "get_tagname" is as follows: Python code Import XML. Dom. minidom Def get_tagname (): Doc = xml. Dom. minidom. parsestring (input_xml_string) For node in Doc. getelementsbytagname ("data "): Print (node, node. tagname, node. getattribute ("version ")) The program running result is as follows: Python code (<Dom element: data at 0x89884cc>, u 'data', u '1. 0 ') (<Dom element: data at 0x898860c>, u 'data', u '2. 0 ') (<Dom element: data at 0x89887cc>, u 'data', u '1. 0 ') (<Dom element: data at 0x898890c>, u 'data', u '2. 0 ') Observe the preceding running result. The "getelementsbytagname" interface is used to search for all nodes named data. Sometimes, the function required by the program is to only use the data nodes under a node, such as the data node under the other node. You may think about it immediately. We can determine whether the parent node of the data node is other to meet the function. The instance "get_tagname_other" is as follows: Python code Import XML. Dom. minidom Def get_tagname_other (): Doc = xml. Dom. minidom. parsestring (input_xml_string) For node in Doc. getelementsbytagname ("data "): If node. parentnode. tagname = "other ": Print (node, node. tagname, node. getattribute ("version ")) The program running result is as follows: Python code (<Dom element: data at 0x936b7cc>, u 'data', u '1. 0 ') (<Dom element: data at 0x936b90c>, u 'data', u '2. 0 ') Observe the above running results. Well, the problem is solved. But if I want to find the data node under the other node and the attribute node version is 1.0, then we need to add more policies to filter out the data we need. Obviously, this method is not flexible enough. Therefore, we thought of using XPath to search for the nodes we need. The instance "get_xpath" is as follows: Python code Import XML. etree. elementtree From stringio import stringio File = stringio (input_xml_string) Def get_xpath (): Doc = xml. etree. elementtree. parse (file) For node in Doc. findall ("// item/Data "): Print (node, node. Tag, (node. Items ())) The program running result is as follows: Python code (<Element data at 90c4dcc>, 'data', [('url', 'HTTP: // *** '), ('version', '1. 0 ')]) (<Element data at 90c4e8c>, 'data', [('url', 'HTTP: // *** '), ('version', '2. 0 ')]) Looking at the above running results, the use of xpath obviously improves the readability of the program, but it still does not solve the problem above. This is due to the inherent lack of support for the XPath method by the XML module of Python, to ensure both readability and functional correctness, we need to use a third-party XML processing class library for python. Libxml2 Libxml2 is an XML Parser developed in C language. It is a free open-source software based on MIT license. Multiple programming languages are implemented based on it. This section describes the lxml module. The instance "get_xpath_1" is as follows: Python code Mport libxml2 Def get_xpath_1 (): Doc = libxml2.parsefile ("data. xml") # The structure of the Data. xml file is the same as that of input_xml_string. For node in Doc. xpatheval ("// item/data [@ version = '1. 0']"): Print (node, node. Name, (node. properties. Name, node. properties. Content )) Doc. freedoc () The program running result is as follows: Python code (<Xmlnode (data) object at 0x9326c6c>, 'data', ('version', '1. 0 ')) Observe the above running results to meet our needs. The "xpatheval ()" interface does not support the usage of similar templates, but does not affect the use. Because libxml2 is developed in C language, therefore, the method of using the API interface will inevitably be somewhat "unacceptable" (written or habitually used) Lxml Lxml is developed using the Python language based on libxml2 introduced above. It is more suitable for python developers than libxml2 ), the "XPath" interface supports template-like usage. The example "get_xpath_2" is as follows: Python code Import lxml. etree Def get_xpath_2 (): Doc = lxml. etree. parse (file) For node in Doc. XPath ("// item/data [@ version = $ name]", name = "1.0 "): Print (node, node. Tag, (node. Items ())) The program running result is as follows: Python code (<Element data at a1f784c>, 'data', [('version', '1. 0 '), ('url', 'HTTP: // ***')]) Xpath XPath is a officially recommended Python module that supports processing such as XPath. It is extended based on the python built-in XML processing module introduced in this article and can be used in combination, the "find" interface also supports usage similar to the template. The example "get_xpath_3" is as follows: Python code Import xpath Def get_xpath_3 (): Doc = xml. Dom. minidom. parsestring (input_xml_string) For node in XPath. Find ("// item/data [@ version = $ name]", Doc, name = "1.0 "): Print (node, node. tagname, node. getattribute ("version ")) The program running result is as follows: Python code (<Dom element: data at 0x89934cc>, u 'data', u '1. 0 ') Summary Through the practices of these class libraries, we have learned that python has various options when processing XML format data, we also learned that these class libraries are good at processing those aspects and usage methods of various class libraries. We can select appropriate class libraries based on actual needs to complete the development work. <II. Writing XML files in Python> conversion from Http://lulinbest.blog.sohu.com/75921823.html |