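The previous post briefly introduced Python's basic syntax, mainly from the point of view of someone coming from C or C++. This post goes into detail on how to use the elementtree library. ElementTree was added to the Python standard library in Python 2.5 (as the xml.etree package), together with cElementTree, an accelerator module written in C.
Reading XML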
from xml.etree.ElementTree import ElementTree, Element
import sys

def ReadFromXml(path):
    '''read from xml and parse it
    author: limin
    path: the file path
    return: ElementTree'''
    tree = ElementTree()   # create an empty tree object
    tree.parse(path)       # load and parse the file at `path`
    return tree
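To try the reading step without a real capture file, the sketch below writes a small, hand-trimmed PDML fragment (an assumed stand-in for the Wireshark export) to a temporary file and parses it the same way `ReadFromXml` does. The helper name `read_from_xml` is just a renamed copy for illustration. `ET.fromstring` is shown for comparison: it parses a string and returns the root Element directly instead of an ElementTree.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# A trimmed-down PDML fragment, assumed here in place of a real Wireshark export.
SAMPLE = """<?xml version="1.0"?>
<pdml version="0" creator="wireshark/1.20.0.1">
  <packet>
    <proto name="geninfo" pos="0" showname="General information" size="98">
      <field name="num" pos="0" show="1246" showname="Number" value="4de" size="98"/>
    </proto>
  </packet>
</pdml>
"""

def read_from_xml(path):
    """Parse the file at `path` and return an ElementTree (same idea as ReadFromXml)."""
    tree = ET.ElementTree()
    tree.parse(path)
    return tree

# Write the sample to a temporary file so the file-based parser has something to read.
with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False, encoding="utf-8") as f:
    f.write(SAMPLE)
    path = f.name

try:
    tree = read_from_xml(path)
    print(tree.getroot().tag)  # pdml
finally:
    os.remove(path)  # the parsed tree stays in memory after the file is gone

# Parsing from a string instead: fromstring returns the root Element, not an ElementTree.
root = ET.fromstring(SAMPLE)
print(root.tag)  # pdml
```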
We will again use the XML file from the previous post as the example:
<?xml version="1.0"?>
<pdml version="0" creator="wireshark/1.20.0.1">
<packet>
  <proto name="geninfo" pos="0" showname="General information" size="98">
    <field name="num" pos="0" show="1246" showname="Number" value="4de" size="98"/>
    <field name="len" pos="0" show="98" showname="Frame Length" value="62" size="98"/>
    <field name="caplen" pos="0" show="98" showname="Captured Length" value="62" size="98"/>
    <field name="timestamp" pos="0" show="Mar 6, 2013 18:28:28.729395000 China Standard Time" showname="Captured Time" value="1362565708.729395000" size="98"/>
  </proto>
  <proto name="frame" showname="Frame 1246: 98 bytes on wire (784 bits), 98 bytes captured (784 bits)" size="98" pos="0">
    <field name="frame.time" showname="Arrival Time: Mar 6, 2013 18:28:28.729395000 China Standard Time" size="0" pos="0" show="Mar 6, 2013 18:28:28.729395000"/>
    <field name="frame.time_epoch" showname="Epoch Time: 1362565708.729395000 seconds" size="0" pos="0" show="1362565708.729395000"/>
    <field name="frame.time_delta" showname="Time delta from previous captured frame: 0.000475000 seconds" size="0" pos="0" show="0.000475000"/>
    <field name="frame.time_delta_displayed" showname="Time delta from previous displayed frame: 0.000000000 seconds" size="0" pos="0" show="0.000000000"/>
    <field name="frame.time_relative" showname="Time since reference or first frame: 93.072253000 seconds" size="0" pos="0" show="93.072253000"/>
    <field name="frame.number" showname="Frame Number: 1246" size="0" pos="0" show="1246"/>
    <field name="frame.len" showname="Frame Length: 98 bytes (784 bits)" size="0" pos="0" show="98"/>
    <field name="frame.cap_len" showname="Capture Length: 98 bytes (784 bits)" size="0" pos="0" show="98"/>
    <field name="frame.marked" showname="Frame is marked: False" size="0" pos="0" show="0"/>
    <field name="frame.ignored" showname="Frame is ignored: False" size="0" pos="0" show="0"/>
    <field name="frame.protocols" showname="Protocols in frame: eth:ip:udp:mmtss:sicap" size="0" pos="0" show="eth:ip:udp:mmtss:sicap"/>
    <field name="frame.coloring_rule.name" showname="Coloring Rule Name: SICAP" size="0" pos="0" show="SICAP"/>
    <field name="frame.coloring_rule.string" showname="Coloring Rule String: sicap" size="0" pos="0" show="sicap"/>
  </proto>
  <proto name="eth" showname="Ethernet II, Src: 192.168.254.1 (00:0f:bb:69:93:ee), Dst: DCT-INF (00:10:18:cb:b5:fd)" size="14" pos="0">
    <field name="eth.dst" showname="Destination: DCT-INF (00:10:18:cb:b5:fd)" size="6" pos="0" show="00:10:18:cb:b5:fd" value="001018cbb5fd">
      <field name="eth.addr" showname="Address: DCT-INF (00:10:18:cb:b5:fd)" size="6" pos="0" show="00:10:18:cb:b5:fd" value="001018cbb5fd"/>
      <field name="eth.ig" showname=".... ...0 .... .... .... .... = IG bit: Individual address (unicast)" size="3" pos="0" show="0" value="0" unmaskedvalue="001018"/>
      <field name="eth.lg" showname=".... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)" size="3" pos="0" show="0" value="0" unmaskedvalue="001018"/>
    </field>
    <field name="eth.src" showname="Source: 192.168.254.1 (00:0f:bb:69:93:ee)" size="6" pos="6" show="00:0f:bb:69:93:ee" value="000fbb6993ee">
      <field name="eth.addr" showname="Address: 192.168.254.1 (00:0f:bb:69:93:ee)" size="6" pos="6" show="00:0f:bb:69:93:ee" value="000fbb6993ee"/>
      <field name="eth.ig" showname=".... ...0 .... .... .... .... = IG bit: Individual address (unicast)" size="3" pos="6" show="0" value="0" unmaskedvalue="000fbb"/>
      <field name="eth.lg" showname=".... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)" size="3" pos="6" show="0" value="0" unmaskedvalue="000fbb"/>
    </field>
    <field name="eth.type" showname="Type: IP (0x0800)" size="2" pos="12" show="0x0800" value="0800"/>
  </proto>
  <proto name="ip" showname="Internet Protocol, Src: 192.168.254.68 (192.168.254.68), Dst: DCT-INF (192.168.254.2)" size="20" pos="14">
    <field name="ip.version" showname="Version: 4" size="1" pos="14" show="4" value="45"/>
    <field name="ip.hdr_len" showname="Header length: 20 bytes" size="1" pos="14" show="20" value="45"/>
    <field name="ip.dsfield" showname="Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00)" size="1" pos="15" show="0" value="00">
      <field name="ip.dsfield.dscp" showname="0000 00.. = Differentiated Services Codepoint: Default (0x00)" size="1" pos="15" show="0x00" value="0" unmaskedvalue="00"/>
      <field name="ip.dsfield.ect" showname=".... ..0. = ECN-Capable Transport (ECT): 0" size="1" pos="15" show="0" value="0" unmaskedvalue="00"/>
      <field name="ip.dsfield.ce" showname=".... ...0 = ECN-CE: 0" size="1" pos="15" show="0" value="0" unmaskedvalue="00"/>
    </field>
    <field name="ip.len" showname="Total Length: 84" size="2" pos="16" show="84" value="0054"/>
    <field name="ip.id" showname="Identification: 0xce64 (52836)" size="2" pos="18" show="0xce64" value="ce64"/>
    <field name="ip.flags" showname="Flags: 0x00" size="1" pos="20" show="0x00" value="00">
      <field name="ip.flags.rb" showname="0... .... = Reserved bit: Not set" size="1" pos="20" show="0" value="00"/>
      <field name="ip.flags.df" showname=".0.. .... = Don't fragment: Not set" size="1" pos="20" show="0" value="00"/>
      <field name="ip.flags.mf" showname="..0. .... = More fragments: Not set" size="1" pos="20" show="0" value="00"/>
    </field>
    <field name="ip.frag_offset" showname="Fragment offset: 0" size="2" pos="20" show="0" value="0000"/>
    <field name="ip.ttl" showname="Time to live: 254" size="1" pos="22" show="254" value="fe"/>
    <field name="ip.proto" showname="Protocol: UDP (17)" size="1" pos="23" show="17" value="11"/>
    <field name="ip.checksum" showname="Header checksum: 0x709b [correct]" size="2" pos="24" show="0x709b" value="709b">
      <field name="ip.checksum_good" showname="Good: True" size="2" pos="24" show="1" value="709b"/>
      <field name="ip.checksum_bad" showname="Bad: False" size="2" pos="24" show="0" value="709b"/>
    </field>
    <field name="ip.src" showname="Source: 192.168.254.68 (192.168.254.68)" size="4" pos="26" show="192.168.254.68" value="c0a8fe44"/>
    <field name="ip.addr" showname="Source or Destination Address: 192.168.254.68 (192.168.254.68)" hide="yes" size="4" pos="26" show="192.168.254.68" value="c0a8fe44"/>
    <field name="ip.src_host" showname="Source Host: 192.168.254.68" hide="yes" size="4" pos="26" show="192.168.254.68" value="c0a8fe44"/>
    <field name="ip.host" showname="Source or Destination Host: 192.168.254.68" hide="yes" size="4" pos="26" show="192.168.254.68" value="c0a8fe44"/>
    <field name="ip.dst" showname="Destination: DCT-INF (192.168.254.2)" size="4" pos="30" show="192.168.254.2" value="c0a8fe02"/>
    <field name="ip.addr" showname="Source or Destination Address: DCT-INF (192.168.254.2)" hide="yes" size="4" pos="30" show="192.168.254.2" value="c0a8fe02"/>
    <field name="ip.dst_host" showname="Destination Host: DCT-INF" hide="yes" size="4" pos="30" show="DCT-INF" value="c0a8fe02"/>
    <field name="ip.host" showname="Source or Destination Host: DCT-INF" hide="yes" size="4" pos="30" show="DCT-INF" value="c0a8fe02"/>
  </proto>
  <proto name="udp" showname="User Datagram Protocol, Src Port: 35429 (35429), Dst Port: rfe (5002)" size="8" pos="34">
    <field name="udp.srcport" showname="Source port: 35429 (35429)" size="2" pos="34" show="35429" value="8a65"/>
    <field name="udp.dstport" showname="Destination port: rfe (5002)" size="2" pos="36" show="5002" value="138a"/>
    <field name="udp.port" showname="Source or Destination Port: 35429" hide="yes" size="2" pos="34" show="35429" value="8a65"/>
    <field name="udp.port" showname="Source or Destination Port: 5002" hide="yes" size="2" pos="36" show="5002" value="138a"/>
    <field name="udp.length" showname="Length: 64" size="2" pos="38" show="64" value="0040"/>
    <field name="udp.checksum_coverage" showname="Checksum coverage: 64" hide="yes" size="0" pos="38" show="64"/>
    <field name="udp.checksum" showname="Checksum: 0xf3f8 [validation disabled]" size="2" pos="40" show="0xf3f8" value="f3f8">
      <field name="udp.checksum_good" showname="Good Checksum: False" size="2" pos="40" show="0" value="f3f8"/>
      <field name="udp.checksum_bad" showname="Bad Checksum: False" size="2" pos="40" show="0" value="f3f8"/>
    </field>
  </proto>
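After calling the read function, we can inspect how the file is organized in memory.
In memory the file is a tree structure: the outermost layer is the ElementTree object with its built-in methods, and the file's content hangs below it.
To work with the data ElementTree has read, you first have to obtain the root element. For this file, the root is: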
<pdml version="0" creator="wireshark/1.20.0.1">
The root is obtained with:
root = tree.getroot()
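The distinction between the tree and its root is easy to check in code. In this sketch the one-line document is an assumed stand-in for the full file, and `ET.parse` is given a file-like object (`io.BytesIO`) instead of a path; it returns an ElementTree, and `getroot()` returns the Element that actually holds the content.

```python
import io
import xml.etree.ElementTree as ET

# Assumed stand-in for the PDML file; only the root and an empty packet are kept.
SAMPLE = b'<pdml version="0" creator="wireshark/1.20.0.1"><packet/></pdml>'

# ET.parse accepts a file name or a file-like object and returns an ElementTree.
tree = ET.parse(io.BytesIO(SAMPLE))

# The tree is only a wrapper; the document content hangs off the root Element.
root = tree.getroot()

print(isinstance(tree, ET.ElementTree))  # True
print(isinstance(root, ET.Element))      # True
print(root.tag)                          # pdml
```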
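The root node has the following properties:
children: the child nodes of root
attrib: the attributes, i.e. the name="value" pairs written inside the start tag, such as version="0" creator="wireshark/1.20.0.1"
text: character data not enclosed in <>; for an element, the text right after its start tag and before its first child
tag: the element name, e.g. pdml
tail: the text after the element's end tag, before the next tag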
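In code, the children entry seen in a debugger corresponds to treating the Element like a list: iterate it, call `len()` on it, or index it. `attrib` is an ordinary dict, and `get()` reads one attribute. The fragment below is an assumed excerpt shaped like the PDML file.

```python
import xml.etree.ElementTree as ET

# Assumed fragment shaped like the PDML file: one root, one packet, two protos.
SAMPLE = """<pdml version="0" creator="wireshark/1.20.0.1"><packet>
<proto name="geninfo" showname="General information"/>
<proto name="frame" showname="Frame 1246"/>
</packet></pdml>"""

root = ET.fromstring(SAMPLE)

# children: an Element behaves like a list of its child Elements.
print(len(root))                                # 1
packet = root[0]
print(packet.tag)                               # packet
print([child.get("name") for child in packet])  # ['geninfo', 'frame']

# attrib: an ordinary dict of the name="value" pairs in the start tag.
print(root.attrib["version"])                   # 0
print(root.get("creator"))                      # wireshark/1.20.0.1
```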
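Expanding root's children shows root's child node, i.e. the part of the XML file enclosed in <packet></packet>. Below it you can also see packet's own child nodes, proto, i.e. the parts of the XML file enclosed in <proto></proto>.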
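The same packet → proto → field hierarchy can be walked in code with `find()` and `findall()`, which accept simple path expressions, including attribute predicates such as `[@name='ip']` and `.//` for any depth. The sample is an assumed excerpt of the PDML file.

```python
import xml.etree.ElementTree as ET

# Assumed excerpt of the PDML file: packet > proto > field.
SAMPLE = """<pdml version="0"><packet>
  <proto name="geninfo"><field name="num" show="1246"/></proto>
  <proto name="ip">
    <field name="ip.src" show="192.168.254.68"/>
    <field name="ip.dst" show="192.168.254.2"/>
  </proto>
</packet></pdml>"""

root = ET.fromstring(SAMPLE)

# find() returns the first matching child (or None); findall() returns a list.
packet = root.find("packet")
print([proto.get("name") for proto in packet.findall("proto")])  # ['geninfo', 'ip']

# A path with an attribute predicate selects one specific proto.
ip = packet.find("proto[@name='ip']")
print([field.get("show") for field in ip.findall("field")])  # ['192.168.254.68', '192.168.254.2']

# './/' searches at any depth below the current element.
src = root.find(".//field[@name='ip.src']")
print(src.get("show"))  # 192.168.254.68
```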
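At this point the picture is basically clear: the elementtree library reads the XML into memory and links the nodes together in a layered (hierarchical) structure.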
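Because every Element holds its own children, the whole layered structure can be walked with a short recursive function. The helper `dump` below is hypothetical; it collects each element's tag and `name` attribute, indented two spaces per level, and the sample (a field nested inside another field, as with eth.dst) is an assumed excerpt.

```python
import xml.etree.ElementTree as ET

# Assumed excerpt: a field nested inside another field, as in the eth.dst entry.
SAMPLE = """<pdml><packet><proto name="eth">
<field name="eth.dst"><field name="eth.addr"/><field name="eth.ig"/></field>
<field name="eth.type"/>
</proto></packet></pdml>"""

def dump(elem, depth=0, lines=None):
    """Hypothetical helper: collect 'tag name' lines, indented two spaces per level."""
    if lines is None:
        lines = []
    name = elem.get("name")
    label = elem.tag if name is None else f"{elem.tag} {name}"
    lines.append("  " * depth + label)
    for child in elem:  # recurse into each child one level deeper
        dump(child, depth + 1, lines)
    return lines

print("\n".join(dump(ET.fromstring(SAMPLE))))
```

Recursion follows the tree to whatever depth it has, so no level count has to be fixed in advance.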
Reading data from the XML file
As mentioned above, what we need to read are the attrib, tag, tail and text values of the nodes at each level.
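The difference between `text` and `tail` is easiest to see in a made-up mini document (an assumption, not part of the PDML file) that puts character data in every possible position.

```python
import xml.etree.ElementTree as ET

# A made-up mini document with text in every possible position.
doc = ET.fromstring("<proto>before<field name='a'/>between<field name='b'/>after</proto>")

first, second = doc[0], doc[1]

print(doc.tag)       # proto
print(doc.text)      # before   (after <proto>, before the first child)
print(first.tail)    # between  (after the first field's end tag)
print(second.tail)   # after    (after the second field, before </proto>)
print(first.text)    # None     (an empty element has no text)
```

In a PDML export like the sample, the data is carried in attributes, so `text` and `tail` typically hold only the whitespace between tags, or None.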
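Several ways to read them
Direct reading
Python 2.7 lets you read the nodes directly by iterating over them, as in the following code: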
# parseSiCapHead, insertRecSendFieldIntoList, GetFieldinfo, SicapMessageList
# and templist come from the author's larger script and are not shown here.
for messagelstParaNode in node:  # 1st level: <field> nodes
    if messagelstParaNode.attrib['name'] == 'sicap.header':  # sicap header
        SicapHead = parseSiCapHead(messagelstParaNode)
        insertRecSendFieldIntoList(SicapMessageList, SicapHead)
    elif messagelstParaNode.attrib['name'].find('sicap.') == 0:  # para field, 1st level: message type
        templist.insert(0, messagelstParaNode.attrib['showname'])
        templist.insert(1, SicapHead['value'])
        SicapMessageList.insert(0, templist)
        for message2stParaNode in messagelstParaNode:  # para field, 2nd level
            GetFieldinfo(SicapMessageList, message2stParaNode)
            for message3stParaNode in message2stParaNode:  # para field, 3rd level
                GetFieldinfo(SicapMessageList, message3stParaNode)
                for message4stParaNode in message3stParaNode:  # para field, 4th level
                    GetFieldinfo(SicapMessageList, message4stParaNode)
                    for message5stParaNode in message4stParaNode:  # para field, 5th level
                        GetFieldinfo(SicapMessageList, message5stParaNode)
                        for message6stParaNode in message5stParaNode:  # para field, 6th level
                            GetFieldinfo(SicapMessageList, message6stParaNode)
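The nested loops above stop at a fixed depth of six. As an alternative (a swapped-in technique, not the author's code), `Element.iter('field')` visits every `field` descendant at any depth, depth-first in document order. In this sketch the helper `collect_shownames` and the `sicap.msg*` field names are hypothetical; only `sicap.header` appears in the original code.

```python
import xml.etree.ElementTree as ET

# Made-up proto with nested fields; the sicap.msg* names below are hypothetical.
SAMPLE = """<proto name="sicap">
  <field name="sicap.header" showname="Header" value="01"/>
  <field name="sicap.msg" showname="Message">
    <field name="sicap.msg.id" showname="Id: 7">
      <field name="sicap.msg.id.low" showname="Low: 7"/>
    </field>
  </field>
  <field name="other" showname="Not SICAP"/>
</proto>"""

def collect_shownames(node, prefix):
    """Hypothetical helper: showname of every nested <field> whose name starts with prefix."""
    result = []
    for field in node.iter("field"):  # depth-first, every level below node
        if field.get("name", "").startswith(prefix):
            result.append(field.get("showname"))
    return result

proto = ET.fromstring(SAMPLE)
print(collect_shownames(proto, "sicap.msg"))  # ['Message', 'Id: 7', 'Low: 7']
```

`iter()` flattens the hierarchy, so if the depth of each field matters (as it may for the author's `GetFieldinfo`), a recursive walk that passes the level along is the better fit.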
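In this code, node is the node object the traversal starts from (the root of the part being parsed), and six levels of nodes are parsed in total. At each level, the individual values can be accessed as follows.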
attrib:
In ElementTree, attrib is stored as a dictionary, so an attribute is accessed by its key, for example:
messagelstParaNode.attrib['name']
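Because attrib is an ordinary dict, `attrib['key']` raises KeyError when the attribute is missing. This matters for PDML, where not every field carries `value` (the `frame.time_delta` field copied below from the sample has none). `Element.get()` returns None or a supplied default instead. The last line shows `str.startswith`, an easier-to-read equivalent of the `.find('sicap.') == 0` test used above.

```python
import xml.etree.ElementTree as ET

# One field copied from the sample PDML file; note it has no "value" attribute.
field = ET.fromstring(
    '<field name="frame.time_delta" showname="Time delta from previous captured frame: '
    '0.000475000 seconds" size="0" pos="0" show="0.000475000"/>'
)

print(field.attrib["name"])        # frame.time_delta
print(field.get("value"))          # None
print(field.get("value", "n/a"))   # n/a
print("value" in field.attrib)     # False

try:
    field.attrib["value"]
except KeyError:
    print("no value attribute")    # no value attribute

# startswith() reads more clearly than attrib['name'].find('frame.') == 0.
print(field.get("name", "").startswith("frame."))  # True
```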