Recently I have been busy with a requirement: convert an HTML document, held as a string, into an Excel file.
Breaking down the requirement:
① Implementation language: Python
② HTML parsing: parse the document tree with the etree module of the lxml library, using XPath
③ Writing Excel: write the Excel file with the xlwt library
Code snippet:
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import os, sys
# Python 2 only: reload(sys) brings back setdefaultencoding, which site.py removes at startup
reload(sys)
sys.setdefaultencoding('utf8')
import MySQLdb
import json
import xlwt
from lxml import etree
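# Method for parsing the HTML string
def change(data):
    html = etree.HTML(str(data))
    # the direct child divs of <div class="content">
    divs = html.xpath('//div[@class="content"]/div')
    # src attributes of the images in the first div
    img_top = divs[0].xpath('./img/@src')
    # text of the paragraphs in the first div
    p_top_tmp_list = divs[0].xpath('./p/text()')
    ... ...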
# Method for writing Excel
def write_excel(filename, data):
    book = xlwt.Workbook()  # create the Excel workbook object
    sheet = book.add_sheet('Sheet1')  # add a worksheet
    c = 0  # current row index
    for d in data:  # write each tuple in data to its own row
        for index in range(len(d)):  # write each element of the tuple to its own column
            sheet.write(c, index, d[index])
        c += 1
    book.save(filename)  # save the Excel file
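Since the body of change is elided above, here is a minimal sketch of how the same XPath expressions could turn an HTML string into row tuples. It is an illustration and not the original implementation: the sample markup, the name extract_rows, and the (image src, paragraph text) row layout are all assumptions, and it is written for Python 3 rather than the Python 2 of the snippet above.

```python
from lxml import etree

# Hypothetical sample input; the real document's markup is not shown in the post.
SAMPLE_HTML = """
<html><body>
  <div class="content">
    <div>
      <img src="top.png"/>
      <p>First paragraph</p>
      <p>Second paragraph</p>
    </div>
    <div>
      <img src="second.png"/>
      <p>Only paragraph</p>
    </div>
  </div>
</body></html>
"""


def extract_rows(html_string):
    """Return one (image src, joined paragraph text) tuple per inner div.

    Hypothetical helper: the row layout is an assumption for illustration.
    """
    root = etree.HTML(html_string)
    rows = []
    # Same selector as in change(): direct div children of <div class="content">.
    for div in root.xpath('//div[@class="content"]/div'):
        srcs = div.xpath('./img/@src')    # list of src attribute values
        texts = div.xpath('./p/text()')   # list of paragraph text nodes
        src = srcs[0] if srcs else ''     # tolerate a div without an image
        text = ' '.join(t.strip() for t in texts)
        rows.append((src, text))
    return rows


rows = extract_rows(SAMPLE_HTML)
```

A list of tuples shaped like rows is the kind of data that write_excel expects: each tuple becomes one row, and each element of the tuple becomes one column.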
Parsing HTML tags with XPath