Last time I learned how to crawl images; this time I want to try crawling business contact phone numbers. This is purely a personal technical exercise: the crawled data is deleted promptly afterwards and is not used for any illegal purpose. If you use it otherwise, all consequences are at your own risk.
I started with data from the 114 Yellow Pages site.
The following four modules are used. The first two (requests and BeautifulSoup, from the bs4 package) need to be installed; the last two (csv and time) ship with Python.
import requests
from bs4 import BeautifulSoup
import csv
import time
Next, write a function that fetches the data from a page. Remember to return the result at the end, because the next function writes that data to a CSV file.
def get_content(url, data=None):
    # Browser-like headers so the request looks like a normal page visit
    header = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
        'Accept-Encoding': 'gzip, deflate',
        'Accept-Language': 'zh-CN,zh;q=0.8',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.104 Safari/537.36',
    }
    r = requests.get(url, headers=header)
    soup = BeautifulSoup(r.content, 'html.parser')
    # The merchant list sits in <div id="news_con"><ul><li>...</li></ul></div>
    data = soup.body.find('div', {'id': 'news_con'})
    ul = data.find('ul')
    lis = ul.find_all('li')
    pthons = []
    for item in lis:
        rows = []
        name = item.find('h4').string          # merchant name
        rows.append(name)
        tel = item.find_all("div")[2].string   # third <div> holds the phone number
        rows.append(tel)
        pthons.append(rows)
    time.sleep(1)  # pause briefly so repeated calls do not hammer the server
    return pthons
Then write the data to a table. I use CSV here because it is easy to view.
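def write_data(data, name):
    file_name = name
    # utf-8-sig keeps the Chinese header readable when the file is opened in Excel
    with open(file_name, "w", newline='', encoding='utf-8-sig') as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(["商铺名称", "联系电话"])  # header: shop name, contact phone
        writer.writerows(data)
    print('抓取完成')  # "crawl finished"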
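Before writing, it may be worth tidying the rows. The sketch below is not part of the original script: clean_rows is a hypothetical helper, and the sample rows are made up. It relies on two facts: BeautifulSoup's .string returns None when a tag does not contain exactly one piece of text, and csv.writer writes None as an empty cell. The helper strips stray whitespace and drops rows that have no phone number.

```python
import csv
import io


def clean_rows(rows):
    """Strip whitespace from every cell and skip rows without a phone number."""
    cleaned = []
    for name, tel in rows:
        name = (name or "").strip()
        tel = (tel or "").strip()
        if tel:  # keep only rows that actually carry a phone number
            cleaned.append([name, tel])
    return cleaned


# Made-up sample shaped like the list returned by get_content()
sample = [["  Example Shop ", " 0351-0000000 "], ["No Phone Shop", None]]

buffer = io.StringIO()  # in-memory stand-in for the CSV file
writer = csv.writer(buffer)
writer.writerow(["name", "phone"])
writer.writerows(clean_rows(sample))
print(buffer.getvalue())
```

In the real script the same call would be write_data(clean_rows(mydata), 'phone.csv').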
The final step is to execute these functions:
if __name__ == '__main__':
    url = 'http://ty.114chn.com/CustomerInfo/Customers?cid=008004008&page=2'
    mydata = get_content(url)
    write_data(mydata, 'phone.csv')
I think the URL should be built dynamically here, because the listing spans multiple pages. Let the page parameter increase by 1 in a loop; you can check on the site how many pages there are. Running the crawl in such a loop would make the script more complete.
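A minimal sketch of that loop could look like the following. The names build_page_url, crawl_pages and fake_fetch are hypothetical and not from the original script, and the page range 1 to 3 is only an illustration; the real last page has to be read from the site. The fetch function is passed in as a parameter so the sketch runs without network access.

```python
from urllib.parse import urlencode

BASE_URL = "http://ty.114chn.com/CustomerInfo/Customers"


def build_page_url(page, cid="008004008"):
    """Build the listing URL for one page number."""
    return BASE_URL + "?" + urlencode({"cid": cid, "page": page})


def crawl_pages(fetch, first_page, last_page):
    """Call fetch(url) for every page in the range and collect all rows."""
    all_rows = []
    for page in range(first_page, last_page + 1):
        all_rows.extend(fetch(build_page_url(page)))
    return all_rows


def fake_fetch(url):
    """Stand-in for get_content() that returns one made-up row per page."""
    page = url.rsplit("=", 1)[1]
    return [["shop on page " + page, "000-" + page]]


rows = crawl_pages(fake_fetch, 1, 3)
print(len(rows))
```

In the real script, get_content would be passed instead of fake_fetch, and the combined result handed to write_data once, so that all pages end up in a single CSV file.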
Copyright notice: this is the blogger's original article and may not be reproduced without the blogger's permission.
Python: crawling merchant contact phone numbers and other data