Python crawl merchant Contact phone and various data

Source: Internet
Author: User

Last learned to crawl the picture, this time want to try to crawl business Contact phone, of course, here is purely personal technical learning, crawl after the timely deletion, not used for other illegal purposes, all the consequences at your own risk.

First I studied with 114 yellow pages of data.
The following four are used in the module, the first 2 need to install, the next 2 is Python comes with.

import requestsfrom bs4 import BeautifulSoupimport csvimport time

Then, write a function to get to the page kind of data, remember the last return back, because the following function to write data into the CSV.

DefGet_content(url,data=none): Header = {' Accept ':' text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8 ',' Accept-encoding ':' gzip, deflate ',' Accept-language ':' zh-cn,zh;q=0.8 ', ' user-agent ':  ' mozilla/5.0 (Windows NT 10.0; Win64; x64) applewebkit/537.36 (khtml, like Gecko) chrome/59.0.3071.104 safari/537.36 ',} r = Requests.get (URL, headers=header) s OUP = BeautifulSoup (r.content,  ' html.parser ') data = Soup.body.find (  ' div ', { ' ID ':  ' News_con '}) ul = Data.find ( ' ul ') lis = Ul.find_all ( ' Li ') pthons=[] for Item in lis:rows=[] name= item.find ( ' H4 '). String Rows.append (name) Tel = item.find_all ( "div") [2].string Rows.append (tel) pthons.append (rows) time.sleep (1) return Pthons               

Then: Write the data into the table. I use CSV here for easy viewing.

def write_data(data,name):    file_name=name    with open(file_name, "w", newline=‘‘) as csvfile: writer = csv.writer(www.dejiaylsmile.cn  csvfile) writer.writerow(["商铺名称", "联系电话"]) writer.writerows(data) print(‘抓取完成‘

The final step is to execute these functions:

if __name__ == ‘__main__‘:    url = ‘http://ty.114chn.com/CustomerInfo/Customers? www.yingka178.com cid=008004008&page=2‘    mydata = get_content(www.078881.cn url)    write_data(mydata,‘phone.csv‘www.dfzx157.com)
    • 1
    • 2
    • 3
    • 4

Here I think I should write the URL as dynamic, because there are pages in it. Let page write loop auto +1, of course, you can see how many pages on the page. Write a loop to execute. is more perfect.

Copyright NOTICE: This article for Bo Master original article, without Bo Master permission not reproduced.

Python crawl merchant Contact phone and various data

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.