Search 51CTO recommended blogs with Python and save to Excel

Source: Internet
Author: User

I. BACKGROUND

While learning to write web crawlers, I used the requests module to fetch pages, BeautifulSoup to extract the required content, and the XlsxWriter module to save that content to Excel. I am recording the approach here. It can be extended to crawl other content and persist it to files, databases, and so on.
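
The note above mentions that scraped results could also be persisted to a database rather than Excel. The following is a minimal sketch of that idea using Python's built-in sqlite3 module. The function name save_to_sqlite, the table name blogs, and the sample data are assumptions made for illustration and are not part of the original scripts. It expects a dict that maps blog names to URLs, the same shape that get_soup returns.

```python
import sqlite3


def save_to_sqlite(result, db_path=":memory:"):
    """Store a {blog_name: blog_url} dict in a SQLite table and return the connection.

    Hypothetical helper: the name, table layout, and default path are assumptions.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS blogs ("
        "blog_name TEXT PRIMARY KEY, blog_url TEXT NOT NULL)"
    )
    # INSERT OR REPLACE keeps a single row per blog name across repeated runs.
    conn.executemany(
        "INSERT OR REPLACE INTO blogs (blog_name, blog_url) VALUES (?, ?)",
        list(result.items()),
    )
    conn.commit()
    return conn


# Made-up sample data standing in for the scraped result dict.
sample = {
    "Example post B": "http://example.com/b",
    "Example post A": "http://example.com/a",
}
conn = save_to_sqlite(sample)
rows = conn.execute(
    "SELECT blog_name, blog_url FROM blogs ORDER BY blog_name"
).fetchall()
print(rows)
```

Passing a file path instead of ":memory:" would keep the data between runs.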

II. CODE

Two modules were written, geturl3 and getexcel3, and both are called from main.

The geturl3.py code reads as follows:

#!/bin/env python
# -*- coding:utf-8 -*-
# @Author: kaliarch
import requests
from bs4 import BeautifulSoup


class get_urldic:
    # Read the search keyword and build the list of result-page URLs
    def get_url(self):
        urllist = []
        first_url = 'http://blog.51cto.com/search/result?q='
        after_url = '&type=&page='
        try:
            search = input("Please input search name:")
            page = int(input("Please input page:"))
        except Exception as e:
            print('Input error:', e)
            exit()
        for num in range(1, page + 1):
            url = first_url + search + after_url + str(num)
            urllist.append(url)
        print("Please wait...")
        return urllist, search

    # Download each result page
    def get_html(self, urllist):
        response_list = []
        for r_num in urllist:
            request = requests.get(r_num)
            response = request.content
            response_list.append(response)
        return response_list

    # Extract blog_name and blog_url from the downloaded pages
    def get_soup(self, html_doc):
        result = {}
        for g_num in html_doc:
            soup = BeautifulSoup(g_num, 'html.parser')
            context = soup.find_all('a', class_='m-1-4 fl')
            for i in context:
                title = i.get_text()
                result[title.strip()] = i['href']
        return result


if __name__ == '__main__':
    blog = get_urldic()
    urllist, search = blog.get_url()
    html_doc = blog.get_html(urllist)
    result = blog.get_soup(html_doc)
    for k, v in result.items():
        print('Search blog_name is: %s, blog_url is: %s' % (k, v))
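
get_url builds each search URL by plain string concatenation, so a keyword that contains spaces or non-ASCII characters goes into the URL unescaped. One possible alternative, sketched here with the standard library's urllib.parse.urlencode, encodes the query parameters explicitly. The function name build_search_urls and the constant BASE_URL are assumptions for illustration. The endpoint and the parameter names q, type, and page are taken from the script above. The source does not say whether the site needs this encoding.

```python
from urllib.parse import urlencode

# Same endpoint that geturl3.py uses; the constant name is an assumption.
BASE_URL = "http://blog.51cto.com/search/result"


def build_search_urls(search, pages):
    """Return one search URL per page, with the keyword percent-encoded."""
    urls = []
    for num in range(1, pages + 1):
        # urlencode escapes spaces and non-ASCII characters in the keyword.
        query = urlencode({"q": search, "type": "", "page": num})
        urls.append(BASE_URL + "?" + query)
    return urls


urls = build_search_urls("python crawler", 2)
for url in urls:
    print(url)
```

The returned list has the same shape as urllist in get_url, so it could be passed to get_html unchanged.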

The getexcel3.py code reads as follows:

#!/bin/env python
# -*- coding:utf-8 -*-
# @Author: kaliarch
import xlsxwriter


class create_excle:
    def __init__(self):
        self.tag_list = ["blog_name", "blog_url"]

    def create_workbook(self, search=""):
        excle_name = search + '.xlsx'  # Name of the Excel file
        workbook = xlsxwriter.Workbook(excle_name)
        worksheet_m = workbook.add_worksheet(search)
        print('Create %s...' % excle_name)
        return workbook, worksheet_m

    def col_row(self, worksheet):
        # The numeric sizes were lost in the source text; these values are assumed.
        worksheet.set_row(0, 40)
        worksheet.set_column('A:A', 60)
        worksheet.set_column('B:B', 60)

    def shell_format(self, workbook):
        # Format for the merged title cell
        merge_format = workbook.add_format({
            'bold': 1,
            'border': 1,
            'align': 'center',
            'valign': 'vcenter',
            'fg_color': '#FAEBD7'
        })
        # Format for the column names
        name_format = workbook.add_format({
            'bold': 1,
            'border': 1,
            'align': 'center',
            'valign': 'vcenter',
            'fg_color': '#E0FFFF'
        })
        # Format for the body cells
        normal_format = workbook.add_format({
            'align': 'center',
        })
        return merge_format, name_format, normal_format

    # Write the title and the column names
    def write_title(self, worksheet, search, merge_format):
        title = search + " results"
        worksheet.merge_range('A1:B1', title, merge_format)
        print('Write title success')

    def write_tag(self, worksheet, name_format):
        tag_row = 1
        tag_col = 0
        for num in self.tag_list:
            worksheet.write(tag_row, tag_col, num, name_format)
            tag_col += 1
        print('Write tag success')

    # Write the content
    def write_context(self, worksheet, con_dic, normal_format):
        # Data starts at row 2, below the title row and the column-name row.
        # The original "row > len(con_dic)" check stopped one row early and
        # dropped the last entry, so it is omitted here.
        row = 2
        for k, v in con_dic.items():
            col = 0
            worksheet.write(row, col, k, normal_format)
            col += 1
            worksheet.write(row, col, v, normal_format)
            row += 1
        print('Write context success')

    # Close the Excel workbook
    def workbook_close(self, workbook):
        workbook.close()


if __name__ == '__main__':
    print('This is create Excel mode')
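
The worksheet layout that getexcel3.py produces (title in row 0, column names in row 1, data from row 2 onward) can be sketched without xlsxwriter as a list of (row, col, value) tuples. The helper name layout_cells and the sample data are hypothetical and exist only to make the row and column arithmetic visible. With n entries, the data should occupy rows 2 through n + 1.

```python
def layout_cells(search, con_dic, tag_list=("blog_name", "blog_url")):
    """Return (row, col, value) tuples for the sheet layout described above.

    Hypothetical helper for illustration; it does not write any file.
    """
    # Row 0: the title (merged across A1:B1 in the real worksheet).
    cells = [(0, 0, search + " results")]
    # Row 1: one column name per column.
    for col, tag in enumerate(tag_list):
        cells.append((1, col, tag))
    # Rows 2..n+1: one row per dictionary entry, name in column 0, URL in column 1.
    for row, (name, url) in enumerate(con_dic.items(), start=2):
        cells.append((row, 0, name))
        cells.append((row, 1, url))
    return cells


# Made-up sample data standing in for the scraped result dict.
sample = {
    "Example post A": "http://example.com/a",
    "Example post B": "http://example.com/b",
}
cells = layout_cells("python", sample)
for cell in cells:
    print(cell)
```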

The main.py code reads as follows:

#!/bin/env python
# -*- coding:utf-8 -*-
# @Author: kaliarch
import geturl3
import getexcel3


# Build the dictionary of blog names and URLs
def get_dic():
    blog = geturl3.get_urldic()
    urllist, search = blog.get_url()
    html_doc = blog.get_html(urllist)
    result = blog.get_soup(html_doc)
    return result, search


# Write the dictionary to Excel
def write_excle(urldic, search):
    excle = getexcel3.create_excle()
    workbook, worksheet = excle.create_workbook(search)
    excle.col_row(worksheet)
    merge_format, name_format, normal_format = excle.shell_format(workbook)
    excle.write_title(worksheet, search, merge_format)
    excle.write_tag(worksheet, name_format)
    excle.write_context(worksheet, urldic, normal_format)
    excle.workbook_close(workbook)


def main():
    url_dic, search_name = get_dic()
    write_excle(url_dic, search_name)


if __name__ == '__main__':
    main()

III. RESULTS

Run the code, then enter the search keyword and the number of result pages to fetch.

An Excel file named after the search keyword is generated. Open it to view the written content.

Use the script to search for and save the 51CTO recommended blogs you need. You can run it several times with different keywords.
