Crawling job site information with Python

This article describes how to use Python to crawl job listings from a job site.

This crawl targets the listings returned by searching for "Data Analyst" on the Zhaopin (zhilian) website.

Python version: Python 3.5.

The main packages I use are BeautifulSoup, requests, and csv.
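As an illustration of how these three packages fit together, here is a minimal, self-contained sketch. It parses a hard-coded HTML snippet (my own stand-in, not Zhaopin's real markup) and writes the extracted rows to a CSV file; in the real script, the HTML would come from `requests.get(url).text`.

```python
import csv
from bs4 import BeautifulSoup

# A hard-coded snippet standing in for a downloaded page. In the real
# crawler this string would be fetched with requests.get(url).text.
html = """
<table><tr>
  <td><a href="/job/1">Data Analyst</a></td>
  <td>8000-12000</td>
</tr></table>
"""

soup = BeautifulSoup(html, 'html.parser')
rows = []
for tr in soup.find_all('tr'):
    tds = tr.find_all('td')
    # get_text() collapses the cell (including the <a> tag) to its text
    rows.append([tds[0].get_text(), tds[1].get_text()])

# newline='' prevents the csv module from writing blank lines on Windows
with open('demo.csv', 'w', newline='', encoding='utf-8') as f:
    csv.writer(f).writerows(rows)
```

The same parse-then-write pattern appears in the full crawler below, just with more fields per row.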

In addition to the basic fields, I also grab a brief description of each posting.

When the data is exported to a CSV file, it appears garbled when opened in Excel, although it displays fine in a plain text editor such as Notepad++.

To make the file display correctly when opened in Excel, I converted it with pandas and added the column names shown below. Once the conversion is complete, it displays correctly. For details on the pandas conversion, you can refer to my blog:
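As an aside, a common way to avoid this Excel garbling in the first place (my suggestion, not what the original script did) is to write the CSV with the `utf-8-sig` encoding, which prepends a byte-order mark that Excel uses to detect UTF-8:

```python
import csv

rows = [['Job name', 'Monthly salary'],
        ['数据分析师', '8000-12000']]

# 'utf-8-sig' prepends a BOM (EF BB BF). Excel reads the BOM and decodes
# the file as UTF-8; with plain 'utf-8', Excel often falls back to a
# local legacy encoding and Chinese text shows up garbled.
with open('zhilian_demo.csv', 'w', newline='', encoding='utf-8-sig') as f:
    csv.writer(f).writerows(rows)

with open('zhilian_demo.csv', 'rb') as f:
    raw = f.read()
```

Text editors like Notepad++ handle plain UTF-8 fine either way, which is why the file looked correct there.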

Because the recruitment descriptions are fairly long, I finally saved the CSV file as an Excel file and adjusted the formatting for easier viewing.
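The original article did this last step by hand in Excel, but it can also be scripted. A sketch using pandas' `to_excel` (which requires an Excel writer engine such as openpyxl to be installed):

```python
import pandas as pd

# Small stand-in DataFrame; in practice this would be the converted
# zhilian_da_update.csv loaded with pd.read_csv.
df = pd.DataFrame({'Job name': ['Data Analyst'],
                   'Monthly salary': ['8000-12000']})

# .xlsx files carry their own encoding, so the CSV/Excel garbling
# problem does not arise; column widths can then be adjusted in Excel.
df.to_excel('zhilian_demo.xlsx', index=False)
```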


The code for crawling the information is as follows:

# Code based on Python 3.x
# _*_ coding:utf-8 _*_
# __author: 'LEMON'

from bs4 import BeautifulSoup
import requests
import csv


def download(url):
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:51.0) Gecko/20100101 Firefox/51.0'}
    req = requests.get(url, headers=headers)
    return req.text


def get_content(html):
    soup = BeautifulSoup(html, 'lxml')
    body = soup.body
    data_main = body.find('div', {'class': 'newlist_list_content'})
    tables = data_main.find_all('table')
    zw_list = []
    for i, table in enumerate(tables):
        if i == 0:
            continue  # the first table is the header row; skip it
        temp = []
        tds = table.find('tr').find_all('td')
        zwmc = tds[0].find('a').get_text()      # job name
        zw_link = tds[0].find('a').get('href')  # link to the posting
        fkl = tds[1].find('span').get_text()    # feedback rate
        gsmc = tds[2].find('a').get_text()      # company name
        zwyx = tds[3].get_text()                # monthly salary
        gzdd = tds[4].get_text()                # work place
        gbsj = tds[5].find('span').get_text()   # release date
        tr_brief = table.find('tr', {'class': 'newlist_tr_detail'})
        brief = tr_brief.find('li', {'class': 'newlist_deatil_last'}).get_text()  # brief description
        temp.append(zwmc)
        temp.append(fkl)
        temp.append(gsmc)
        temp.append(zwyx)
        temp.append(gzdd)
        temp.append(gbsj)
        temp.append(brief)
        temp.append(zw_link)
        zw_list.append(temp)
    return zw_list


def write_data(data, name):
    filename = name
    with open(filename, 'a', newline='', encoding='utf-8') as f:
        f_csv = csv.writer(f)
        f_csv.writerows(data)


if __name__ == '__main__':
    basic_url = 'http://sou.zhaopin.com/jobs/searchresult.ashx?jl=%E5%85%A8%E5%9B%BD&kw=%e6%95%b0%e6%8d%ae%e5%88%86%e6%9e%90%e5%b8%88&sm=0&p='
    number_list = list(range(90))  # total number of result pages (the exact count was garbled in the source)
    for number in number_list:
        num = number + 1
        url = basic_url + str(num)
        filename = 'zhilian_da.csv'
        html = download(url)
        # print(html)
        data = get_content(html)
        # print(data)
        print('Start saving page:', num)
        write_data(data, filename)

The code for the conversion with pandas is as follows:

# Code based on Python 3.x
# _*_ coding:utf-8 _*_
# __author: 'LEMON'

import pandas as pd

df = pd.read_csv('zhilian_da.csv', header=None)
df.columns = ['Job name', 'Feedback rate', 'Company name', 'Monthly salary',
              'Work place', 'Release date', 'Recruitment profile', 'Web link']

# Output the adjusted DataFrame to a new CSV file
df.to_csv('zhilian_da_update.csv', index=False)
