This article describes how to use Python to crawl job listings from a job site.
Specifically, it crawls the results returned by searching for "data analyst" on the Zhaopin website.
Python version: Python 3.5.
The main packages used are BeautifulSoup (bs4), requests, and the standard-library csv module, with lxml as the HTML parser.
In addition to the basic fields, I also grabbed a brief description of each job posting.
When the data is exported to a CSV file, it appears garbled when opened in Excel, although it opens fine in a text editor such as Notepad++.
To make the file display correctly when opened in Excel, I converted it with pandas and added the column names mentioned above; after the conversion, it displays correctly. For more on the conversion with pandas, you can refer to my blog.
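The underlying cause is that Excel does not assume UTF-8 for a CSV file unless a byte-order mark (BOM) is present. As an aside, a minimal sketch of an alternative fix, assuming the file names used later in this article (the output file name here is illustrative): pandas can rewrite the file with the 'utf-8-sig' codec, which prepends the BOM so Excel detects the encoding.

import pandas as pd

# Rewrite the crawler's CSV with a UTF-8 BOM so Excel opens it correctly.
# File names are assumptions based on the code shown later in this article.
df = pd.read_csv('zhilian_DA.csv', header=None, encoding='utf-8')
df.to_csv('zhilian_DA_excel.csv', index=False, header=False, encoding='utf-8-sig')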
Because the job descriptions are fairly long, I finally saved the CSV file as an Excel file and adjusted the formatting for easier viewing (a sketch of this step appears at the end of this article).
The final effect is as follows:
The code for crawling the information is as follows:
# Code based on Python 3.x
# _*_ coding: utf-8 _*_
# __author: 'LEMON'

from bs4 import BeautifulSoup
import requests
import csv


def download(url):
    """Fetch the page at url and return its HTML text."""
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:51.0) '
                             'Gecko/20100101 Firefox/51.0'}
    req = requests.get(url, headers=headers)
    return req.text


def get_content(html):
    """Parse one search-result page and return a list of job rows."""
    soup = BeautifulSoup(html, 'lxml')
    body = soup.body
    data_main = body.find('div', {'class': 'newlist_list_content'})
    tables = data_main.find_all('table')

    zw_list = []
    for i, table in enumerate(tables):
        if i == 0:  # the first table is the header row, skip it
            continue
        temp = []
        tds = table.find('tr').find_all('td')
        zwmc = tds[0].find('a').get_text()      # job name
        zw_link = tds[0].find('a').get('href')  # link to the job posting
        fkl = tds[1].find('span').get_text()    # feedback rate
        gsmc = tds[2].find('a').get_text()      # company name
        zwyx = tds[3].get_text()                # monthly salary
        gzdd = tds[4].get_text()                # work place
        gbsj = tds[5].find('span').get_text()   # release date

        tr_brief = table.find('tr', {'class': 'newlist_tr_detail'})
        # brief description of the recruitment content
        brief = tr_brief.find('li', {'class': 'newlist_deatil_last'}).get_text()

        temp.append(zwmc)
        temp.append(fkl)
        temp.append(gsmc)
        temp.append(zwyx)
        temp.append(gzdd)
        temp.append(gbsj)
        temp.append(brief)
        temp.append(zw_link)
        zw_list.append(temp)
    return zw_list


def write_data(data, name):
    """Append the rows in data to the CSV file name."""
    filename = name
    with open(filename, 'a', newline='', encoding='utf-8') as f:
        f_csv = csv.writer(f)
        f_csv.writerows(data)


if __name__ == '__main__':
    basic_url = ('http://sou.zhaopin.com/jobs/searchresult.ashx?jl=%E5%85%A8%E5%9B%BD'
                 '&kw=%e6%95%b0%e6%8d%ae%e5%88%86%e6%9e%90%e5%b8%88&sm=0&p=')

    number_list = list(range(90))  # total number of result pages
    for number in number_list:
        num = number + 1
        url = basic_url + str(num)
        filename = 'zhilian_DA.csv'

        html = download(url)
        # print(html)
        data = get_content(html)
        # print(data)
        print('start saving page:', num)
        write_data(data, filename)
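Before running the full loop, it can be worth sanity-checking the selectors against a single results page, since the class names above stop matching if Zhaopin changes its markup. A small, hypothetical snippet reusing download, get_content, and basic_url from the script above:

# Hypothetical sanity check: fetch only page 1 and inspect the first parsed row.
html = download(basic_url + '1')
rows = get_content(html)
print(rows[0])  # [job name, feedback rate, company name, salary, place, date, brief, link]

Note also that write_data opens the file in append mode ('a'), so re-running the script will append duplicate rows to zhilian_DA.csv; delete the old file before starting a fresh crawl.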
The code for conversion with pandas is as follows:
# Code based on Python 3.x
# _*_ coding: utf-8 _*_
# __author: 'LEMON'

import pandas as pd

df = pd.read_csv('zhilian_DA.csv', header=None)
df.columns = ['job name', 'feedback rate', 'company name', 'monthly salary',
              'work place', 'release date', 'recruitment profile', 'web link']

# output the adjusted DataFrame to a new CSV file
df.to_csv('zhilian_DA_update.csv', index=False)
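The final save-as-Excel step mentioned earlier is not shown in the original code. A hedged sketch of one way to do it with pandas and openpyxl; the output file name, sheet name, and column width are assumptions, and column G holds the recruitment profile given the column order above.

import pandas as pd
from openpyxl.styles import Alignment

# Sketch only: export the cleaned CSV to .xlsx and make the long profile column readable.
df = pd.read_csv('zhilian_DA_update.csv')
with pd.ExcelWriter('zhilian_DA.xlsx', engine='openpyxl') as writer:
    df.to_excel(writer, index=False, sheet_name='jobs')
    ws = writer.sheets['jobs']
    ws.column_dimensions['G'].width = 80            # 'recruitment profile' column
    for cell in ws['G']:
        cell.alignment = Alignment(wrap_text=True)  # wrap the long descriptions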