Capturing job-listing information from a recruitment website using Python
This is the information captured from the Zhaopin recruitment website after searching for "data analyst".
Python version: Python 3.5.
The main packages used are BeautifulSoup + Requests + csv.
In addition, I also captured the brief description attached to each job posting.
After the results are written to a csv file, the file appears garbled when opened in Excel, although it opens fine in a text editor (such as Notepad++).
To make it display correctly when opened in Excel, I used pandas to convert the file and add the column names; after the conversion it displays correctly. For the pandas conversion, refer to my blog:
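A likely cause, as far as I can tell, is that Excel only auto-detects UTF-8 in a csv file when the file begins with a byte order mark (BOM). As an aside, and not part of the original script, writing with Python's 'utf-8-sig' codec would add that mark so Excel can open the file directly:

import csv

# Variant of the write step using 'utf-8-sig', which prepends a byte order
# mark so that Excel recognizes the file as UTF-8 (an alternative to the
# pandas round trip described above; my suggestion, not the author's method).
def write_data(data, name):
    with open(name, 'a', newline='', encoding='utf-8-sig') as f:
        csv.writer(f).writerows(data)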
Because the job descriptions are long, I saved the csv file as an Excel file and adjusted the formatting for easier viewing (a programmatic sketch of this step follows the pandas code below).
The final effect is as follows:
The implementation code is as follows:
# Code based on Python 3.x
# _*_ coding: utf-8 _*_
# __Author: "LEMON"

from bs4 import BeautifulSoup
import requests
import csv


def download(url):
    """Fetch a search-result page and return its HTML."""
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:51.0) Gecko/20100101 Firefox/51.0'}
    req = requests.get(url, headers=headers)
    return req.text


def get_content(html):
    """Parse one result page and return a list of rows, one per job posting."""
    soup = BeautifulSoup(html, 'lxml')
    body = soup.body
    data_main = body.find('div', {'class': 'newlist_list_content'})
    tables = data_main.find_all('table')

    zw_list = []
    for i, table in enumerate(tables):
        if i == 0:  # skip the first table (the list header)
            continue
        temp = []
        tds = table.find('tr').find_all('td')
        zwmc = tds[0].find('a').get_text()       # position name
        zw_link = tds[0].find('a').get('href')   # link to the posting
        fkl = tds[1].find('span').get_text()     # feedback rate
        gsmc = tds[2].find('a').get_text()       # company name
        zwyx = tds[3].get_text()                 # monthly salary
        gzdd = tds[4].get_text()                 # workplace
        gbsj = tds[5].find('span').get_text()    # release date

        tr_brief = table.find('tr', {'class': 'newlist_tr_detail'})
        brief = tr_brief.find('li', {'class': 'newlist_deatil_last'}).get_text()

        temp.append(zwmc)
        temp.append(fkl)
        temp.append(gsmc)
        temp.append(zwyx)
        temp.append(gzdd)
        temp.append(gbsj)
        temp.append(brief)
        temp.append(zw_link)

        zw_list.append(temp)
    return zw_list


def write_data(data, name):
    """Append the rows for one page to the csv file."""
    with open(name, 'a', newline='', encoding='utf-8') as f:
        f_csv = csv.writer(f)
        f_csv.writerows(data)


if __name__ == '__main__':

    basic_url = 'http://sou.zhaopin.com/jobs/searchresult.ashx?jl=%E5%85%A8%E5%9B%BD&kw=%E6%95%B0%E6%8D%AE%E5%88%86%E6%9E%90%E5%B8%88&sm=0&p='
    filename = 'zhilian_DA.csv'

    number_list = list(range(90))  # total number of pages is 90
    for number in number_list:
        num = number + 1
        url = basic_url + str(num)
        html = download(url)
        # print(html)
        data = get_content(html)
        # print(data)
        print('start saving page:', num)
        write_data(data, filename)
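One caveat: download() assumes every request succeeds, so a failed or blocked response would make get_content() fail on unexpected HTML. A small hedged extension (my addition, not in the original) that checks the status code, sets a timeout, and backs off between retries:

import time
import requests

def download(url, retries=3):
    """Fetch a page, retrying on failure rather than assuming success."""
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:51.0) Gecko/20100101 Firefox/51.0'}
    for _ in range(retries):
        try:
            req = requests.get(url, headers=headers, timeout=10)
            req.raise_for_status()  # raise on 4xx/5xx responses
            return req.text
        except requests.RequestException:
            time.sleep(2)  # back off briefly before retrying
    return ''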
The code for the pandas conversion is as follows:
# Code based on Python 3.x
# _*_ coding: utf-8 _*_
# __Author: "LEMON"

import pandas as pd

df = pd.read_csv('zhilian_DA.csv', header=None)

df.columns = ['position name', 'feedback rate', 'company name', 'monthly salary',
              'workplace', 'release date', 'recruitment introduction', 'webpage link']

# output the adjusted dataframe to a new csv file
df.to_csv('zhilian_DA_update.csv', index=False)
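For the save-as-Excel step mentioned earlier, a minimal sketch of doing it in code rather than by hand (my addition, not part of the original post; it assumes the openpyxl package is installed for .xlsx output):

import pandas as pd

# Read the converted csv and export it as an Excel workbook.
df = pd.read_csv('zhilian_DA_update.csv')
df.to_excel('zhilian_DA.xlsx', index=False)

Column widths and wrapping for the long description column would still need adjusting in Excel itself, as described above.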