Python3 crawls Lagou recruitment data, plus Python3 crawler notes

Source: Internet
Author: User

Using Python to crawl Lagou recruitment data
Step 1: download the required modules.
requests: open a cmd window, type pip install requests, and press Enter to download it automatically.
xlwt: run pip install xlwt and press Enter to download it automatically.
Step 2: find the web page you want to crawl (I am crawling Lagou).
Pick a browser for capturing the requests (Firefox or Chrome); I use Chrome.
Pick a coding tool (IDEA or PyCharm); I use IDEA.

import requests  # import the downloaded requests module
import xlwt      # import the downloaded xlwt module

# Open the target page in Chrome and press F12 to open the inspection panel.
# Under Network > XHR you can find the request headers used below.
headers = {
    # 1. Browser and client information.
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36',
    # 2. The Lagou page you came from; without it the site assumes robot access.
    'Referer': 'https://www.lagou.com/jobs/list_python?labelWords=&fromSearch=true&suginput=',
    # 3. Identifies you to the site; some sites do not need it.
    # Copy the cookie string from your own browser; the values here are placeholders.
    'Cookie': 'user_trace_token=...; LGUID=...; index_location_city=...; '
              'JSESSIONID=...; SEARCH_ID=...',
}

# 'pn' is the page number in the posted data (pn=1 is the first page);
# 'kd' is the search keyword.
def getJobList(page):
    data = {'first': 'false', 'pn': page, 'kd': 'python'}
    # Initiate a POST request to the URL behind the current web page.
    res = requests.post('https://www.lagou.com/jobs/positionAjax.json'
                        '?needAddtionalResult=false&isSchoolJob=0',
                        data=data, headers=headers)
    result = res.json()  # the response is JSON, i.e. (key, value) data
    jobs = result['content']['positionResult']['result']
    return jobs  # the jobs matching the query on this page

excelTabel = xlwt.Workbook()  # create an Excel object
sheet1 = excelTabel.add_sheet('lagou', cell_overwrite_ok=True)
sheet1.write(0, 0, 'company name')  # company name
sheet1.write(0, 1, 'city')          # city
sheet1.write(0, 2, 'district')      # district
sheet1.write(0, 3, 'job nature')    # full-time / part-time
sheet1.write(0, 4, 'salary')        # salary
sheet1.write(0, 5, 'position')      # position
sheet1.write(0, 6, 'work year')     # years of experience
sheet1.write(0, 7, 'company size')  # company size
sheet1.write(0, 8, 'education')     # education level

n = 1
for page in range(1, 31):  # loop over the result pages (e.g. the first 30)
    for job in getJobList(page=page):
        # The following if filter is optional:
        # (you could also add: and 'Chaoyang District' in job['district'])
        if ('1-3' in job['workYear'] and 'backend developer' in job['secondType']
                and 'bachelor' in job['education']):
            sheet1.write(n, 0, job['companyFullName'])  # company name
            sheet1.write(n, 1, job['city'])             # city
            sheet1.write(n, 2, job['district'])         # district
            sheet1.write(n, 3, job['jobNature'])        # full-time / part-time
            sheet1.write(n, 4, job['salary'])           # salary
            sheet1.write(n, 5, job['secondType'])       # position type
            sheet1.write(n, 6, job['workYear'])         # years of experience
            sheet1.write(n, 7, job['companySize'])      # company size
            sheet1.write(n, 8, job['education'])        # education level
            n += 1  # move to the next row (not shown in the original snippet)

excelTabel.save('lagou.xls')  # persist the workbook (xlwt writes .xls files)
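If you want to sanity-check the spreadsheet the script produces, one option (not part of the original article) is to read it back with xlrd, the reading counterpart of xlwt. A minimal sketch, assuming the script above has already saved lagou.xls:

import xlrd  # pip install xlrd

book = xlrd.open_workbook('lagou.xls')  # open the workbook saved by the crawler
sheet = book.sheet_by_name('lagou')     # the sheet name used above
print(sheet.nrows, 'rows written')      # header row + one row per job
print(sheet.row_values(0))              # the header row
if sheet.nrows > 1:
    print(sheet.row_values(1))          # the first crawled job, if any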

In fact, I don't know how to insert the screenshots here,

However, you can copy the code above, crawl the data, and then study it at your own pace (the headers should be changed to your own values).

Python3:

Input and Output

str(): returns a human-readable representation of a value.

str.format(): fills the {} placeholders in a format string, so values can be combined with other text.

repr(): generates a representation that can be read by the interpreter.

repr() escapes special characters in a string.

The argument to repr() can be any Python object.
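A quick illustration of the three calls above (the example values are my own):

# str() gives a human-readable form; repr() gives an interpreter-readable one.
s = 'hello\nworld'
print(str(s))    # prints two lines: hello / world
print(repr(s))   # prints 'hello\nworld' with the escape visible

# str.format() fills the {} placeholders in the string.
print('{} jobs found in {}'.format(42, 'Beijing'))  # 42 jobs found in Beijing

# repr() accepts any Python object, not just strings.
print(repr([1, 2, 3]))  # [1, 2, 3]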

Read and Write files

open(filename, mode) returns a file object.

filename: a string containing the name of the file you want to access.

mode: determines how the file is opened. The default is read-only ('r'); other common modes are 'w' (write, truncating any existing content), 'a' (append), and 'b' (binary).

f = open('C:\\foo.txt', 'r')  # open the file read-only ('w' would truncate it)
content = f.read()            # read the entire file ('str' is avoided as a variable name, since it shadows the built-in)
print(content)

f.close(): closes the open file.

f.readline(): reads a single line from the file.

f.readlines(): returns a list of all the lines in the file.

f.write('aaa'): writes 'aaa' to the file and returns the number of characters written.

f.tell(): returns the current position within the file.

f.seek(offset): moves the current file position to the given offset.
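Putting these together, a minimal sketch (demo.txt is a hypothetical file in the current directory):

# Write a file, then read it back, exercising the methods listed above.
f = open('demo.txt', 'w')     # 'w' creates the file or truncates an existing one
print(f.write('aaa\nbbb\n'))  # prints 8, the number of characters written
f.close()

f = open('demo.txt', 'r')     # reopen read-only (the default mode)
print(f.readline())           # 'aaa\n' -> the first line
print(f.tell())               # position after the first line (platform-dependent in text mode)
f.seek(0)                     # jump back to the start of the file
print(f.readlines())          # ['aaa\n', 'bbb\n']
f.close()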
