Crawling Lagou recruitment data with Python3
Using Python to crawl and save job-posting data
Step 1: Install the required modules.
requests: open a cmd prompt, type pip install requests, and press Enter; it downloads automatically.
xlwt: run pip install xlwt and press Enter for automatic download.
Step 2: Find the web page you want to crawl (I am crawling Lagou).
Pick a browser (Firefox or Chrome) to capture the network requests; I use Chrome.
Editor: IDEA or PyCharm; I use IDEA.
import requests  # the HTTP library downloaded in step 1
import xlwt      # the Excel writer downloaded in step 1

# Open the target page in Chrome, press F12, and look under
# Network -> XHR -> Headers to find the values below.
headers = {
    # First: the browser/client identification sent to the server.
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36',
    # Second: the page you came from; without it Lagou assumes robot access.
    'Referer': 'https://www.lagou.com/jobs/list_python?labelWords=&fromSearch=true&suginput=',
    # Third: identifies your session on some sites (not always needed).
    # Copy your own cookie string from the browser's F12 Network panel.
    'Cookie': '...',
}

# data describes one result page: pn=1 is the first page, kd is the keyword.
def getJobList(page):
    data = {'first': 'false', 'pn': page, 'kd': 'python'}
    # Initiate a POST request to the Ajax URL the page itself calls.
    res = requests.post(
        'https://www.lagou.com/jobs/positionAjax.json?needAddtionalResult=false&isSchoolJob=0',
        data=data, headers=headers)
    result = res.json()  # the response is JSON, i.e. (key, value) pairs
    jobs = result['content']['positionResult']['result']
    return jobs  # the job list for this page

excelTabel = xlwt.Workbook()  # create an Excel workbook object
sheet1 = excelTabel.add_sheet('lagou', cell_overwrite_ok=True)
sheet1.write(0, 0, 'Company name')        # company name
sheet1.write(0, 1, 'City')                # city
sheet1.write(0, 2, 'District')            # district
sheet1.write(0, 3, 'Full-time/part-time') # job nature
sheet1.write(0, 4, 'Salary')              # salary
sheet1.write(0, 5, 'Position')            # position
sheet1.write(0, 6, 'Years of experience') # years of experience
sheet1.write(0, 7, 'Company size')        # company size
sheet1.write(0, 8, 'Education')           # education level

n = 1
for page in range(1, 31):  # loop over each result page (choose your own page count)
    for job in getJobList(page=page):
        # The following if filter is optional; add or remove conditions as needed:
        if '1-3' in job['workYear'] and 'backend developer' in job['secondType'] \
                and 'bachelor' in job['education']:  # and 'Chaoyang District' in job['district']
            sheet1.write(n, 0, job['companyFullName'])  # company name
            sheet1.write(n, 1, job['city'])             # city
            sheet1.write(n, 2, job['district'])         # district
            sheet1.write(n, 3, job['jobNature'])        # full-time/part-time
            sheet1.write(n, 4, job['salary'])           # salary
            sheet1.write(n, 5, job['secondType'])       # position
            sheet1.write(n, 6, job['workYear'])         # years of experience
            sheet1.write(n, 7, job['companySize'])      # company size
            sheet1.write(n, 8, job['education'])        # education
            n += 1

excelTabel.save('lagou.xls')  # write the workbook to disk
In fact, I don't know how to insert screenshots here,
but you can copy the code above to crawl the data and then study it slowly (the headers should be changed to your own values).
Python3:
Input and Output
str(): returns a human-readable representation of a value.
str.format(): fills the {} placeholders in a string, so the result can be concatenated with other strings.
repr(): generates a representation that the interpreter can read back.
The repr() function escapes special characters in a string.
The repr() argument can be any Python object.
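A minimal sketch of the difference between str(), repr(), and str.format():

```python
s = 'hello\nworld'
print(str(s))    # human-readable: the \n becomes an actual line break
print(repr(s))   # interpreter-readable: quotes kept, \n shown escaped
# {} placeholders are replaced by the arguments in order:
msg = '{} has {} chars'.format('hello', 5)
print(msg)       # hello has 5 chars
```

Note that repr() works on any object, not just strings: repr(3.14) gives '3.14'.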
Read and Write files
open(filename, mode) returns a file object.
filename: a string containing the name of the file you want to access.
mode: determines how the file is opened. The default mode is read-only ('r').
f = open('C:\\foo.txt', 'r')  # open the file for reading
str = f.read()                # read the whole file into a string
print(str)
f.close()                     # close the open file
f.readline(): reads a single line from the file.
f.readlines(): returns a list of all lines in the file.
f.write('aaa'): writes 'aaa' to the file and returns the number of characters written.
f.tell(): returns the current position in the file object.
f.seek(offset): changes the current file position.
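Putting the methods above together, a minimal round-trip sketch (it writes a throwaway foo.txt in a temporary directory; the file name is chosen just for illustration):

```python
import os
import tempfile

# Work in a temporary directory so we don't clobber real files.
path = os.path.join(tempfile.mkdtemp(), 'foo.txt')

# Write two lines; write() returns the number of characters written.
f = open(path, 'w')
count = f.write('first line\nsecond line\n')
print(count)  # 23
f.close()

# Read it back.
f = open(path, 'r')
print(f.readline())   # first line (plus its trailing newline)
print(f.tell())       # current position, just past the first line
f.seek(0)             # jump back to the beginning of the file
print(f.readlines())  # all lines returned as a list
f.close()
```

Remember to close() the file when done, or use a with-statement so it closes automatically.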