Crawler 4: Crawling Python-Related Jobs from HRM


Environment: Python 3, PyCharm

Modules: requests, xlwt, urllib.request, re, csv

The usual three steps:

1. Get the page source code

2. Match the source code against a regular expression to extract the target data

3. Save the data to a file
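
The three steps can be tried end to end without touching the network. This is a minimal sketch under stated assumptions: the hard-coded SAMPLE_HTML string stands in for a downloaded page, the simple <li> pattern is a hypothetical stand-in for the real job regex, and the CSV is written to an in-memory buffer instead of a file.

```python
import csv
import io
import re

# Hypothetical page source standing in for step 1 (no network access here)
SAMPLE_HTML = """
<ul>
  <li>Python developer</li>
  <li>Data engineer</li>
</ul>
"""


def get_content():
    """Step 1: get the source code (stubbed with the sample string)."""
    return SAMPLE_HTML


def get_data(html):
    """Step 2: match the source code to extract the target data."""
    return re.findall(r"<li>(.*?)</li>", html)


def save(items, out):
    """Step 3: write a header row and one row per item to a file-like object."""
    writer = csv.writer(out)
    writer.writerow(["Position name"])
    for item in items:
        writer.writerow([item])


if __name__ == "__main__":
    buffer = io.StringIO()
    save(get_data(get_content()), buffer)
    print(buffer.getvalue())
```

Passing the output object into save() keeps step 3 independent of where the data ends up; the real script writes to job.csv instead.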

Here is the code. It shows two ways to get the page source and three ways to save the results; use whichever you prefer. The alternatives are commented out.
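
Both get_content() variants in the script decode the raw bytes with a hard-coded 'gbk'. A slightly more defensive sketch, assuming the server declares its charset in the Content-Type header, reads the charset from that header and falls back to GBK only when none is given. The helper names charset_from_content_type and decode_page are hypothetical, and the example uses a sample header string rather than a live request.

```python
from email.message import Message


def charset_from_content_type(content_type, default="gbk"):
    """Return the charset declared in a Content-Type header value, else `default`."""
    if not content_type:
        return default
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lower-cases the value and returns `default` if absent
    return msg.get_content_charset(default)


def decode_page(raw, content_type=None, default="gbk"):
    """Decode raw page bytes using the declared charset, falling back to GBK."""
    return raw.decode(charset_from_content_type(content_type, default))


if __name__ == "__main__":
    raw = "Python开发工程师".encode("gbk")
    print(charset_from_content_type("text/html; charset=GBK"))  # gbk
    print(decode_page(raw) == "Python开发工程师")  # True: no header, so gbk is used
```

With a live urllib response, resp.headers is an http.client.HTTPMessage, so resp.headers.get_content_charset('gbk') returns the same value directly.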

The first red part (marked 1 in the code) is the site URL that goes inside the quotation marks; it is too long to post here. To find it: search Baidu for the HRM official website, search for python on that site, and go to page 2 of the results. Paste the address from the address bar inside the single quotation marks, then find the 2.html part and replace the 2 with {}.
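
The "replace 2 with {}" step can also be done in code. This sketch assumes the copied page-2 URL contains the text 2.html exactly once near its end; make_template is a hypothetical helper, and the example.com URL is a placeholder, not the real search address. It also doubles any literal braces already in the URL, because str.format() would otherwise treat them as placeholders.

```python
def make_template(page2_url):
    """Turn the URL copied from results page 2 into a str.format() template."""
    head, sep, tail = page2_url.rpartition("2.html")
    if not sep:
        raise ValueError("expected the URL to contain '2.html'")
    # Double literal braces so str.format() leaves them untouched
    head = head.replace("{", "{{").replace("}", "}}")
    tail = tail.replace("{", "{{").replace("}", "}}")
    return head + "{}.html" + tail


if __name__ == "__main__":
    template = make_template("https://search.example.com/list/python,2.html?lang=c")
    print(template.format(1))  # https://search.example.com/list/python,1.html?lang=c
```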

The second red part (marked 2 in the code) sets which result pages to crawl; adjust it to your needs.
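
Note that range() excludes its stop value, so range(1, 3) in the script visits pages 1 and 2 only. The sketch below takes an inclusive page range instead and collects the items from every page into one list before anything is saved. The short pause between requests is an added assumption, not part of the original post; crawl() is a hypothetical helper that receives the fetch and parse functions as arguments.

```python
import time


def crawl(fetch, parse, first_page, last_page, delay=1.0, sleep=time.sleep):
    """Fetch and parse pages first_page..last_page (inclusive); return all items."""
    items = []
    for page in range(first_page, last_page + 1):  # +1 makes last_page inclusive
        items.extend(parse(fetch(page)))
        if page != last_page:
            sleep(delay)  # assumed politeness pause between requests
    return items
```

With the script's own functions this would be called as crawl(get_content, get_data, 1, 2), and the returned list passed once to save_file_csv().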

    import re              # regular-expression matching
    import csv             # CSV output
    import urllib.request  # fetching pages with urllib
    # import requests      # needed only for the requests-based get_content()
    # import xlwt          # needed only for the Excel export

    # 1. Get the page source with the requests module
    # def get_content(page):
    #     url = ''.format(page)
    #     html = requests.get(url).content.decode('gbk')
    #     return html

    # 1. Get the page source with the urllib module
    def get_content(page):
        url = ''.format(page)  # red part 1: paste the search URL here, with the page number replaced by {}
        html = urllib.request.urlopen(url).read().decode('gbk')
        return html

    # 2. Extract position, company name, company address, salary and date
    def get_data(html):
        reg = re.compile(r'class="t1".*?<a target="_blank" title="(.*?)".*?<span class="t2"><a target="_blank" '
                         r'title="(.*?)".*?<span class="t3">(.*?)</span>.*?<span class="t4">(.*?)</span>.*?'
                         r'<span class="t5">(.*?)</span>', re.S)
        items = re.findall(reg, html)
        return items

    # 3. Save to a .csv file
    def save_file_csv(items):
        # newline='' stops the csv module from writing blank lines on Windows;
        # utf-8-sig adds a BOM so Excel recognizes the encoding
        with open('job.csv', 'w', newline='', encoding='utf-8-sig') as csv_file:
            writer = csv.writer(csv_file)
            writer.writerow(('Position name', 'Company name', 'Company address', 'Salary', 'Date'))
            for item in items:
                writer.writerow(item)

    # 3. Save to an Excel sheet
    # def save_file_excel(items):
    #     newtable = 'jobs.xls'
    #     wb = xlwt.Workbook(encoding='utf-8')  # create the Excel workbook
    #     ws = wb.add_sheet('job')              # create the worksheet
    #     headdata = ['Position name', 'Company name', 'Company address', 'Salary', 'Date']
    #     index = 1
    #     for colnum in range(5):
    #         ws.write(0, colnum, headdata[colnum], xlwt.easyxf('font: bold on'))
    #     for item in items:
    #         for j in range(len(item)):
    #             ws.write(index, j, item[j])
    #         index += 1
    #     wb.save(newtable)

    # 3. Save to a .txt file
    # def save_file_txt(items):
    #     with open('job.txt', 'w', encoding='utf-8') as f:
    #         for item in items:
    #             for j in range(len(item)):
    #                 f.write(item[j])
    #                 f.write(' ')
    #             f.write('\n')

    if __name__ == '__main__':
        all_items = []
        for i in range(1, 3):  # red part 2: the pages to crawl (range(1, 3) means pages 1 and 2)
            html = get_content(i)
            all_items.extend(get_data(html))
        # Save once after the loop: each save function opens its file in 'w' mode,
        # so saving inside the loop would keep only the last page
        save_file_csv(all_items)
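
To check the get_data() pattern offline, it can be run against a hand-written fragment. The markup below is hypothetical and only imitates one results row; the lower-case class names t1 to t5 are an assumption, and the live page may differ. The second call shows why re.S matters: without it, .*? cannot cross the line breaks between the fields, so nothing matches.

```python
import re

# The get_data() pattern, with lower-case class names t1..t5 assumed
JOB_PATTERN = re.compile(
    r'class="t1".*?<a target="_blank" title="(.*?)".*?<span class="t2"><a target="_blank" '
    r'title="(.*?)".*?<span class="t3">(.*?)</span>.*?<span class="t4">(.*?)</span>.*?'
    r'<span class="t5">(.*?)</span>',
    re.S,
)

# Hypothetical markup imitating one results row
SAMPLE_ROW = """
<div class="el">
    <p class="t1">
        <a target="_blank" title="Python Developer" href="#">Python Developer</a>
    </p>
    <span class="t2"><a target="_blank" title="Example Co." href="#">Example Co.</a></span>
    <span class="t3">Shanghai</span>
    <span class="t4">10-15k/month</span>
    <span class="t5">06-01</span>
</div>
"""

if __name__ == "__main__":
    # One tuple per row, in the same order as the CSV header
    print(JOB_PATTERN.findall(SAMPLE_ROW * 2))
    # Same pattern without re.S: the fields span several lines, so no match
    print(re.findall(JOB_PATTERN.pattern, SAMPLE_ROW))  # []
```

Each tuple has five fields in the order position, company, address, salary, date, which is why save_file_csv() can write them straight under its header row.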
