A number of readers have recently asked for a hands-on project to practice on, so I spent a little time putting together a crawler project. (Copy-pasting the code may introduce problems; the indentation itself is fine.)
If you want more source code, have questions, or would like to exchange ideas, you can join QQ group 725479218.
# -*- coding: utf-8 -*-
import hashlib

import furl
from Crawlers.downloader import Downloader
from function.data_tool import clean_data
from function.parse_tool import xpath_parse
from function.database_tool import auto_sqlsever

down = Downloader(proxy='http://104.224.138.224:8888/proxy')

# Province name -> region code used by the API (standard GB administrative codes;
# the garbled source lost the Liaoning value, restored here as '21').
a = {'Jilin': '22', 'Hebei': '13', 'Shaanxi': '61', 'Shanxi': '14',
     'Qinghai': '63', 'Hunan': '43', 'Guangdong': '44', 'Anhui': '34',
     'Sichuan': '51', 'Jiangxi': '36', 'Zhejiang': '33', 'Guizhou': '52',
     'Xinjiang': '65', 'Inner Mongolia': '15', 'Tibet': '54', 'Jiangsu': '32',
     'Guangxi': '45', 'Hubei': '42', 'Hainan': '46', 'Henan': '41',
     'Shandong': '37', 'Fujian': '35', 'Yunnan': '53', 'Shanghai': '31',
     'Beijing': '11', 'Tianjin': '12', 'Gansu': '62', 'Ningxia': '64',
     'Heilongjiang': '23', 'Chongqing': '50', 'Liaoning': '21'}

b = ['Anhui', 'Beijing', 'Chongqing', 'Fujian', 'Gansu', 'Guizhou',
     'Guangdong', 'Guangxi', 'Hubei', 'Hainan', 'Heilongjiang', 'Hunan',
     'Henan', 'Hebei', 'Jilin', 'Jiangxi', 'Jiangsu', 'Liaoning', 'Ningxia',
     'Inner Mongolia', 'Qinghai', 'Shanxi', 'Shandong', 'Shaanxi', 'Sichuan',
     'Shanghai', 'Tianjin', 'Tibet', 'Xinjiang', 'Yunnan', 'Zhejiang']
c = ['wen', 'li']  # wen = liberal arts, li = science

url = 'https://www.wmzy.com/api/score/getScoreList?type=wen&province=33'
reform_url = furl.furl(url)
w = auto_sqlsever.Mssql(database='Provincescore', datatable=['scoreprovince'])

for province in b:
    for subject in c:
        field_info = []
        key_word = a[province]
        # Rewrite the two query parameters for this province/subject pair.
        reform_url.args['type'] = subject
        reform_url.args['province'] = key_word
        response = down.get(url=reform_url.url, typ='text', encoding='utf-8')
        # The API wraps the table HTML in a dict-like string; eval() is kept
        # from the original, though json.loads would be safer.
        htmlcode = eval(clean_data.clean_space(response))['htmlStr']
        xpath_html = xpath_parse.text_tolxml(htmlcode)

        def batch_row(label):
            # Take the <tr> whose first cell matches the label, then strip
            # the label and whitespace, leaving one value per year.
            # (Labels are shown translated; the live page uses the Chinese
            # originals, e.g. "本科一批" for "Undergraduate first batch".)
            row = xpath_html.xpath(
                'string(//td[normalize-space(text())="%s"]/..)' % label)
            return row.replace('\r', '').replace('\t', '').replace(label, '').split()

        year_split = batch_row('Acceptance Batch')  # header row: the years
        ben_yi_split = batch_row('Undergraduate first batch')
        ben_er_split = batch_row('Undergraduate second batch')
        ben_san_split = batch_row('Undergraduate third batch')
        zhuan_yi_split = batch_row('Specialist first batch')
        zhuan_er_split = batch_row('Specialist second batch')

        subject = 'liberal arts' if 'wen' in subject else 'science'
        print(zhuan_yi_split, zhuan_er_split, ben_san_split,
              ben_er_split, ben_yi_split)

        # One stable MD5 key per province, repeated for each of the 8 rows.
        provincemd5 = [hashlib.md5(province.encode()).hexdigest()] * 8
        tiqian = [0] * 8  # placeholder column for the early-admission batch
        field_info.extend([[province] * 8, provincemd5, year_split,
                           [subject] * 8, tiqian, ben_yi_split, ben_er_split,
                           ben_san_split, zhuan_yi_split, zhuan_er_split])
        w.insert_data(field_info)
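Each database row is tagged with an MD5 digest of the province name so that rows belonging to the same province share one stable key. A minimal standard-library sketch of that keying step (the helper name `province_key` is hypothetical, not part of the project; the row count of 8 matches the script):

```python
import hashlib

def province_key(province, rows=8):
    """Return one stable hex key per province, repeated `rows` times.

    Mirrors the `provincemd5` column in the crawler: every score row
    for a province carries the same MD5 digest of its name.
    """
    digest = hashlib.md5(province.encode('utf-8')).hexdigest()
    return [digest] * rows

keys = province_key('Zhejiang')
print(len(keys))  # 8 identical 32-character hex strings
```

Hashing the name (rather than storing it raw) gives a fixed-width key, which is convenient for joins and indexing in SQL Server.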
Python crawler for wmzy.com ("Perfect Volunteer"): nationwide liberal-arts and science admission-score lists. Runnable code, continuously updated.
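The script leans on the third-party furl library to rewrite the two query parameters on every iteration. If furl is not available, the same rewriting can be sketched with only the standard library (`set_query_args` is a hypothetical helper written for illustration):

```python
from urllib.parse import urlsplit, urlunsplit, parse_qs, urlencode

def set_query_args(url, **overrides):
    """Return `url` with the given query parameters replaced or added."""
    parts = urlsplit(url)
    # parse_qs maps each key to a list of values; keep the first value.
    query = {k: v[0] for k, v in parse_qs(parts.query).items()}
    query.update(overrides)
    return urlunsplit(parts._replace(query=urlencode(query)))

base = 'https://www.wmzy.com/api/score/getScoreList?type=wen&province=33'
print(set_query_args(base, type='li', province='44'))
# -> https://www.wmzy.com/api/score/getScoreList?type=li&province=44
```

Note that, unlike mutating one shared furl object across loop iterations, this helper returns a fresh URL string each call, which avoids any state leaking between requests.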