Python pseudo-code crawl perfect volunteer National science and Arts bar code running codes continuously updated

Source: Internet
Author: User
Tags xpath

Recently a lot of small partners said that want to engage in a project to combat class, I took a little time to do a crawler project (when the code copy may be a bit of a problem, the indentation is no problem)
Want to get more source code or answer questions or exchange learning can add group: 725479218

#-*-Coding:utf-8-*-from function.data_tool import clean_dataimport hashlibimport furl.furlfrom Crawlers.downloader im Port Downloaderfromfunction.parse_tool Import xpath_parsefromfunction.database_tool Import auto_sqlsevedown= Downloader (proxy= ' http://104.224.138.224:8888/proxy ') a = {' Jilin ': ' 22 ', ' Hebei ': ' 13 ', ' Shaanxi ': ' 61 ', ' Shanxi ': ' 14 ', ' Qinghai ': ' 63 ', ' Hunan ' : ' 43 ', ' Guangdong ': ' 44 ', ' Anhui ': ' 34 ', ' Sichuan ': ' 51 ', ' Jiangxi ': ' 36 ', ' Zhejiang ': ' 33 ', ' Guizhou ': ' 52 ', ' Xinjiang ': ' 65 ', ' Inner Mongolia ': ' 15 ', ' Tibet ': ' 54 ', '  Jiangsu ': ' 32 ', ' Guangxi ': ' 45 ', ' Hubei ': ' 42 ', ' Hainan ': ' 46 ', ' Henan ': ' 41 ', ' Shandong ': ' 37 ', ' Fujian ': ' 35 ', ' Yunnan ': ' 53 ', ' Shanghai ': ' 31 ', ' Beijing ': ' 11 ',          ' Tianjin ': ' 12 ', ' Gansu ': ' 62 ', ' Ningxia ': ' 64 ', ' Heilongjiang ': ' 23 ', ' Chongqing ': ' 50 ', ' Liaoning ': '}for Province in b:for subject in C: Field_info=[] key_word=a[province] reform_url.args[' type ']=subject reform_url.args[' Provin Ce ']=key_word response=down.get (url=reform_url,typ= ' text ', encoding= ' utf-8 ') Htmlcode = eval (clean_data.c       Lean_space (response)) [' Htmlstr ']   xpath_html = Xpath_parse.text_tolxml (htmlcode) year = Xpath_html.xpath (' String (text ()) = "Acceptance Batch"] /..)‘).          Replace (' \ R ', '). replace (' \ t ', '). Replace (' admission lot ', '). Replace (",") Year_split = Year.split () Ben_yi = Xpath_html.xpath (' String (//td[normalize-space (text ()) = "Undergraduate first batch of"]/...) ').                                                                                                           Replace (' \ R ', '). replace (' \ t ', "). Replace (' The first batch of ', ' '). Replace (', ') Ben_yi_split = Ben_yi.split () Ben_er = XP Ath_html.xpath (' String ' (//td[normalize-space () = "Undergraduate second batch"]/...) ").                                                                                                           Replace (' \ R ', '). replace (' \ t ', "). Replace (' undergraduate second ', '). Replace (', ') Ben_er_split = Ben_er.split () b En_san = Xpath_html.xpath (' String (//td[normalize-space (text ()) = "Undergraduate third batch"]/...) ').  Replace (' \ R ', '). replace (' \ t ',                                                                                                          "). Replace ( ' Undergraduate third instalment ', '). Replace (', ') Ben_san_split = Ben_san.split () Zhuan_yi = Xpath_html.xpath (' St Ring (//td[normalize-space (text ()) = "Specialist First batch"]/...). Replace (' \ R ', '). replace (' \ t ', b = [' Anhui ', ' Beijing ', ' Chongqing ', ' Fujian ', ' Gansu ', ' Guizhou ', ' Guangdong ', ' Guangxi ', ' Hubei ', ' Hainan ', ' Heilongjiang ', ' Hunan ', ' Henan ', ' Hebei ', ' Jilin ', ' Jiangxi ', ' Jiangsu ', ' Liaoning ', ' Ningxia ', ' Inner Mongolia ', ' Qinghai ', ' Shanxi ', ' Shandong ', ' Shaanxi ', ' Sichuan ', ' Shanghai ', ' Tianjin ', ' Tibet ', ' Xinjiang ', ' Yunnan ', ' Zhejiang ']c=[' Wen ', ' Li ']url = ' https://www.wmzy.com/api/score/getScoreList?type=wen&province=33 ' reform_url=furl.furl (URL) w=auto_ Sqlsever.                                                                                                             Mssql (database= ' Provincescore ', datatable=[' scoreprovince ') "). Replace (' The first batch of ' specialties ', '). Replace (', ') zhuan_yi_s Plit = Zhuan_yi.split () zhuan_er = Xpath_html.xpath (' String (text ()) = "Specialist second batch"]/..)‘).                                                                                                             Replace (' \ R ', '). replace (' \ t ',          "). Replace (' The second batch of ' specialist ', '). Replace (', ') Zhuan_er_split = Zhuan_er.split () If ' Wen ' in subject:subject= ' liberal arts ' else:subject= ' science ' Print (zhuan_yi_split,zhuan_er_spli T,ben_san_split,ben_er_split,ben_yi_split) Provincemd5=[hashlib.md5 (Province.encode ()). Hexdigest ()]*8 ti Qian=[0]*8 Field_info.extend ([[Province]*8,provincemd5,year_split,[subject]*8,tiqian,ben_yi_split,ben_er_split, Ben_san_split,zhuan_yi_split,zhuan_er_split]) W.insert_data (field_info)

Python pseudo-code crawl perfect volunteer National science and Arts bar code running codes Continuous update

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.