Batch-crawling grades from the Yichafen website with a crawler

Source: Internet
Author: User

I was alone in the dormitory over the summer vacation with nothing to do.

One day the counselor sent out the link for checking the academic year's grades. I took a look and found that it was hosted on the Yichafen platform, and since I already had the necessary data at hand, I started designing a crawler.
The Yichafen site is odd: the PC version requires a verification code, while the mobile version does not. To make crawling easier, I went with the mobile version. (I originally wanted to train a neural network to fill in the verification code automatically, but that is somewhat harder; I will add it later when I have time.)

The crawler is implemented with Selenium's WebDriver. As for speed... let's just say it is acceptable.

Data preparation: you need the name, student number, and ID number of each student to be queried. (How to obtain them... figure that out yourself.)

The code also reads from and writes to Excel.

Crawling carries risks, so please obey the laws and regulations!
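The data preparation step can be pictured as one record per student. The following is only a sketch under assumptions, not the author's code: the `Student` dataclass and the `query_password` helper are hypothetical names, and the ID number in the usage is made up. The `[-7:-1]` slice mirrors the one used in the script in this post, which takes the six characters just before the final character of the ID number.

```python
from dataclasses import dataclass, field


def query_password(id_number: str) -> str:
    """Return the six characters just before the last character of the ID number."""
    return id_number[-7:-1]


@dataclass
class Student:
    """One row of input data (hypothetical structure for illustration)."""
    name: str
    sex: str
    number: str       # student number
    id_number: str    # national ID number, kept as text
    grades: dict = field(default_factory=dict)

    @property
    def password(self) -> str:
        # Value typed into the third field of the query form.
        return query_password(self.id_number)


if __name__ == "__main__":
    # Made-up ID number, used only to show which characters are selected.
    s = Student("Zhang San", "M", "20160001", "123456789012345678")
    print(s.password)  # 234567
```

Keeping the ID number as a string matters here, because the slice works on characters rather than on a numeric value.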
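One practical detail when reading Excel with `xlrd`: numeric cells come back as Python floats, so a student number stored as a number would be read as something like `20160001.0`. The helper below is a minimal sketch of one way to normalize such values before typing them into a form; `cell_to_text` is a hypothetical name and is not part of the original script.

```python
def cell_to_text(value) -> str:
    """Convert a spreadsheet cell value into the text to type into a form field.

    Whole-number floats such as 20160001.0 become "20160001";
    everything else is converted with str() and stripped of whitespace.
    """
    if isinstance(value, float) and value.is_integer():
        return str(int(value))
    return str(value).strip()


if __name__ == "__main__":
    print(cell_to_text(20160001.0))     # 20160001
    print(cell_to_text(" Zhang San "))  # Zhang San
```

This only helps with shorter numbers. Excel keeps about 15 significant digits for numeric cells, so 18-digit ID numbers should be stored as text in the spreadsheet in the first place.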

import time

import xlrd
import xlwt
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

allstu = []       # every student read from the input workbook
allsubjects = []  # subject names, in order of first appearance


class Stu():
    def __init__(self, name, sex, number, psw):
        self.name = name
        self.sex = sex
        self.number = number
        # Query password: the six characters before the last character of the ID number.
        self.psw = psw[-7:-1]
        self.dic = {}        # subject name -> grade
        self.classify = ''   # name of the class whose query page matched


def readData():
    global allstu
    # Note: xlrd 2.0 and later read only .xls files; opening data.xlsx this way needs xlrd < 2.0.
    workbook = xlrd.open_workbook('data.xlsx')
    booksheet = workbook.sheet_by_index(0)
    col = booksheet.ncols
    row = booksheet.nrows
    print(row, col)
    # Columns: 0 = name, 1 = sex, 2 = ID number, 3 = student number.
    for i in range(row):
        allstu.append(Stu(booksheet.cell_value(i, 0),
                          booksheet.cell_value(i, 1),
                          booksheet.cell_value(i, 3),
                          booksheet.cell_value(i, 2)))


def writeData():
    book = xlwt.Workbook(encoding='utf-8', style_compression=0)
    sheet = book.add_sheet('out', cell_overwrite_ok=True)
    # Header row: subject names start at column 4.
    for j in range(len(allsubjects)):
        sheet.write(0, 4 + j, allsubjects[j])
    for i in range(len(allstu)):
        sheet.write(i + 1, 0, allstu[i].name)
        sheet.write(i + 1, 1, allstu[i].sex)
        sheet.write(i + 1, 2, allstu[i].number)
        sheet.write(i + 1, 3, allstu[i].classify)
        for j in range(len(allsubjects)):
            sheet.write(i + 1, 4 + j, allstu[i].dic.get(allsubjects[j], ''))
    book.save(r'out.xls')


readData()

# One query page per class; each student is tried against every page in turn.
urls = ['http://241374.yichafen.com/mobile/queryscore/sqcode/MsTcInwmMjkwfDViN2EzZDI0NTllYzAO0O0O.html',
        'http://241374.yichafen.com/mobile/queryscore/sqcode/MsTcInwmMzAxfDViN2E2MGQwNTVkM2UO0O0O.html',
        'http://241374.yichafen.com/mobile/queryscore/sqcode/MsTcInwmMzAyfDViN2E2MTVhY2E2MDQO0O0O.html']
classes = ['Communication Engineering', 'Network Engineering', 'Internet of Things engineering']

driver = webdriver.Chrome()

# i = 0
i = 15  # start index as in the original script; use 0 to process every row
while i < len(allstu):
    # time.sleep(0.5)
    found = False
    for k in range(3):
        url = urls[k]
        driver.implicitly_wait(1)
        driver.get(url)
        driver.refresh()

        # Fill in student number, name, and query password, then submit.
        number = driver.find_element(By.XPATH, "//input[@name='s_xuehao']")
        number.clear()
        number.send_keys(allstu[i].number)
        name = driver.find_element(By.XPATH, "//input[@name='s_xingming']")
        name.clear()
        name.send_keys(allstu[i].name)
        psw = driver.find_element(By.XPATH, "//input[@name='s_2c54d23b18177aabe8759f1f551451f3']")
        psw.clear()
        psw.send_keys(allstu[i].psw)
        button = driver.find_element(By.XPATH, "//a[@id='submitbtn']")
        button.click()

        # If no error dialog appears, the query succeeded on this page.
        flag = False
        try:
            driver.implicitly_wait(0.5)
            errorMsg = driver.find_element(By.XPATH, "//div[@class='weui-dialog__bd']")
            # print(errorMsg.text)
        except NoSuchElementException:
            flag = True

        if flag:
            allstu[i].classify = classes[k]
            found = True
            subnames = driver.find_elements(By.CLASS_NAME, 'left_cell')
            grades = driver.find_elements(By.CLASS_NAME, 'right_cell')
            # The first three rows are skipped, as in the original script.
            for j in range(3, len(subnames)):
                if subnames[j].text not in allsubjects:
                    allsubjects.append(subnames[j].text)
                allstu[i].dic[subnames[j].text] = grades[j].text
            break
    print('{} {}: {}, finished'.format(str(i + 1), allstu[i].name, allstu[i].classify))
    i += 1

writeData()
driver.quit()
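The way the script gathers subject names and lays out one output row per student can be shown without a browser or Excel. The sketch below is an illustration under assumptions: `collect_subjects` and `build_rows` are hypothetical helper names, the students are plain dictionaries, and the sample data is made up. Like the script, it leaves the first four header cells blank and fills missing grades with an empty string.

```python
def collect_subjects(grade_dicts):
    """Return subject names in order of first appearance, without duplicates."""
    subjects = []
    for grades in grade_dicts:
        for subject in grades:
            if subject not in subjects:
                subjects.append(subject)
    return subjects


def build_rows(students, subjects):
    """Build the output table: a header row followed by one row per student."""
    # Columns 0-3 hold name, sex, student number, and class; their header cells stay blank.
    rows = [["", "", "", ""] + list(subjects)]
    for s in students:
        fixed = [s["name"], s["sex"], s["number"], s["class"]]
        # A student without a grade for some subject gets an empty cell.
        rows.append(fixed + [s["grades"].get(sub, "") for sub in subjects])
    return rows


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    students = [
        {"name": "A", "sex": "M", "number": "1", "class": "Network Engineering",
         "grades": {"Math": "90", "English": "85"}},
        {"name": "B", "sex": "F", "number": "2", "class": "Communication Engineering",
         "grades": {"Math": "88", "Physics": "92"}},
    ]
    subjects = collect_subjects(s["grades"] for s in students)
    for row in build_rows(students, subjects):
        print(row)
```

Separating the table layout from the Selenium loop like this makes it possible to check the output format before running a full crawl.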
