I spent the summer vacation alone in the dormitory with nothing much to do.
One day the counselor sent out the link for checking this academic year's grades. I took a look and saw that it was hosted on the Yichafen platform. Since I already had the required data on hand, I set about designing a crawler.
Yichafen is an odd site: the PC version asks for a verification code (CAPTCHA), but the mobile version does not. To make crawling easier, I went straight for the mobile version. (I originally wanted to train a neural network to fill in the verification code automatically, but that is somewhat harder; I will come back to it when I have time.)
The crawler is implemented with Selenium WebDriver. As for speed... all I can say is that it is acceptable.
Data preparation: you need the name, student number, and ID number of every student you want to look up. (As for how to obtain them... you are on your own.)
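In the code later in this post, the third query field is filled with a slice of the ID number (`psw[-7:-1]`), that is, the six characters before the final one. Here is a minimal sketch of that step on its own. The function name `query_password` and the sample ID are made up for illustration, and the idea that the site expects exactly this slice is an assumption read off that line of code:

```python
def query_password(id_number):
    """Return the six characters that precede the final character of an ID number."""
    id_number = str(id_number).strip()
    if len(id_number) < 7:
        raise ValueError("ID number is too short")
    return id_number[-7:-1]


# Made-up 18-character string, not a real ID number.
print(query_password("123456789012345678"))
```

Keeping the ID column as text in the spreadsheet matters here: if Excel stores it as a number, the slice would be taken from a float's string form instead.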
The code also reads its input from an Excel file and writes the results back to one.
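The output sheet written by the code in this post has one row per student: four fixed columns (name, sex, student number, major) followed by one column per subject, with the subject names in the first row. A rough sketch of that layout in plain Python, without xlwt, might look like this. `build_rows` and the sample record are hypothetical names and data used only for illustration:

```python
def build_rows(students, subjects):
    """Return the sheet as a list of rows: a header row, then one row per student.

    students: list of dicts with keys name, sex, number, classify, grades
    subjects: subject names in the order they were first seen
    """
    # The first four header cells stay blank; only subject names are labelled.
    rows = [["", "", "", ""] + list(subjects)]
    for s in students:
        fixed = [s["name"], s["sex"], s["number"], s["classify"]]
        # A subject the student did not take is left as an empty cell.
        rows.append(fixed + [s["grades"].get(sub, "") for sub in subjects])
    return rows


sample = [{"name": "A", "sex": "F", "number": "001",
           "classify": "Network Engineering", "grades": {"Math": "90"}}]
rows = build_rows(sample, ["Math", "Physics"])
for row in rows:
    print(row)
```

Each inner list maps directly onto one row of `sheet.write(row, col, value)` calls.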
Crawling carries risks, so please obey the relevant laws and regulations!
import xlrd
import xlwt
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
import time

allstu = []
allsubjects = []


class Stu():
    def __init__(self, name, sex, number, psw):
        self.name = name
        self.sex = sex
        self.number = number
        self.psw = psw[-7:-1]   # six characters before the last one of the ID number
        self.dic = {}           # subject name -> grade
        self.classify = ''      # major, filled in once a query succeeds


def read_data():
    # Note: xlrd 2.0 and later read only .xls files; an older xlrd is needed for .xlsx.
    workbook = xlrd.open_workbook('data.xlsx')
    booksheet = workbook.sheet_by_index(0)
    col = booksheet.ncols
    row = booksheet.nrows
    print(row, col)
    # Columns: 0 = name, 1 = sex, 2 = ID number, 3 = student number
    for i in range(row):
        allstu.append(Stu(booksheet.cell_value(i, 0), booksheet.cell_value(i, 1),
                          booksheet.cell_value(i, 3), booksheet.cell_value(i, 2)))


def write_data():
    book = xlwt.Workbook(encoding='utf-8', style_compression=0)
    sheet = book.add_sheet('out', cell_overwrite_ok=True)
    # Header row: subject names start at column 4
    for j in range(len(allsubjects)):
        sheet.write(0, 4 + j, allsubjects[j])
    for i in range(len(allstu)):
        sheet.write(i + 1, 0, allstu[i].name)
        sheet.write(i + 1, 1, allstu[i].sex)
        sheet.write(i + 1, 2, allstu[i].number)
        sheet.write(i + 1, 3, allstu[i].classify)
        for j in range(len(allsubjects)):
            sheet.write(i + 1, 4 + j, allstu[i].dic.get(allsubjects[j], ''))
    book.save(r'out.xls')


read_data()
# One query page per major, in the same order as `classes`
urls = ['http://241374.yichafen.com/mobile/queryscore/sqcode/MsTcInwmMjkwfDViN2EzZDI0NTllYzAO0O0O.html',
        'http://241374.yichafen.com/mobile/queryscore/sqcode/MsTcInwmMzAxfDViN2E2MGQwNTVkM2UO0O0O.html',
        'http://241374.yichafen.com/mobile/queryscore/sqcode/MsTcInwmMzAyfDViN2E2MTVhY2E2MDQO0O0O.html']
classes = ['Communication Engineering', 'Network Engineering', 'Internet of Things engineering']
# Note: the find_element_by_* helpers used below were removed in Selenium 4.3;
# newer versions use driver.find_element(By.XPATH, ...) instead.
driver = webdriver.Chrome()
# i = 0
i = 15
while i < len(allstu):
    # time.sleep(0.5)
    found = False
    for k in range(3):
        url = urls[k]
        driver.implicitly_wait(1)
        driver.get(url)
        driver.refresh()
        # Fill in student number, name, and password, then submit the form
        number = driver.find_element_by_xpath("//input[@name='s_xuehao']")
        number.clear()
        number.send_keys(allstu[i].number)
        name = driver.find_element_by_xpath("//input[@name='s_xingming']")
        name.clear()
        name.send_keys(allstu[i].name)
        psw = driver.find_element_by_xpath("//input[@name='s_2c54d23b18177aabe8759f1f551451f3']")
        psw.clear()
        psw.send_keys(allstu[i].psw)
        button = driver.find_element_by_xpath("//a[@id='submitbtn']")
        button.click()
        # No error dialog means the query succeeded on this page
        flag = False
        try:
            driver.implicitly_wait(0.5)
            errorMsg = driver.find_element_by_xpath("//div[@class='weui-dialog__bd']")
            # print(errorMsg.text)
        except NoSuchElementException:
            flag = True
        if flag:
            allstu[i].classify = classes[k]
            found = True
            subnames = driver.find_elements_by_class_name('left_cell')
            grades = driver.find_elements_by_class_name('right_cell')
            # The first three rows are skipped; the rest are subject/grade pairs
            for j in range(3, len(subnames)):
                if subnames[j].text not in allsubjects:
                    allsubjects.append(subnames[j].text)
                allstu[i].dic[subnames[j].text] = grades[j].text
            break
    print('{} {}: {}, finished'.format(str(i + 1), allstu[i].name, allstu[i].classify))
    i += 1
write_data()
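The main loop tries each major's query page in turn and stops at the first one that returns grades, collecting subject names in the order they are first seen. The sketch below shows only that control flow, with the browser replaced by a stand-in function so it can be run without Selenium. `classify_student`, `fake_lookup`, and the placeholder URLs and data are all hypothetical:

```python
def classify_student(student, pages, lookup, all_subjects):
    """Try each (major, url) pair; return the first major whose page has grades.

    lookup(url, student) stands in for the Selenium form submission: it
    returns a dict of subject -> grade, or None when the site shows an error.
    """
    for major, url in pages:
        grades = lookup(url, student)
        if grades is None:
            continue  # error dialog: the student is not listed on this page
        for subject in grades:
            if subject not in all_subjects:
                all_subjects.append(subject)  # keep first-seen order
        return major, grades
    return "", {}  # not found on any page


# Fake lookup used only to exercise the control flow.
def fake_lookup(url, student):
    table = {("url-b", "Li"): {"Math": "88", "English": "92"}}
    return table.get((url, student))


pages = [("Communication Engineering", "url-a"), ("Network Engineering", "url-b")]
subjects = []
result = classify_student("Li", pages, fake_lookup, subjects)
print(result, subjects)
```

Separating the lookup from the loop this way also makes it easier to test the bookkeeping without hitting the site.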
Batch-crawling grades from the Yichafen website with a crawler