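import re
from urllib.request import urlopen


def getPage(url):
    # The site serves GBK-encoded pages; bytes that cannot be decoded are dropped.
    response = urlopen(url)
    return response.read().decode('gbk', errors='ignore')


def parsePage(s):
    # Find the link to a detail page inside a list page.
    # The height value is missing in the source, so any value is accepted here.
    com = re.compile(
        r'<td height="[^"]*">.*?<b>.*?<a href="(?P<url_name>.*?)" class="ulink">.*?',
        re.S)
    ret = com.finditer(s)
    for i in ret:
        # Returns on the first match, so only the first movie on each list page is followed.
        return "http://www.dytt8.net" + i.group("url_name")


def parsePage1(s):
    # Extract the translated title (译名), original title (片名), director (导演),
    # cast (主演) and download address from a detail page.
    # The Chinese literals are restored from the machine-translated source;
    # check them against the live page markup before relying on them.
    com = re.compile(
        r'<div id="Zoom">.*译.*名(?P<name>.*?)<br ?/>片.*名(?P<pianname>.*?)<br ?/>'
        r'.*?导.*演(?P<daoyan>.*?)<br ?/>'
        + '主.*演(?P<zhuyan>.*?)<br ?/><br ?/>简.*介.*?<td.*?><a href="(?P<xiazaidizhi>.*?)">',
        re.S)
    ret1 = com.finditer(s)
    # print('****************************************************************')
    for i in ret1:
        # Strip the full-width spaces (U+3000) that pad the field labels.
        yield {
            "yiming": re.sub("[\u3000]", "", i.group("name")),
            "pianming": re.sub("[\u3000]", "", i.group("pianname")),
            "daoyan": re.sub("[\u3000]", "", i.group("daoyan")),
            "zhuyan": re.sub("[\u3000]", "", i.group("zhuyan")),
            "xiazaidizhi": re.sub("[\u3000]", "", i.group("xiazaidizhi")),
        }


def main(num):
    url = "http://www.dytt8.net/html/gndy/dyzz/list_23_%s.html" % num
    response_html = getPage(url)
    xiangqing = parsePage(response_html)
    if xiangqing is None:
        # No detail link found on this list page; skip it.
        return
    response1_html = getPage(xiangqing)
    ret = parsePage1(response1_html)
    # Append one record per line; the with block closes the file afterwards.
    with open("move_list", "a", encoding="utf8") as f:
        for obj in ret:
            print(obj)
            data = str(obj)
            f.write(data + "\n")


if __name__ == "__main__":
    for i in range(1, 181):
        main(i)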
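The parsing functions above depend on three things: named groups, the re.S flag (so that . also matches newlines), and removal of the full-width space U+3000. The sketch below tries the same idea offline on a small made-up HTML fragment, so the pattern can be tested without touching the network. The sample markup, the simplified pattern, and the helper name extract_titles are all assumptions for illustration; the real page layout may differ.

```python
import re

# Hypothetical sample shaped like a detail page; not copied from the real site.
SAMPLE_HTML = (
    '<div id="Zoom">\n'
    '◎译\u3000\u3000名\u3000Example Translated Title<br />\n'
    '◎片\u3000\u3000名\u3000Example Original Title<br />\n'
    '</div>'
)

# Simplified pattern (an assumption): only the two title fields are captured.
# re.S lets "." cross line breaks between the fields.
PATTERN = re.compile(
    r'译.*?名(?P<name>.*?)<br />.*?片.*?名(?P<pianname>.*?)<br />',
    re.S,
)


def extract_titles(html):
    """Return one dict per match, with full-width spaces (U+3000) removed."""
    results = []
    for m in PATTERN.finditer(html):
        results.append({
            "yiming": m.group("name").replace("\u3000", "").strip(),
            "pianming": m.group("pianname").replace("\u3000", "").strip(),
        })
    return results


if __name__ == "__main__":
    print(extract_titles(SAMPLE_HTML))
```

Testing a pattern on a saved fragment like this first makes it easier to see which part stops matching when the site changes its markup.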
A First Look at Web Crawlers (crawling the dytt movie list and ...)