Crawl the "Best Universities" ranking site (zuihaodaxue.cn) and extract the names and scores of the top 20 universities in the 2017 ranking.
# coding: utf-8
import requests
import bs4
from bs4 import BeautifulSoup


def getHTMLText(url):
    # Fetch the page; return "fail" if the request errors out.
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException:
        return "fail"


def fillUnivList(ulist, html):
    # Append cells 0, 1 and 3 of each table row (name and score are cells 1 and 3).
    soup = BeautifulSoup(html, "html.parser")
    tbody = soup.find('tbody')
    if tbody is None:  # the fetch failed or the page has no table body
        return
    for tr in tbody.children:
        # Skip the whitespace text nodes between <tr> tags.
        if isinstance(tr, bs4.element.Tag):
            tds = tr('td')
            ulist.append([tds[0].string, tds[1].string, tds[3].string])


def printUnivList(ulist, num):
    # Column template ({3} is the fill character); not used by the print below.
    tplt = "{0:^10}\t{1:{3}^10}\t{2:^10}"
    for u in ulist[:num]:
        print(u[1], u[2])  # name, score


def main():
    uinfo = []
    url = 'http://www.zuihaodaxue.cn/zuihaodaxuepaiming2017.html'
    html = getHTMLText(url)
    fillUnivList(uinfo, html)
    printUnivList(uinfo, 20)


main()
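The listing above defines a column template but never applies it, so the rows print without alignment. Below is a minimal sketch of how that template could be used, assuming the fourth format argument is meant to be the full-width space chr(12288), which keeps Chinese names aligned. The helper names format_row and format_table and the sample rows are hypothetical and are not part of the original program.

```python
# Column template taken from the listing: field {3} supplies the fill
# character for the middle (name) column.
TPLT = "{0:^10}\t{1:{3}^10}\t{2:^10}"

# Assumption: the fill character is the full-width (ideographic) space,
# U+3000, which is as wide as one CJK character.
FULLWIDTH_SPACE = chr(12288)


def format_row(first, name, score):
    """Return one tab-separated row with each field centered in width 10."""
    return TPLT.format(first, name, score, FULLWIDTH_SPACE)


def format_table(rows, num):
    """Return a header line plus at most `num` formatted rows.

    Each row is a list of three strings, as built by the crawler above.
    """
    lines = [format_row("Rank", "Name", "Score")]
    for row in rows[:num]:
        lines.append(format_row(row[0], row[1], row[2]))
    return lines


if __name__ == "__main__":
    # Placeholder rows for illustration only, not real ranking data.
    sample = [["1", "示例大学甲", "90.0"], ["2", "示例大学乙", "80.0"]]
    for line in format_table(sample, 20):
        print(line)
```

To try this in the crawler, the print call in the loop could be swapped for print(format_row(u[0], u[1], u[2])). The full-width fill only matters for the name column, since the other two columns hold ASCII text and are padded with ordinary spaces.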
Results:
Python self-study 2: crawler