I modified parts of teacher Song Tian's code:
① Change HTTPS to HTTP in the URLs; without this change the program cannot crawl the pages;
② It is best to return from getStockInfo() once count >= 50, because the list currently contains 4717 stocks (you can check this by adding print(len(slist)) after the getStockList() call in main()). Running the full program takes a long time: after two hours mine had only reached 50%. This number seems to have grown about 100-fold ...
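As a rough sketch of the two changes above (an illustration, not the original code): force_http() and process_stocks() are hypothetical helper names, and handle stands in for whatever per-stock work getStockInfo() does. The default limit of 50 is the value suggested in ②.

```python
def force_http(url):
    """Rewrite an https:// URL to http://; other URLs are returned unchanged."""
    prefix = "https://"
    if url.startswith(prefix):
        return "http://" + url[len(prefix):]
    return url


def process_stocks(stocks, handle, max_count=50):
    """Call handle(stock) for each stock and return once max_count are done.

    Returns the number of stocks that were actually processed.
    """
    count = 0
    for stock in stocks:
        handle(stock)
        count += 1
        if count >= max_count:  # stop early instead of crawling every stock
            return count
    return count
```

For example, process_stocks(slist, print) would print only the first 50 stock codes, however long slist is.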
Code:
import requests
from bs4 import BeautifulSoup
import traceback
import re


def getHTMLText(url, code="utf-8"):
    # Fetch a page and return its text, or "" on any failure.
    try:
        r = requests.get(url)
        r.raise_for_status()
        r.encoding = code
        return r.text
    except Exception:
        return ""


def getStockList(lst, stockURL):
    # Collect stock codes such as sh600000 or sz000001 from the link targets.
    html = getHTMLText(stockURL, "GB2312")
    soup = BeautifulSoup(html, 'html.parser')
    a = soup.find_all('a')
    for i in a:
        try:
            href = i.attrs['href']
            lst.append(re.findall(r"[s][hz]\d{6}", href)[0])
        except Exception:
            continue


def getStockInfo(lst, stockURL, fpath):
    count = 0
    for stock in lst:
        url = stockURL + stock + ".html"
        html = getHTMLText(url)
        try:
            if html == "":
                continue
            infoDict = {}
            soup = BeautifulSoup(html, 'html.parser')
            stockInfo = soup.find('div', attrs={'class': 'stock-bets'})

            name = stockInfo.find_all(attrs={'class': 'bets-name'})[0]
            infoDict.update({'Stock name': name.text.split()[0]})

            # Pair each <dt> label with the <dd> value at the same index.
            keyList = stockInfo.find_all('dt')
            valueList = stockInfo.find_all('dd')
            for i in range(len(keyList)):
                key = keyList[i].text
                val = valueList[i].text
                infoDict[key] = val

            # Append one dict per stock to the output file.
            with open(fpath, 'a', encoding='utf-8') as f:
                f.write(str(infoDict) + '\n')
                count = count + 1
                print("\rCurrent progress: {:.2f}%".format(count * 100 / len(lst)), end="")
        except Exception:
            # A page that fails to parse still counts toward the progress.
            count = count + 1
            print("\rCurrent progress: {:.2f}%".format(count * 100 / len(lst)), end="")
            continue


def main():
    stock_list_url = 'http://quote.eastmoney.com/stocklist.html'
    stock_info_url = 'http://gupiao.baidu.com/stock/'
    output_file = 'F:/BaiduStockInfo.txt'
    slist = []
    getStockList(slist, stock_list_url)
    getStockInfo(slist, stock_info_url, output_file)


main()
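To see what the regular expression in getStockList() picks out without touching the network, here is a small sketch. extract_stock_codes() is a hypothetical helper, and the sample hrefs in the usage note are made-up illustrations, not links taken from the real page.

```python
import re

# Same pattern as in getStockList(): "s", then "h" or "z", then six digits.
STOCK_CODE = re.compile(r"[s][hz]\d{6}")


def extract_stock_codes(hrefs):
    """Return the first stock code found in each href, skipping hrefs without one."""
    codes = []
    for href in hrefs:
        match = STOCK_CODE.search(href)
        if match:
            codes.append(match.group(0))
    return codes
```

Given hrefs such as 'http://quote.eastmoney.com/sh600000.html' and 'http://quote.eastmoney.com/sz000001.html', this yields the codes sh600000 and sz000001; a link with no such code is simply skipped, which is what the try/except in getStockList() achieves by catching the IndexError.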