Objective: to obtain the names and trading information of all stocks listed on the Shanghai (SSE) and Shenzhen (SZSE) exchanges, and save them to a file.
Technologies used: requests + bs4 + re
Site selection (selection principles: the stock information exists statically in the HTML page, is not generated by JavaScript, and is not restricted by the site's robots protocol)
1. Get the stock list: http://quote.eastmoney.com/stocklist.html (the East Money site has a list of all stocks, while Baidu Stocks only has per-stock information)
2. Get stock information:
Baidu Stock: https://gupiao.baidu.com/stock/
Single Stock: https://gupiao.baidu.com/stock/sz002939.html
The design structure of the program:
Step 1: Get the list of stocks from East Money
Step 2: For each stock in the list, fetch that stock's information from Baidu Stocks
Step 3: Save the results to a file
"Step 1"
Obtain the East Money stock list by sending a request and viewing the page source code:
The stock codes are found to be stored in the href attributes of <a> tags, with Shanghai and Shenzhen codes prefixed "sh" and "sz" respectively, so they can be parsed and matched by this rule.
First, use BeautifulSoup to get all the <a> tags:
    soup = BeautifulSoup(html, 'html.parser')
    a = soup.find_all('a')
The stock codes extracted with a regular expression are then stored in the lst list:
    for i in a:
        try:
            href = i.attrs['href']
            lst.append(re.findall(r"[s][hz]\d{6}", href)[0])
        except:
            continue
At this point the list lst = ['sh201000', 'sh201001', 'sh201002', ...]
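The extraction rule above can be exercised on its own. The href values below are hypothetical samples shaped like links on the East Money list page; only the regex logic is taken from the tutorial:

```python
import re

# Hypothetical href values shaped like those on the East Money list page
hrefs = [
    "http://quote.eastmoney.com/sh201000.html",
    "http://quote.eastmoney.com/sz002939.html",
    "http://quote.eastmoney.com/center.html",  # no stock code: skipped
]

lst = []
for href in hrefs:
    try:
        # match "sh" or "sz" followed by a six-digit stock code
        lst.append(re.findall(r"[s][hz]\d{6}", href)[0])
    except IndexError:  # findall returned an empty list
        continue

print(lst)  # ['sh201000', 'sz002939']
```

Catching the specific IndexError (rather than a bare except) makes it explicit that the only expected failure is an href without a stock code.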
"Step 2"
Next, for each stock code in the list obtained above, fetch that stock's information from Baidu Stocks.
The URL of a stock's information page on Baidu Stocks looks like: https://gupiao.baidu.com/stock/sz002939.html
Therefore, the URL is stitched together first, and then a request is sent to fetch the page:
    for stock in lst:
        url = 'https://gupiao.baidu.com/stock/' + stock + ".html"
        html = getHTMLText(url)
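As a quick check of the stitching rule, using two codes assumed from the list built in step 1:

```python
# Stitch each stock code onto the Baidu Stocks base URL
base = 'https://gupiao.baidu.com/stock/'
lst = ['sh201000', 'sz002939']  # assumed sample codes from step 1
urls = [base + stock + ".html" for stock in lst]
print(urls[1])  # https://gupiao.baidu.com/stock/sz002939.html
```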
Then parse the page. Viewing its source code, we discover that all of the stock information is stored in <dt> and <dd> tags, so BeautifulSoup can analyze it step by step:
    soup = BeautifulSoup(html, 'html.parser')
    stockInfo = soup.find('div', attrs={'class': 'stock-bets'})
    if stockInfo:
        name = stockInfo.find_all(attrs={'class': 'bets-name'})[0]
        infoDict.update({'stock name': name.text.split()[0]})
    else:
        print('stockInfo is null')
        break
    keyList = stockInfo.find_all('dt')
    valueList = stockInfo.find_all('dd')
    for i in range(len(keyList)):
        key = keyList[i].text
        val = valueList[i].text
        infoDict[key] = val
At this point, infoDict = {"volume": "310,700 lots", "high": "9.89", "limit up": "10.86", ...}
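The parsing steps above can be tried out on a small offline fragment. The HTML below is a hypothetical sketch shaped like Baidu Stocks' stock-bets block (the exact tag layout is an assumption for illustration); the BeautifulSoup calls are the same ones used in the tutorial:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment shaped like the Baidu Stocks "stock-bets" div
html = """
<div class="stock-bets">
  <h2 class="bets-name">sz002939 Example</h2>
  <dl><dt>high</dt><dd>9.89</dd></dl>
  <dl><dt>limit up</dt><dd>10.86</dd></dl>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')
stockInfo = soup.find('div', attrs={'class': 'stock-bets'})

infoDict = {}
# first token of the bets-name text is the stock's name/code
name = stockInfo.find_all(attrs={'class': 'bets-name'})[0]
infoDict.update({'stock name': name.text.split()[0]})

# pair each <dt> label with the <dd> value at the same index
keyList = stockInfo.find_all('dt')
valueList = stockInfo.find_all('dd')
for i in range(len(keyList)):
    infoDict[keyList[i].text] = valueList[i].text

print(infoDict)
```

The pairing relies on the page emitting one <dd> for every <dt> in the same order, which holds for this page layout but is worth re-checking if the site's markup changes.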
"Step 3"
Finally, output the results to a file:
    with open(fpath, 'a', encoding='utf-8') as f:
        f.write(str(infoDict) + '\n')
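A minimal round trip of this output step, writing to a temporary file instead of the tutorial's hard-coded D: path (the sample record is assumed):

```python
import os
import tempfile

infoDict = {'stock name': 'sz002939', 'high': '9.89'}  # assumed sample record

# append one dict per line, as in the tutorial's output step
fpath = os.path.join(tempfile.mkdtemp(), 'BaiduStockInfo.txt')
with open(fpath, 'a', encoding='utf-8') as f:
    f.write(str(infoDict) + '\n')

# read it back to confirm the record landed as one line
with open(fpath, encoding='utf-8') as f:
    lines = f.read().splitlines()
print(lines[0])
```

Opening with mode 'a' means repeated runs keep appending records, which is what lets the crawler write one line per stock inside its loop.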
The complete code is as follows:
    #CrawBaiduStocksA.py
    import requests
    from bs4 import BeautifulSoup
    import traceback
    import re

    #common helper: fetch a page
    def getHTMLText(url):
        try:
            r = requests.get(url)
            r.raise_for_status()
            r.encoding = r.apparent_encoding
            return r.text
        except:
            return ""  #fetch failed

    #get the list of stock codes
    def getStockList(lst, stockURL):
        html = getHTMLText(stockURL)
        soup = BeautifulSoup(html, 'html.parser')
        a = soup.find_all('a')
        for i in a:
            try:
                href = i.attrs['href']
                lst.append(re.findall(r"[s][hz]\d{6}", href)[0])
            except:
                continue

    #get stock information and output it to a file
    def getStockInfo(lst, stockURL, fpath):
        for stock in lst:
            url = stockURL + stock + ".html"
            html = getHTMLText(url)
            try:
                if html == "":
                    continue
                infoDict = {}
                soup = BeautifulSoup(html, 'html.parser')
                stockInfo = soup.find('div', attrs={'class': 'stock-bets'})
                if stockInfo:
                    name = stockInfo.find_all(attrs={'class': 'bets-name'})[0]
                    infoDict.update({'stock name': name.text.split()[0]})
                else:
                    print('stockInfo is null')
                    break
                keyList = stockInfo.find_all('dt')
                valueList = stockInfo.find_all('dd')
                for i in range(len(keyList)):
                    key = keyList[i].text
                    val = valueList[i].text
                    infoDict[key] = val
                with open(fpath, 'a', encoding='utf-8') as f:
                    f.write(str(infoDict) + '\n')
            except:
                traceback.print_exc()
                continue

    def main():
        stock_list_url = 'http://quote.eastmoney.com/stocklist.html'  #East Money stock list
        stock_info_url = 'https://gupiao.baidu.com/stock/'  #Baidu stock information
        output_file = 'D:/BaiduStockInfo.txt'  #file for the results
        slist = []
        getStockList(slist, stock_list_url)
        getStockInfo(slist, stock_info_url, output_file)

    main()
Self-study Python Crawlers (3): Stock Data Crawler