Objective: to obtain the names and trading information of all stocks listed on the Shanghai (SSE) and Shenzhen (SZSE) exchanges, and save them to a file.
Technologies used: requests + bs4 + re
Site selection (selection principles: the stock information exists statically in the HTML page, is not generated by JavaScript, and is not restricted by the site's robots protocol)
1. Get the stock list: http://quote.eastmoney.com/stocklist.html (the East Money site has a list of all stocks, while Baidu Stocks only has per-stock information)
2. Get stock information:
Baidu Stock: https://gupiao.baidu.com/stock/
Single Stock: https://gupiao.baidu.com/stock/sz002939.html
The design structure of the program:
Step 1: Get the list of stocks from East Money
Step 2: For each stock in the list, fetch that stock's information from Baidu Stocks
Step 3: Save the results to a file
"Step 1"
Obtain the East Money stock list by sending a request and viewing the page source code:
The stock codes are found to be stored in the href attributes of <a> tags, with Shanghai and Shenzhen codes prefixed "sh" and "sz" respectively, so they can be parsed and matched by this rule.
First, use BeautifulSoup to get all the <a> tags:
    soup = BeautifulSoup(html, 'html.parser')
    a = soup.find_all('a')
The stock codes extracted with a regular expression are then stored in the lst list:
    for i in a:
        try:
            href = i.attrs['href']
            lst.append(re.findall(r"[s][hz]\d{6}", href)[0])
        except:
            continue
At this point the list lst = ['sh201000', 'sh201001', 'sh201002', ...]
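The extraction rule above can be exercised on its own. The href values below are hypothetical samples shaped like links on the East Money list page; only the regex logic is taken from the tutorial:

```python
import re

# Hypothetical href values shaped like those on the East Money list page
hrefs = [
    "http://quote.eastmoney.com/sh201000.html",
    "http://quote.eastmoney.com/sz002939.html",
    "http://quote.eastmoney.com/center.html",  # no stock code: skipped
]

lst = []
for href in hrefs:
    try:
        # match "sh" or "sz" followed by a six-digit stock code
        lst.append(re.findall(r"[s][hz]\d{6}", href)[0])
    except IndexError:  # findall returned an empty list
        continue

print(lst)  # ['sh201000', 'sz002939']
```

Catching the specific IndexError (rather than a bare except) makes it explicit that the only expected failure is an href without a stock code.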
"Step 2"
Next, for each stock code in the list obtained above, fetch that stock's information from Baidu Stocks.
The URL of a stock's information page on Baidu Stocks looks like: https://gupiao.baidu.com/stock/sz002939.html
Therefore, the URL is stitched together first, and then a request is sent to fetch the page:
    for stock in lst:
        url = 'https://gupiao.baidu.com/stock/' + stock + ".html"
        html = getHTMLText(url)
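As a quick check of the stitching rule, using two codes assumed from the list built in step 1:

```python
# Stitch each stock code onto the Baidu Stocks base URL
base = 'https://gupiao.baidu.com/stock/'
lst = ['sh201000', 'sz002939']  # assumed sample codes from step 1
urls = [base + stock + ".html" for stock in lst]
print(urls[1])  # https://gupiao.baidu.com/stock/sz002939.html
```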
Then parse the page. Viewing its source code, we discover that all of the stock information is stored in <dt> and <dd> tags, so BeautifulSoup can analyze it step by step:
    soup = BeautifulSoup(html, 'html.parser')
    stockInfo = soup.find('div', attrs={'class': 'stock-bets'})
    if stockInfo:
        name = stockInfo.find_all(attrs={'class': 'bets-name'})[0]
        infoDict.update({'stock name': name.text.split()[0]})
    else:
        print('stockInfo is null')
        break
    keyList = stockInfo.find_all('dt')
    valueList = stockInfo.find_all('dd')
    for i in range(len(keyList)):
        key = keyList[i].text
        val = valueList[i].text
        infoDict[key] = val
At this point, infoDict = {"volume": "310,700 lots", "high": "9.89", "limit up": "10.86", ...}
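The parsing steps above can be tried out on a small offline fragment. The HTML below is a hypothetical sketch shaped like Baidu Stocks' stock-bets block (the exact tag layout is an assumption for illustration); the BeautifulSoup calls are the same ones used in the tutorial:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment shaped like the Baidu Stocks "stock-bets" div
html = """
<div class="stock-bets">
  <h2 class="bets-name">sz002939 Example</h2>
  <dl><dt>high</dt><dd>9.89</dd></dl>
  <dl><dt>limit up</dt><dd>10.86</dd></dl>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')
stockInfo = soup.find('div', attrs={'class': 'stock-bets'})

infoDict = {}
# first token of the bets-name text is the stock's name/code
name = stockInfo.find_all(attrs={'class': 'bets-name'})[0]
infoDict.update({'stock name': name.text.split()[0]})

# pair each <dt> label with the <dd> value at the same index
keyList = stockInfo.find_all('dt')
valueList = stockInfo.find_all('dd')
for i in range(len(keyList)):
    infoDict[keyList[i].text] = valueList[i].text

print(infoDict)
```

The pairing relies on the page emitting one <dd> for every <dt> in the same order, which holds for this page layout but is worth re-checking if the site's markup changes.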
"Step 3"
Finally, output the results to a file:
    with open(fpath, 'a', encoding='utf-8') as f:
        f.write(str(infoDict) + '\n')
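A minimal round trip of this output step, writing to a temporary file instead of the tutorial's hard-coded D: path (the sample record is assumed):

```python
import os
import tempfile

infoDict = {'stock name': 'sz002939', 'high': '9.89'}  # assumed sample record

# append one dict per line, as in the tutorial's output step
fpath = os.path.join(tempfile.mkdtemp(), 'BaiduStockInfo.txt')
with open(fpath, 'a', encoding='utf-8') as f:
    f.write(str(infoDict) + '\n')

# read it back to confirm the record landed as one line
with open(fpath, encoding='utf-8') as f:
    lines = f.read().splitlines()
print(lines[0])
```

Opening with mode 'a' means repeated runs keep appending records, which is what lets the crawler write one line per stock inside its loop.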
The complete code is as follows:
    #CrawBaiduStocksA.py
    import requests
    from bs4 import BeautifulSoup
    import traceback
    import re

    #common helper: fetch a page
    def getHTMLText(url):
        try:
            r = requests.get(url)
            r.raise_for_status()
            r.encoding = r.apparent_encoding
            return r.text
        except:
            return ""  #fetch failed

    #get the list of stock codes
    def getStockList(lst, stockURL):
        html = getHTMLText(stockURL)
        soup = BeautifulSoup(html, 'html.parser')
        a = soup.find_all('a')
        for i in a:
            try:
                href = i.attrs['href']
                lst.append(re.findall(r"[s][hz]\d{6}", href)[0])
            except:
                continue

    #get stock information and output it to a file
    def getStockInfo(lst, stockURL, fpath):
        for stock in lst:
            url = stockURL + stock + ".html"
            html = getHTMLText(url)
            try:
                if html == "":
                    continue
                infoDict = {}
                soup = BeautifulSoup(html, 'html.parser')
                stockInfo = soup.find('div', attrs={'class': 'stock-bets'})
                if stockInfo:
                    name = stockInfo.find_all(attrs={'class': 'bets-name'})[0]
                    infoDict.update({'stock name': name.text.split()[0]})
                else:
                    print('stockInfo is null')
                    break
                keyList = stockInfo.find_all('dt')
                valueList = stockInfo.find_all('dd')
                for i in range(len(keyList)):
                    key = keyList[i].text
                    val = valueList[i].text
                    infoDict[key] = val
                with open(fpath, 'a', encoding='utf-8') as f:
                    f.write(str(infoDict) + '\n')
            except:
                traceback.print_exc()
                continue

    def main():
        stock_list_url = 'http://quote.eastmoney.com/stocklist.html'  #East Money stock list
        stock_info_url = 'https://gupiao.baidu.com/stock/'  #Baidu stock information
        output_file = 'D:/BaiduStockInfo.txt'  #file for the results
        slist = []
        getStockList(slist, stock_list_url)
        getStockInfo(slist, stock_info_url, output_file)

    main()
Self-study Python Crawlers (3): Stock Data Crawler