Self-Learning Python Crawler (3): Stock Data Crawler


Objective: Obtain the names and trading information of all stocks listed on the Shanghai (SSE) and Shenzhen (SZSE) exchanges, and save them to a file.

Technologies used: requests + bs4 + re

Site selection (principles: the stock information exists statically in the HTML page, is not generated by JavaScript, and is not restricted by the site's robots protocol)
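Whether a robots protocol restriction exists can be checked quickly with the standard library's robotparser. A minimal sketch, using the stock-list URL chosen below:

import urllib.robotparser

# Fetch and parse the site's robots.txt, then ask whether crawling is allowed
rp = urllib.robotparser.RobotFileParser()
rp.set_url('http://quote.eastmoney.com/robots.txt')
rp.read()
print(rp.can_fetch('*', 'http://quote.eastmoney.com/stocklist.html'))   # True if permitted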

1. Get the stock list: http://quote.eastmoney.com/stocklist.html (the East Money site carries a list of all stocks, while Baidu Stocks only provides information per individual stock)

2. Get stock information:

Baidu Stocks: https://gupiao.baidu.com/stock/

A single stock: https://gupiao.baidu.com/stock/sz002939.html

The design of the program (see the main() sketch after these steps):

Step 1: Get the stock list from East Money

Step 2: Using the stock list, fetch each stock's information from Baidu Stocks

Step 3: Save the results to a file
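These three steps map directly onto the program's main() function (taken from the complete code at the end of this post):

def main():
    stock_list_url = 'http://quote.eastmoney.com/stocklist.html'   # Step 1: list source
    stock_info_url = 'https://gupiao.baidu.com/stock/'             # Step 2: info source
    output_file = 'D:/BaiduStockInfo.txt'                          # Step 3: output file
    slist = []
    getStockList(slist, stock_list_url)
    getStockInfo(slist, stock_info_url, output_file)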

"Step 1"

Send a request to obtain the East Money stock list page, then inspect the page source.
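The request itself goes through a small helper, getHTMLText, defined in the complete code below:

import requests

def getHTMLText(url):
    try:
        r = requests.get(url)
        r.raise_for_status()                # Raise on a bad HTTP status code
        r.encoding = r.apparent_encoding    # Guess the encoding from the content
        return r.text
    except:
        return ""                           # Empty text signals a failed fetch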

The stock codes turn out to be stored in the href attributes of <a> tags, prefixed with "sh" for Shanghai and "sz" for Shenzhen listings, so they can be parsed and matched against this pattern.

First, use BeautifulSoup4 to get all the <a> tags:

soup = BeautifulSoup(html, 'html.parser')
a = soup.find_all('a')

Then extract each stock code with a regular expression and append it to the lst list:

for i in a:
    try:
        href = i.attrs['href']
        lst.append(re.findall(r"[s][hz]\d{6}", href)[0])
    except:
        continue

At this point, lst = ['sh201000', 'sh201001', 'sh201002', ...]

"Step 2"

Next, use the list of stock codes to fetch each stock's information from Baidu Stocks, one at a time.

A Baidu Stocks information page has a URL of the form https://gupiao.baidu.com/stock/sz002939.html.

Therefore, stitch the URL together first, then send the request to fetch the page:

for stock in lst:
    url = 'https://gupiao.baidu.com/stock/' + stock + ".html"
    html = getHTMLText(url)

Then parse the page. Inspecting the source shows that all of the stock information sits inside <dt> and <dd> tags, so BeautifulSoup can extract it step by step:

soup = BeautifulSoup(html, 'html.parser')

stockInfo = soup.find('div', attrs={'class': 'stock-bets'})

if stockInfo:
    name = stockInfo.find_all(attrs={'class': 'bets-name'})[0]
    infoDict.update({'Stock name': name.text.split()[0]})
else:
    print('stockInfo is null')
    break

keyList = stockInfo.find_all('dt')
valueList = stockInfo.find_all('dd')

for i in range(len(keyList)):
    key = keyList[i].text
    val = valueList[i].text
    infoDict[key] = val

At this point, infoDict = {'Volume': '310,700 lots', 'High': '9.89', 'Limit up': '10.86', ...}
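As a self-contained illustration of this dt/dd pairing, here is the same logic run on a hypothetical, heavily abridged snippet of the page's HTML (the tag structure is assumed for illustration only):

from bs4 import BeautifulSoup

html = '''
<div class="stock-bets">
  <a class="bets-name">Example (sz002939)</a>
  <dl><dt>Volume</dt><dd>310,700 lots</dd></dl>
  <dl><dt>High</dt><dd>9.89</dd></dl>
</div>
'''

infoDict = {}
soup = BeautifulSoup(html, 'html.parser')
stockInfo = soup.find('div', attrs={'class': 'stock-bets'})
name = stockInfo.find_all(attrs={'class': 'bets-name'})[0]
infoDict.update({'Stock name': name.text.split()[0]})
for dt, dd in zip(stockInfo.find_all('dt'), stockInfo.find_all('dd')):
    infoDict[dt.text] = dd.text   # Pair each <dt> label with its <dd> value
print(infoDict)   # {'Stock name': 'Example', 'Volume': '310,700 lots', 'High': '9.89'}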

"Step 3"

Finally, output the results to a file:

with open(fpath, 'a', encoding='utf-8') as f:
    f.write(str(infoDict) + '\n')
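Because each line stores the str() of a dict, the file can later be read back with ast.literal_eval. A small sketch, assuming the output path used in the complete code:

import ast

# Parse each saved line back into a dictionary
with open('D:/BaiduStockInfo.txt', encoding='utf-8') as f:
    records = [ast.literal_eval(line) for line in f if line.strip()]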

The complete code is as follows:

# CrawBaiduStocksA.py
import requests
from bs4 import BeautifulSoup
import traceback
import re

# Common helper: fetch a page and return its text
def getHTMLText(url):
    try:
        r = requests.get(url)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except:
        return ""   # Return empty text so callers can detect the failure

# Get the list of stock codes
def getStockList(lst, stockURL):
    html = getHTMLText(stockURL)
    soup = BeautifulSoup(html, 'html.parser')
    a = soup.find_all('a')
    for i in a:
        try:
            href = i.attrs['href']
            lst.append(re.findall(r"[s][hz]\d{6}", href)[0])
        except:
            continue

# Get each stock's information and write it to a file
def getStockInfo(lst, stockURL, fpath):
    for stock in lst:
        url = stockURL + stock + ".html"
        html = getHTMLText(url)
        try:
            if html == "":
                continue
            infoDict = {}
            soup = BeautifulSoup(html, 'html.parser')
            stockInfo = soup.find('div', attrs={'class': 'stock-bets'})
            if stockInfo:
                name = stockInfo.find_all(attrs={'class': 'bets-name'})[0]
                infoDict.update({'Stock name': name.text.split()[0]})
            else:
                print('stockInfo is null')
                break
            keyList = stockInfo.find_all('dt')
            valueList = stockInfo.find_all('dd')
            for i in range(len(keyList)):
                key = keyList[i].text
                val = valueList[i].text
                infoDict[key] = val
            with open(fpath, 'a', encoding='utf-8') as f:
                f.write(str(infoDict) + '\n')
        except:
            traceback.print_exc()
            continue

def main():
    stock_list_url = 'http://quote.eastmoney.com/stocklist.html'   # East Money stock list
    stock_info_url = 'https://gupiao.baidu.com/stock/'             # Baidu stock information
    output_file = 'D:/BaiduStockInfo.txt'                          # File that stores the results
    slist = []
    getStockList(slist, stock_list_url)
    getStockInfo(slist, stock_info_url, output_file)

main()
