A Python crawler for stock information
Our group's project is to scrape stock information from the web and then build a visual interface on top of it. Our first idea was to write the crawler in Java, but the Java code was getting verbose, so we decided to use Python instead. At the start of the project we hit a big problem: the team members had limited knowledge of Python, so we decided to teach ourselves the basics first and then write a simple crawler.
A problem appeared at the very beginning. We initially planned to scrape stock information from Eastmoney. After fetching the page source, I found that all the stock data is loaded into the page by JavaScript after the initial page load, which confused me a great deal as a Python beginner; with limited time, I could not study the topic systematically. So I had to find another way and searched for help online. Eventually I found a workaround: scrape the stock codes from Eastmoney's static list page, then query each code against Baidu Gupiao, where the stock information is rendered directly in the page source. Having found a way around the problem, I started writing my code.
# Author: kevin Sun
# Shanghai stock information

import requests
from bs4 import BeautifulSoup
import traceback
import re
import time


def getHTMLText(url):  # fetch the page source we need
    try:
        user_agent = ('Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
                      '(KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36')
        headers = {'User-Agent': user_agent}
        r = requests.get(url, headers=headers, timeout=30)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except:
        return ""


def getFileName():
    dirname = time.strftime('%Y%m%d', time.localtime(time.time()))
    dirname += 'SH'
    return dirname


def getStockList(lst, stock_list_url):  # collect all Shanghai stock codes from Eastmoney
    html = getHTMLText(stock_list_url)
    soup = BeautifulSoup(html, 'html.parser')
    a = soup.find_all('a')  # use find_all to walk every 'a' tag and read its href
    for i in a:
        try:
            href = i.attrs['href']
            # match the required codes with the regular expression "sh6\d{5}"
            lst.append(re.findall(r"sh6\d{5}", href)[0])
        except:
            continue


def getStockInfo(lst, stock_info_url, fpath):
    ndate = time.strftime('%Y%m%d', time.localtime(time.time()))
    for stock in lst:
        url = stock_info_url + stock + '.html'  # stitch together the detail-page URL
        html = getHTMLText(url)
        try:
            if html == "":
                continue
            infoDict = {}
            soup = BeautifulSoup(html, 'html.parser')
            stockInfo = soup.find('div', attrs={'class': 'stock-bets'})
            if stockInfo is None:  # the data block is missing, so skip this stock
                continue
            keyList = stockInfo.find_all('dt')
            valueList = stockInfo.find_all('dd')
            inp = stock + "," + ndate + ","
            for i in range(len(keyList)):
                key = keyList[i].text
                val = valueList[i].text
                infoDict[key] = val
            inp += (infoDict['Highest'] + "," + infoDict['Turnover rate'] + "," +
                    infoDict['Volume'] + "," + infoDict['Turnover'] + "\n")
            print(inp)
            with open(fpath, 'a', encoding='utf-8') as f:
                f.write(inp)
        except:
            traceback.print_exc()
            continue


def main():  # the functions above are called from main
    stock_list_url = 'http://quote.eastmoney.com/stocklist.html'
    stock_info_url = 'http://gupiao.baidu.com/stock/'
    output_file = './' + getFileName() + '.txt'
    slist = []
    getStockList(slist, stock_list_url)
    getStockInfo(slist, stock_info_url, output_file)


main()
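The dt/dd pairing inside getStockInfo can also be checked offline. The fragment below is invented (field labels and values are made up, and plain re stands in for BeautifulSoup), but the zip of labels to values is the same idea:

```python
import re

# Invented fragment standing in for Baidu Gupiao's <div class="stock-bets"> block.
sample_bets = (
    '<dl><dt>Highest</dt><dd>10.50</dd>'
    '<dt>Volume</dt><dd>123000</dd>'
    '<dt>Turnover</dt><dd>1290000</dd></dl>'
)

# Each <dt> holds a field label, each <dd> the matching value;
# pairing them positionally rebuilds the info dictionary.
keys = re.findall(r'<dt>(.*?)</dt>', sample_bets)
vals = re.findall(r'<dd>(.*?)</dd>', sample_bets)
info = dict(zip(keys, vals))
print(info['Highest'])  # 10.50
```

In the real code the labels come from the live page, so a missing field raises a KeyError, which is why the loop body is wrapped in try/except.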
The code makes heavy use of Python's very convenient BeautifulSoup library (which I will not explain further here; I refer to an expert's write-up at http://cuiqingcai.com/1319.html), and the crawler was developed against Google Chrome. I also added some comments in the code. As a Python beginner, I will inevitably have made mistakes in places, and after the project ends I will keep studying Python and improving the crawler.