Python Crawler crawls stock information

Our group's project requirement is to grab stock information online and then build a visual interface for it. The first idea was to use Java for the crawler, but because the Java code would have been somewhat verbose, we decided to use Python instead. There was a big problem at the beginning of the project: the team members had limited knowledge of Python, so we decided to start by teaching ourselves and then writing a simple crawler.

But a problem appeared right at the start. We had initially assumed we would scrape the stock information directly from Eastmoney, but after fetching the page source I found that all of the stock data is loaded by JavaScript after the page loads, which left me, a Python beginner, quite confused. With limited time I could not study this systematically, so I had to find another way and looked for help online. Eventually I found a workable approach: crawl the stock codes from the Eastmoney list page, then query each code against Baidu Stocks (gupiao.baidu.com), where the stock information is rendered directly in the page source. Having found a way around the problem, I started writing the code.
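As a quick illustration of the point above, here is a minimal sketch (not part of the original project code; the function name and example pattern are my own, assumed for illustration) of how to check whether a page carries its data in the server-rendered HTML: fetch the raw source with requests and look for the pattern you need. If the pattern is absent, the content is filled in by JavaScript and a plain requests + BeautifulSoup crawler will not see it.

# Sketch only: check whether the data you want is already in the raw HTML.
import re
import requests

def has_server_side_data(url, pattern):
    """Return True if `pattern` already appears in the raw HTML of `url`."""
    r = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'}, timeout=30)
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    return re.search(pattern, r.text) is not None

# The Eastmoney list page does contain the stock codes in its link hrefs,
# while the quote figures themselves are filled in by JavaScript afterwards.
print(has_server_side_data('http://quote.eastmoney.com/stocklist.html', r'sh6\d{5}'))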

# Author: kevin Sun
# Shanghai stock information

import requests
from bs4 import BeautifulSoup
import traceback
import re
import time


def getHTMLText(url):  # fetch the page source we need
    try:
        user_agent = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'
        headers = {'User-Agent': user_agent}
        r = requests.get(url, headers=headers, timeout=30)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except:
        return ""


def getFileName():
    dirname = time.strftime('%Y%m%d', time.localtime(time.time()))
    dirname += 'SH'
    return dirname


def getStockList(lst, stock_list_url):  # get all Shanghai stock codes from the Eastmoney list page
    html = getHTMLText(stock_list_url)
    soup = BeautifulSoup(html, 'html.parser')
    a = soup.find_all('a')  # use find_all to traverse every 'a' tag and read its href attribute
    for i in a:
        try:
            href = i.attrs['href']
            lst.append(re.findall(r"sh6\d{5}", href)[0])  # match the required codes with the regular expression sh6\d{5}
            # print(lst)
        except:
            continue


def getStockInfo(lst, stock_info_url, fpath):
    ndate = time.strftime('%Y%m%d', time.localtime(time.time()))
    for stock in lst:
        url = stock_info_url + stock + '.html'  # stitch the URL together
        html = getHTMLText(url)
        try:
            if html == "":
                continue
            infoDict = {}
            soup = BeautifulSoup(html, 'html.parser')
            stockInfo = soup.find('div', attrs={'class': 'stock-bets'})
            if stockInfo is None:  # the page has no stock info block, skip it
                continue
            # print(stockInfo)
            # name = stockInfo.find_all(attrs={'class': 'bets-name'})[0]
            # print(name)
            # infoDict.update({'stock code': stock})
            # inp = name.text.split()[0] + ":"
            keyList = stockInfo.find_all('dt')
            valueList = stockInfo.find_all('dd')
            inp = stock + "," + ndate + ","
            for i in range(len(keyList)):
                key = keyList[i].text
                val = valueList[i].text
                infoDict[key] = val
            # print(inp)
            # the dt labels on the Baidu stock page are Chinese:
            # 最高 = highest price, 换手率 = turnover rate, 成交量 = volume, 成交额 = turnover
            inp += infoDict['最高'] + "," + infoDict['换手率'] + "," + infoDict['成交量'] + "," + infoDict['成交额'] + "\n"
            print(inp)
            with open(fpath, 'a', encoding='utf-8') as f:
                # f.write(str(infoDict) + '\n')
                f.write(inp)
        except:
            traceback.print_exc()
            continue


def main():  # main calls the functions defined above
    stock_list_url = 'http://quote.eastmoney.com/stocklist.html'
    stock_info_url = 'http://gupiao.baidu.com/stock/'
    output_file = './' + getFileName() + '.txt'
    slist = []
    getStockList(slist, stock_list_url)
    getStockInfo(slist, stock_info_url, output_file)


main()
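For reference, each line that getStockInfo appends to the output file is comma-separated: stock code, date, highest price, turnover rate, volume, turnover. Below is a small sketch (not part of the original code; the function and field names are my own) of how that file could be read back as records for the visual interface mentioned above.

def load_records(fpath):
    # Parse the crawler's output file back into dictionaries; the field order matches
    # what getStockInfo writes: code, date, highest price, turnover rate, volume, turnover.
    records = []
    with open(fpath, encoding='utf-8') as f:
        for line in f:
            fields = line.strip().split(',')
            if len(fields) == 6:
                records.append({
                    'code': fields[0],
                    'date': fields[1],
                    'high': fields[2],
                    'turnover_rate': fields[3],
                    'volume': fields[4],
                    'turnover': fields[5],
                })
    return records

# Example: records = load_records('./20170315SH.txt')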

The code uses Python's very convenient BeautifulSoup library (I will not explain it in detail here; I relied on an expert's write-up: http://cuiqingcai.com/1319.html), and the crawler was developed against Google Chrome. I have added some comments to the code. As a Python beginner I will inevitably have made mistakes in places; after the project ends I will keep studying Python and improving it.
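For readers new to BeautifulSoup, here is a tiny self-contained illustration (the HTML snippet is invented, not taken from the crawled sites) of the two calls the crawler leans on: find_all('a') to walk the link tags, and find('div', attrs={'class': ...}) to locate a specific block.

from bs4 import BeautifulSoup

# Invented HTML for illustration only.
html = '''
<div class="stock-bets">
  <dt>Highest</dt><dd>10.50</dd>
  <a href="/sh600000.html">Stock A</a>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')
for a in soup.find_all('a'):            # iterate over every <a> tag
    print(a.attrs['href'])              # -> /sh600000.html
bets = soup.find('div', attrs={'class': 'stock-bets'})
print(bets.find('dt').text, bets.find('dd').text)  # -> Highest 10.50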
