A Python crawler for stock information
Our group's project is to scrape stock information from the web and then build a visual interface on top of it. Our first idea was to write the crawler in Java, but the Java code was getting verbose, so we decided to use Python instead. At the start of the project we hit a big problem: the team members had limited knowledge of Python, so we decided to teach ourselves the basics first and then write a simple crawler.
A problem appeared at the very beginning. We initially planned to scrape stock information from Eastmoney. After fetching the page source, I found that all the stock data is loaded into the page by JavaScript after the initial page load, which confused me a great deal as a Python beginner; with limited time, I could not study the topic systematically. So I had to find another way and searched for help online. Eventually I found a workaround: scrape the stock codes from Eastmoney's static list page, then query each code against Baidu Gupiao, where the stock information is rendered directly in the page source. Having found a way around the problem, I started writing my code.
# Author: kevin Sun
# Shanghai stock information

import requests
from bs4 import BeautifulSoup
import traceback
import re
import time


def getHTMLText(url):  # fetch the page source we need
    try:
        user_agent = ('Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
                      '(KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36')
        headers = {'User-Agent': user_agent}
        r = requests.get(url, headers=headers, timeout=30)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except:
        return ""


def getFileName():
    dirname = time.strftime('%Y%m%d', time.localtime(time.time()))
    dirname += 'SH'
    return dirname


def getStockList(lst, stock_list_url):  # collect all Shanghai stock codes from Eastmoney
    html = getHTMLText(stock_list_url)
    soup = BeautifulSoup(html, 'html.parser')
    a = soup.find_all('a')  # use find_all to walk every 'a' tag and read its href
    for i in a:
        try:
            href = i.attrs['href']
            # match the required codes with the regular expression "sh6\d{5}"
            lst.append(re.findall(r"sh6\d{5}", href)[0])
        except:
            continue


def getStockInfo(lst, stock_info_url, fpath):
    ndate = time.strftime('%Y%m%d', time.localtime(time.time()))
    for stock in lst:
        url = stock_info_url + stock + '.html'  # stitch together the detail-page URL
        html = getHTMLText(url)
        try:
            if html == "":
                continue
            infoDict = {}
            soup = BeautifulSoup(html, 'html.parser')
            stockInfo = soup.find('div', attrs={'class': 'stock-bets'})
            if stockInfo is None:  # the data block is missing, so skip this stock
                continue
            keyList = stockInfo.find_all('dt')
            valueList = stockInfo.find_all('dd')
            inp = stock + "," + ndate + ","
            for i in range(len(keyList)):
                key = keyList[i].text
                val = valueList[i].text
                infoDict[key] = val
            inp += (infoDict['Highest'] + "," + infoDict['Turnover rate'] + "," +
                    infoDict['Volume'] + "," + infoDict['Turnover'] + "\n")
            print(inp)
            with open(fpath, 'a', encoding='utf-8') as f:
                f.write(inp)
        except:
            traceback.print_exc()
            continue


def main():  # the functions above are called from main
    stock_list_url = 'http://quote.eastmoney.com/stocklist.html'
    stock_info_url = 'http://gupiao.baidu.com/stock/'
    output_file = './' + getFileName() + '.txt'
    slist = []
    getStockList(slist, stock_list_url)
    getStockInfo(slist, stock_info_url, output_file)


main()
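The dt/dd pairing inside getStockInfo can also be checked offline. The fragment below is invented (field labels and values are made up, and plain re stands in for BeautifulSoup), but the zip of labels to values is the same idea:

```python
import re

# Invented fragment standing in for Baidu Gupiao's <div class="stock-bets"> block.
sample_bets = (
    '<dl><dt>Highest</dt><dd>10.50</dd>'
    '<dt>Volume</dt><dd>123000</dd>'
    '<dt>Turnover</dt><dd>1290000</dd></dl>'
)

# Each <dt> holds a field label, each <dd> the matching value;
# pairing them positionally rebuilds the info dictionary.
keys = re.findall(r'<dt>(.*?)</dt>', sample_bets)
vals = re.findall(r'<dd>(.*?)</dd>', sample_bets)
info = dict(zip(keys, vals))
print(info['Highest'])  # 10.50
```

In the real code the labels come from the live page, so a missing field raises a KeyError, which is why the loop body is wrapped in try/except.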
The code makes heavy use of Python's very convenient BeautifulSoup library (which I will not explain further here; I refer to an expert's write-up at http://cuiqingcai.com/1319.html), and the crawler was developed against Google Chrome. I also added some comments in the code. As a Python beginner, I will inevitably have made mistakes in places, and after the project ends I will keep studying Python and improving the crawler.