This program is written in Python 2.7.6 and extends the HTMLParser class that ships with Python. Given a preset list of stock tickers, it automatically crawls Yahoo Finance for each stock's data date, stock name, real-time quote, daily change rate, daily low, and daily high.
This works because each value on a Yahoo Finance stock page is marked with a corresponding ID.
For example, for the Nasdaq 100 ETF (QQQ), http://finance.yahoo.com/q?s=QQQ
the HTML markup for the real-time quote is
<span id="yfs_l84_qqq">87.49</span>
and for the S&P 500 index ETF (SPY), http://finance.yahoo.com/q?s=spy
the HTML markup for the real-time quote is
<span id="yfs_l84_spy">187.25</span>
So the crawler looks up each value by its ID string. In detail: first subclass HTMLParser, then override the handle_data(self, data) method in your subclass to find the HTML tag that contains the corresponding ID string (for the real-time quote, the ID string is "yfs_l84_" + stock code), and output the data inside that tag (for example, in QQQ's <span id="yfs_l84_qqq">87.49</span>, the data 87.49 is the real-time quote).
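The approach described above can be sketched roughly as follows. This is not the author's code: the class name QuoteParser and its attributes are made up for illustration, and the sketch uses handle_starttag to spot the ID before handle_data collects the text, which is one possible way to do it. It parses an offline sample string instead of fetching the live page, and the import fallback is only there so the same snippet also runs on Python 3.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2, as used by the original program


class QuoteParser(HTMLParser):
    """Hypothetical helper: grab the text of the tag whose id is "yfs_l84_" + ticker."""

    def __init__(self, ticker):
        HTMLParser.__init__(self)  # explicit base call also works on Python 2
        self.target_id = "yfs_l84_" + ticker.lower()
        self.in_target = False
        self.quote = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag that just opened
        tag_id = dict(attrs).get("id") or ""
        if tag_id.lower() == self.target_id:
            self.in_target = True

    def handle_data(self, data):
        # Keep only the first piece of text found inside the matching tag
        if self.in_target and self.quote is None:
            self.quote = data.strip()

    def handle_endtag(self, tag):
        # Any closing tag ends the capture; enough for a flat <span>
        self.in_target = False


sample_html = '<div><span id="yfs_l84_qqq">87.49</span></div>'
parser = QuoteParser("QQQ")
parser.feed(sample_html)
parser.close()
print(parser.quote)  # prints 87.49
```

If the page contains no tag with the expected ID, parser.quote simply stays None, so the caller can tell a missing quote apart from an empty one.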
Sample output:
The columns are, in order:
Date        Ticker  Name                                       Quote   Change  Low     High
05/05/2014  IBB     iShares Nasdaq Biotechnology (IBB)         233.28  1.85%   225.34  233.28
05/05/2014  SOCL    Global X Social Media Index ETF (SOCL)     17.48   0.17%   17.12   17.53
05/05/2014  PNQI    PowerShares NASDAQ Internet (PNQI)         62.61   0.35%   61.46   62.74
05/05/2014  XSD     SPDR S&P Semiconductor ETF (XSD)           67.15   0.12%   66.20   67.41
05/05/2014  ITA     iShares US Aerospace & Defense (ITA)       110.34  1.15%   108.62  110.56
05/05/2014  IAI     iShares US Broker-Dealers (IAI)            37.42   -0.21%  36.86   37.42
05/05/2014  VBK     Vanguard Small Cap Growth ETF (VBK)        119.97  -0.03%  118.37  120.09
05/05/2014  QQQ     PowerShares QQQ (QQQ)                      87.95   0.53%   86.76   87.97
05/05/2014  EWI     iShares MSCI Italy Capped (EWI)            17.86   -0.56%  17.65   17.89
05/05/2014  DFE     WisdomTree Europe SmallCap Dividend (DFE)  62.33   -0.11%  61.94   62.39
05/05/2014  PBD     PowerShares Global Clean Energy (PBD)      13.03   0.00%   12.97   13.05
05/05/2014  EIRL    iShares MSCI Ireland Capped (EIRL)         38.52   -0.16%  38.39   38.60
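As a rough illustration of how rows like the sample output could be assembled, the sketch below loops over a preset ticker list and joins each stock's fields in column order. Everything here is an assumption made for the example: the function names, the field names, the tab separator, and the month/day/year date format are hypothetical, and the lookup is an offline stand-in fed with two rows from the sample output, not a real crawl.

```python
from datetime import date

# Column order taken from the sample output; the key names are hypothetical.
COLUMNS = ["date", "ticker", "name", "quote", "change", "low", "high"]


def format_row(fields, sep="\t"):
    """Join one stock's fields in column order; missing fields become ''."""
    return sep.join(str(fields.get(col, "")) for col in COLUMNS)


def build_rows(tickers, lookup, day):
    """Build one output line per ticker.

    `lookup` is a hypothetical callable that returns the scraped fields
    (name, quote, change, low, high) for a ticker as a dict.
    """
    rows = []
    for ticker in tickers:
        fields = dict(lookup(ticker))  # copy so the source dict is untouched
        fields["date"] = day.strftime("%m/%d/%Y")
        fields["ticker"] = ticker
        rows.append(format_row(fields))
    return rows


# Offline stand-in for the crawler, using two rows from the sample output.
SAMPLE = {
    "IBB": {"name": "iShares Nasdaq Biotechnology (IBB)", "quote": "233.28",
            "change": "1.85%", "low": "225.34", "high": "233.28"},
    "QQQ": {"name": "PowerShares QQQ (QQQ)", "quote": "87.95",
            "change": "0.53%", "low": "86.76", "high": "87.97"},
}

for line in build_rows(["IBB", "QQQ"], SAMPLE.get, date(2014, 5, 5)):
    print(line)
```

In a real run, the lookup would be replaced by a function that downloads each ticker's page and extracts the fields by their ID strings.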
Source code for this program:
https://bitbucket.org/lsz/html-parser
Official documentation for HTMLParser:
https://docs.python.org/2/library/htmlparser.html
HTMLParser (parsing HTML document elements):
http://blog.csdn.net/hxsstar/article/details/17241709
A very concise Python web crawler that automatically crawls stock data from Yahoo Finance