A Basic Environment for Data Analysis and Visualization in Python


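First, set up the basic environment, assuming a working Python installation is already in place. Then install the common base libraries: numpy and scipy for numerical computing, pandas for data analysis, and matplotlib/Bokeh/Seaborn for data visualization. Add data-retrieval libraries as needed, such as Tushare (http://pythonhosted.org/tushare/) and Quandl (https://www.quandl.com/). Many free datasets are also available online for analysis (http://www.kdnuggets.com/datasets/index.html). It is also worth installing IPython, which is far more capable than the default Python shell.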
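
Before going further, it can help to check which of these libraries are already importable. The following is a minimal sketch, assuming Python 3.4 or later (the article itself uses Python 2.7); the helper name check_libraries and the list of import names are illustrative assumptions, not part of any of the libraries.

```python
import importlib.util

# Top-level import names of the libraries mentioned above (assumed names;
# import names are case-sensitive and can differ from package names).
LIBRARIES = ["numpy", "scipy", "pandas", "matplotlib",
             "bokeh", "seaborn", "tushare", "IPython"]


def check_libraries(names):
    """Return a dict mapping each import name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_libraries(LIBRARIES).items():
        print("%-12s %s" % (name, "installed" if found else "missing"))
```

Anything reported as missing can then be installed with the package manager of your choice.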

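A more convenient option is a Python distribution such as Anaconda, which bundles nearly 200 popular packages. Download the installer for your platform from http://continuum.io/downloads. If even that feels like too much trouble, use Python Quant Platform. Once Anaconda is installed, starting IPython should display the relevant information: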

$ ipython
Python 2.7.9 |Anaconda 2.2.0 (64-bit)| (default, Apr 14 2015, 12:54:25)
Type "copyright", "credits" or "license" for more information.

IPython 3.0.0 -- An enhanced Interactive Python.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://binstar.org
...

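Anaconda ships with conda, an open-source package manager. Use conda info/list/search to view information and installed packages, and conda install/update/remove to install, update, or remove packages. For example:

$ conda install Quandl
$ conda install bokeh
$ conda update pandas

To install other libraries, such as Python libraries hosted on GitHub, use the standard Python installation methods, for example pip install or python setup.py install. Note, however, that the Anaconda installation path is separate from the system's original Python environment, so installing a package into Anaconda's Python environment requires specifying an extra parameter. First check Python's package path: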

$ python -m site --user-site
Then point the installation at that path, for example:
$ python setup.py install --prefix=~/.local
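
When several Python environments coexist, it is easy to lose track of which interpreter is active and where it looks for packages. The sketch below uses only the standard library to print that information; the function name describe_environment is an illustrative assumption.

```python
import site
import sys


def describe_environment():
    """Collect the paths that determine where packages are installed."""
    return {
        # Interpreter that is currently running this code.
        "executable": sys.executable,
        # Root directory of that interpreter's installation.
        "prefix": sys.prefix,
        # Same path that `python -m site --user-site` prints.
        "user_site": site.getusersitepackages(),
    }


for key, value in sorted(describe_environment().items()):
    print("%-10s %s" % (key, value))
```

If the executable shown is not the Anaconda interpreter, packages installed from this session will not land in the Anaconda environment.
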
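To avoid typing the same setup commands before every working session, create an IPython profile and put the settings there. Each time IPython starts with that profile, the environment is configured automatically. To create a profile named work: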

$ ipython profile create work
Then open the configuration file ~/.ipython/profile_work/ipython_config.py and edit it as needed, for example to load commonly used packages automatically.

c.InteractiveShellApp.pylab = 'auto'
...
c.TerminalIPythonApp.exec_lines = [
    'import numpy as np',
    'import pandas as pd'
    ...
]

If you work under this profile most of the time, add the following line to ~/.bashrc:

alias ipython='ipython --profile=work'
From then on, typing ipython is all it takes. Inside the IPython shell, run a Python script with %run test.py.

The following are some very trivial examples using financial data:

1. Moving averages and candlestick chart for SPY

from __future__ import print_function, division

import numpy as np
import pandas as pd
import datetime as dt
import pandas.io.data as web
import matplotlib.finance as mpf
import matplotlib.dates as mdates
import matplotlib.mlab as mlab
import matplotlib.pyplot as plt
import matplotlib.font_manager as font_manager

starttime = dt.date(2015,1,1)
endtime = dt.date.today()
ticker = 'SPY'

# Fetch daily quotes from Yahoo and load them into a DataFrame sorted by date
fh = mpf.fetch_historical_yahoo(ticker, starttime, endtime)
r = mlab.csv2rec(fh); fh.close()
r.sort()
df = pd.DataFrame.from_records(r)
quotes = mpf.quotes_historical_yahoo_ohlc(ticker, starttime, endtime)

fig, (ax1, ax2) = plt.subplots(2, sharex=True)

# Upper panel: closing price with 30-day and 10-day moving averages
tdf = df.set_index('date')
cdf = tdf['close']
cdf.plot(label = "close price", ax=ax1)
pd.rolling_mean(cdf, window=30, min_periods=1).plot(label = "30-day moving averages", ax=ax1)
pd.rolling_mean(cdf, window=10, min_periods=1).plot(label = "10-day moving averages", ax=ax1)
ax1.set_xlabel(r'Date')
ax1.set_ylabel(r'Price')
ax1.grid(True)
props = font_manager.FontProperties(size=10)
leg = ax1.legend(loc='lower right', shadow=True, fancybox=True, prop=props)
leg.get_frame().set_alpha(0.5)
ax1.set_title('%s Daily' % ticker, fontsize=14)

# Lower panel: candlestick chart
mpf.candlestick_ohlc(ax2, quotes, width=0.6)
ax2.set_ylabel(r'Price')

# Format both x axes as dates
for ax in ax1, ax2:
    fmt = mdates.DateFormatter('%m/%d/%Y')
    ax.xaxis.set_major_formatter(fmt)
    ax.grid(True)
    ax.xaxis_date()
    ax.autoscale()

fig.autofmt_xdate()
fig.tight_layout()
plt.setp(plt.gca().get_xticklabels(), rotation=30)
plt.show()
fig.savefig('SPY.png')
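
To make clear what the moving-average lines represent, here is a minimal pure-Python sketch of a trailing mean with a min_periods parameter, written to mirror the behaviour of the rolling mean used above (in later pandas releases the same calculation is spelled cdf.rolling(window=30, min_periods=1).mean()). The function name and the sample prices are illustrative assumptions, not real market data.

```python
def rolling_mean(values, window, min_periods=1):
    """Trailing mean over at most `window` values ending at each position.

    Positions with fewer than `min_periods` observations yield NaN.
    """
    result = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        if len(chunk) >= min_periods:
            result.append(sum(chunk) / float(len(chunk)))
        else:
            result.append(float("nan"))
    return result


# Synthetic prices, for illustration only
prices = [1, 2, 3, 4, 5]
print(rolling_mean(prices, window=3))  # [1.0, 1.5, 2.0, 3.0, 4.0]
```

With min_periods=1 the line starts at the very first data point, which is why the averages in the chart have no gap at the left edge.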

2. Linear regression between New York Mercantile Exchange (NYMEX) crude oil futures prices and gold prices over the past ten years

from __future__ import print_function, division

import numpy as np
import pandas as pd
import datetime as dt
import Quandl
import seaborn as sns

sns.set(style="darkgrid")

token = "???" # Notice: You can get the token by signing up on Quandl (https://www.quandl.com/)
starttime = "2005-01-01"
endtime = "2015-01-01"
interval = "monthly"

# Monthly series for gold, NYMEX crude oil futures, and Brent crude oil futures
gold = Quandl.get("BUNDESBANK/BBK01_WT5511", authtoken=token,
        trim_start=starttime, trim_end=endtime, collapse=interval)
nymex_oil_future = Quandl.get("OFDP/FUTURE_CL1", authtoken=token,
        trim_start=starttime, trim_end=endtime, collapse=interval)
brent_oil_future = Quandl.get("CHRIS/ICE_B1", authtoken=token,
        trim_start=starttime, trim_end=endtime, collapse=interval)

#dat = nymex_oil_future.join(brent_oil_future, lsuffix='_a', rsuffix='_b', how='inner')
#g = sns.jointplot("Settle_a", "Settle_b", data=dat, kind="reg")

# Join gold and NYMEX oil on their dates and plot the regression
dat = gold.join(nymex_oil_future, lsuffix='_a', rsuffix='_b', how='inner')
g = sns.jointplot("Value", "Settle", data=dat, kind="reg")
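
The regression line drawn by a plot of kind="reg" is an ordinary least-squares fit of one series against the other. As a rough illustration of that calculation, the sketch below fits a line in pure Python. The function name fit_line and the sample numbers are assumptions for demonstration; they are not gold or oil prices.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / float(n)
    mean_y = sum(ys) / float(n)
    # Sum of squared deviations of x, and of cross deviations of x and y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic points lying exactly on y = 2x + 1
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)  # 2.0 1.0
```

Real price series will scatter around the fitted line, and the joint plot shows that scatter together with the marginal distributions.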

3. Contribution of China's three industrial sectors to GDP

from __future__ import print_function, division

from collections import OrderedDict
import numpy as np
import pandas as pd
import datetime as dt
import tushare as ts
from bokeh.charts import Bar, output_file, show
import bokeh.plotting as bp

# Contribution of each sector to GDP, indexed and sorted by year
df = ts.get_gdp_contrib()
df = df.drop(['industry', 'gdp_yoy'], axis=1)
df = df.set_index('year')
df = df.sort_index()

years = df.index.values.tolist()
pri = df['pi'].astype(float).values
sec = df['si'].astype(float).values
ter = df['ti'].astype(float).values
contrib = OrderedDict(Primary=pri, Secondary=sec, Tertiary=ter)
years = map(unicode, map(str, years))

# Stacked bar chart written to an HTML file
output_file("stacked_bar.html")
bar = Bar(contrib, years, stacked=True, title="Contribution Rate for GDP",
         xlabel="Year", ylabel="Contribution Rate(%)")
show(bar)

4. Distribution of major Chinese stock indices, such as the Shanghai and Shenzhen indices

# -*- coding: utf-8 -*-
from __future__ import unicode_literals
from __future__ import print_function, division

from collections import OrderedDict
import pandas as pd
import tushare as ts
from bokeh.charts import Histogram, output_file, show

sh = ts.get_hist_data('sh')
sz = ts.get_hist_data('sz')
zxb = ts.get_hist_data('zxb')
cyb = ts.get_hist_data('cyb')

df = pd.concat([sh['close'], sz['close'], zxb['close'], cyb['close']],
         axis=1, keys=['sh', 'sz', 'zxb', 'cyb'])

# Keep only the last 700 closing values of each index
fst_idx = -700
distributions = OrderedDict(sh=list(sh['close'][fst_idx:]),
        cyb=list(cyb['close'][fst_idx:]),
        sz=list(sz['close'][fst_idx:]),
        zxb=list(zxb['close'][fst_idx:]))
df = pd.DataFrame(distributions)

# Display names: Shanghai index, SME board, ChiNext, Shenzhen index
col_mapping = {'sh': u'滬指',
        'zxb': u'中小板',
        'cyb': u'創業板',
        'sz': u'深指'}
df.rename(columns=col_mapping, inplace=True)

output_file("histograms.html")
hist = Histogram(df, bins=50, density=False, legend="top_right")
show(hist)
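
A histogram with a fixed number of bins simply counts how many values fall into each of several equal-width intervals between the minimum and maximum. The sketch below shows one common way to do that counting in pure Python. The function name bin_counts and the sample values are illustrative assumptions, and the exact binning rules of the charting library may differ in edge cases.

```python
def bin_counts(values, bins):
    """Count values in `bins` equal-width intervals spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / float(bins)
    counts = [0] * bins
    for v in values:
        if width == 0:
            index = 0  # all values identical: everything goes in the first bin
        else:
            # The maximum value belongs to the last bin rather than a new one
            index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return counts


# Synthetic values, for illustration only
print(bin_counts([1, 2, 2, 3, 10], bins=3))  # [4, 0, 1]
```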


5. Correlation among several key metrics (P/E ratio, P/B ratio, and so on) of listed companies in three selected industries

# -*- coding: utf-8 -*-
from __future__ import print_function, division
from __future__ import unicode_literals

from collections import OrderedDict
import numpy as np
import pandas as pd
import datetime as dt
import seaborn as sns
import tushare as ts
from bokeh.charts import Bar, output_file, show

# Join industry classification with basic stock metrics on the stock code
cls = ts.get_industry_classified()
stk = ts.get_stock_basics()
cls = cls.set_index('code')
tcls = cls[['c_name']]
tstk = stk[['pe', 'pb', 'esp', 'bvps']]
df = tcls.join(tstk, how='inner')

# Industry names taken from the first three rows
clist = [df.ix[i]['c_name'] for i in xrange(3)]

def neq(a, b, eps=1e-6):
    return abs(a - b) > eps

# Keep the selected industries and drop rows where any metric is zero
tdf = df.loc[df['c_name'].isin(clist) & neq(df['pe'], 0.0) &
         neq(df['pb'], 0.0) & neq(df['esp'], 0.0) &
         neq(df['bvps'], 0.0)]

col_mapping = {'pe' : u'P/E',
        'pb' : u'P/BV',
        'esp' : u'EPS',
        'bvps' : u'BVPS'}
tdf.rename(columns=col_mapping, inplace=True)

sns.pairplot(tdf, hue='c_name', size=2.5)
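
The pair plot shows the relationships visually. To put a number on any one pair, a common choice is the Pearson correlation coefficient. The following is a minimal pure-Python sketch of that formula; the function name pearson and the sample values are illustrative assumptions, not figures for real companies.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / float(n)
    mean_y = sum(ys) / float(n)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


# Synthetic metric values, for illustration only
pe = [10.0, 20.0, 30.0, 40.0]
pb = [1.0, 2.1, 2.9, 4.2]
print(round(pearson(pe, pb), 3))
```

A value near 1 or -1 indicates a strong linear relationship, while a value near 0 indicates little linear relationship.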




