Basic Environment for Python data analysis and visualization


First, set up the basic environment, assuming a working Python installation already exists. Then install some common base libraries: NumPy and SciPy for numerical computation, pandas for data analysis, and Matplotlib/Bokeh/Seaborn for data visualization. After that, load data-acquisition libraries on demand, such as Tushare (http://pythonhosted.org/tushare/) or Quandl (https://www.quandl.com/). There are also a number of free data sets (http://www.kdnuggets.com/datasets/index.html) available on the web for analysis. It is also best to install IPython, which is much more powerful than the default Python shell.
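
For example, a minimal sketch of installing these libraries with pip (assuming pip is available; with Anaconda, described below, conda can be used instead):

$ pip install numpy scipy pandas matplotlib bokeh seaborn ipython
$ pip install tushare quandl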

More conveniently, you can use a Python distribution such as Anaconda, which bundles nearly 200 popular packages. Download the installer for your platform from http://continuum.io/downloads. If even that feels like too much trouble, there is also the Python Quant Platform. After installing Anaconda, starting IPython should show the relevant information:

~/workspace$ ipython
Python 2.7.9 |Anaconda 2.2.0 (64-bit)| (default, Apr 14 2015, 12:54:25)
Type "copyright", "credits" or "license" for more information.
IPython 3.0.0 -- An enhanced Interactive Python.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://binstar.org
...
Anaconda ships with conda, an open-source package manager. You can use conda info/list/search to view information and installed packages, and conda install/update/remove to install, update, or remove packages. For example:

$ conda install quandl
$ conda install bokeh
$ conda update pandas
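
Likewise, a minimal sketch of the inspection commands mentioned above (the package name is only an illustration):

$ conda info            # general information about the conda installation
$ conda list            # packages installed in the current environment
$ conda search pandas   # available versions of a package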
If you need to install additional libraries, such as a Python library from GitHub, you can use Python's usual package installation tools, for example pip install or python setup.py install. Note, however, that Anaconda's installation path is independent of the system's original Python environment, so to install a package into Anaconda's Python environment you need to pass extra parameters. You can first check the Python package path:

$ python -m site --user-site
You then specify that path when you install the package, for example:
$ python setup.py install --prefix=~/.local
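
Alternatively (a common practice, not from the original article): if Anaconda's python is the one on your PATH, its bundled pip installs straight into that environment, so the prefix trick is often unnecessary. The package name below is only a placeholder:

$ which python                 # verify that Anaconda's python is being used
$ pip install <some-package>   # installs into Anaconda's environment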
If you want to avoid setting everything up again each time you start working, you can create an IPython profile and configure it. Then every time IPython starts with that profile, the corresponding environment is set up. Create a profile named work:

$ ipython profile create work
Then open the configuration file ~/.ipython/profile_work/ipython_config.py and modify it according to your needs, for example to automatically load some common packages:

c.InteractiveShellApp.pylab = 'auto'
...
c.TerminalIPythonApp.exec_lines = [
    'import numpy as np',
    'import pandas as pd',
    ...
]
If you want to work under this profile most of the time, you can add the following line to ~/.bashrc:

alias ipython='ipython --profile=work'
Then simply typing ipython is enough. To run a Python script after entering the IPython shell, just execute %run test.py.

Here are a few very simple examples using financial data:

1. SPY's moving averages and candlestick chart
from __future__ import print_function, division
import numpy as np
import pandas as pd
import datetime as dt
import pandas.io.data as web
import matplotlib.finance as mpf
import matplotlib.dates as mdates
import matplotlib.mlab as mlab
import matplotlib.pyplot as plt
import matplotlib.font_manager as font_manager

starttime = dt.date(2015, 1, 1)
endtime = dt.date.today()
ticker = 'SPY'

fh = mpf.fetch_historical_yahoo(ticker, starttime, endtime)
r = mlab.csv2rec(fh)
fh.close()
r.sort()
df = pd.DataFrame.from_records(r)
quotes = mpf.quotes_historical_yahoo_ohlc(ticker, starttime, endtime)

fig, (ax1, ax2) = plt.subplots(2, sharex=True)
tdf = df.set_index('date')
cdf = tdf['close']
cdf.plot(label="Close Price", ax=ax1)
pd.rolling_mean(cdf, window=30, min_periods=1).plot(label="30-day moving average", ax=ax1)
pd.rolling_mean(cdf, window=10, min_periods=1).plot(label="10-day moving average", ax=ax1)
ax1.set_xlabel(r'Date')
ax1.set_ylabel(r'Price')
ax1.grid(True)
props = font_manager.FontProperties(size=10)
leg = ax1.legend(loc='lower right', shadow=True, fancybox=True, prop=props)
leg.get_frame().set_alpha(0.5)
ax1.set_title('%s Daily' % ticker, fontsize=14)

mpf.candlestick_ohlc(ax2, quotes, width=0.6)
ax2.set_ylabel(r'Price')

for ax in ax1, ax2:
    fmt = mdates.DateFormatter('%m/%d/%y')
    ax.xaxis.set_major_formatter(fmt)
    ax.grid(True)
    ax.xaxis_date()
    ax.autoscale()

fig.autofmt_xdate()
fig.tight_layout()
plt.setp(plt.gca().get_xticklabels(), rotation=30)
plt.show()
fig.savefig('SPY.png')
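
Note (not part of the original article): pandas.io.data, matplotlib.finance and pd.rolling_mean have since been deprecated or removed in newer versions of these libraries. The moving averages, for instance, are now computed through the Series.rolling interface; a minimal, self-contained sketch with made-up data standing in for the SPY close prices:

import numpy as np
import pandas as pd

close = pd.Series(np.random.random(100).cumsum())      # stand-in close price series
ma30 = close.rolling(window=30, min_periods=1).mean()  # 30-day moving average
ma10 = close.rolling(window=10, min_periods=1).mean()  # 10-day moving average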

2. Linear regression between crude oil futures prices on the New York Mercantile Exchange (NYMEX) and gold prices over the last ten years

from __future__ import print_function, division
import numpy as np
import pandas as pd
import datetime as dt
import quandl
import seaborn as sns

sns.set(style="darkgrid")

token = "???"  # Notice: you can get the token by signing up on Quandl (https://www.quandl.com/)
starttime = "2005-01-01"
endtime = "2015-01-01"
interval = "monthly"

gold = quandl.get("BUNDESBANK/BBK01_WT5511", authtoken=token,
                  trim_start=starttime, trim_end=endtime, collapse=interval)
nymex_oil_future = quandl.get("OFDP/FUTURE_CL1", authtoken=token,
                              trim_start=starttime, trim_end=endtime, collapse=interval)
brent_oil_future = quandl.get("CHRIS/ICE_B1", authtoken=token,
                              trim_start=starttime, trim_end=endtime, collapse=interval)

#dat = nymex_oil_future.join(brent_oil_future, lsuffix='_a', rsuffix='_b', how='inner')
#g = sns.jointplot("Settle_a", "Settle_b", data=dat, kind="reg")
dat = gold.join(nymex_oil_future, lsuffix='_a', rsuffix='_b', how='inner')
g = sns.jointplot("Value", "Settle", data=dat, kind="reg")
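
When this is run as a plain script rather than inside an IPython/pylab session, nothing is displayed or saved; one option (the file name below is only an example) is to save the figure from the JointGrid object returned by jointplot:

g.savefig("gold_vs_nymex_oil.png")  # example output file name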

3. The contribution of China's three major industries to GDP

from __future__ import print_function, division
from collections import OrderedDict
import numpy as np
import pandas as pd
import datetime as dt
import tushare as ts
from bokeh.charts import Bar, output_file, show
import bokeh.plotting as bp

df = ts.get_gdp_contrib()
df = df.drop(['industry', 'gdp_yoy'], axis=1)
df = df.set_index('year')
df = df.sort_index()

years = df.index.values.tolist()
pri = df['pi'].astype(float).values
sec = df['si'].astype(float).values
ter = df['ti'].astype(float).values
contrib = OrderedDict(primary=pri, secondary=sec, tertiary=ter)
years = map(unicode, map(str, years))

output_file("stacked_bar.html")
bar = Bar(contrib, years, stacked=True, title="contribution rate for GDP",
          xlabel="Year", ylabel="contribution rate (%)")
show(bar)
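
As an aside (not from the original article), the bokeh.charts API used above was later removed from Bokeh; a similar stacked bar chart can be drawn with pandas and matplotlib. The numbers below are made-up stand-ins for the tushare GDP contribution data:

import pandas as pd
import matplotlib.pyplot as plt

# Made-up contribution rates (%) per industry, indexed by year.
contrib = pd.DataFrame({'primary': [5.0, 4.8],
                        'secondary': [48.0, 47.1],
                        'tertiary': [47.0, 48.1]},
                       index=['2013', '2014'])
contrib.plot(kind='bar', stacked=True)  # one stacked bar per year
plt.ylabel('contribution rate (%)')
plt.show()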

4. Distribution of several major domestic indexes (Shanghai Composite, Shenzhen Component, SME Board, ChiNext)

# -*- coding: utf-8 -*-
from __future__ import unicode_literals
from __future__ import print_function, division
from collections import OrderedDict
import pandas as pd
import tushare as ts
from bokeh.charts import Histogram, output_file, show

sh = ts.get_hist_data('sh')
sz = ts.get_hist_data('sz')
zxb = ts.get_hist_data('zxb')
cyb = ts.get_hist_data('cyb')

df = pd.concat([sh['close'], sz['close'], zxb['close'], cyb['close']],
               axis=1, keys=['sh', 'sz', 'zxb', 'cyb'])

fst_idx = -700
distributions = OrderedDict(sh=list(sh['close'][fst_idx:]),
                            cyb=list(cyb['close'][fst_idx:]),
                            sz=list(sz['close'][fst_idx:]),
                            zxb=list(zxb['close'][fst_idx:]))
df = pd.DataFrame(distributions)

col_mapping = {'sh': u'Shanghai Composite', 'zxb': u'SME Board',
               'cyb': u'ChiNext', 'sz': u'Shenzhen Component'}
df.rename(columns=col_mapping, inplace=True)

output_file("histograms.html")
hist = Histogram(df, bins=50, density=False, legend="top_right")
show(hist)
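
The same overlaid histograms can also be drawn without bokeh.charts; a minimal sketch with made-up data standing in for the index close prices:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'sh': np.random.normal(3000, 200, 700),    # stand-in Shanghai Composite closes
                   'sz': np.random.normal(10000, 800, 700)})  # stand-in Shenzhen Component closes
df.plot(kind='hist', bins=50, alpha=0.5)  # overlaid histograms, one per column
plt.show()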


5. Correlation between some key indicators (P/E ratio, P/B ratio, etc.) of listed companies in three selected industries

# -*- coding: utf-8 -*-
from __future__ import print_function, division
from __future__ import unicode_literals
from collections import OrderedDict
import numpy as np
import pandas as pd
import datetime as dt
import seaborn as sns
import tushare as ts
from bokeh.charts import Bar, output_file, show

cls = ts.get_industry_classified()
stk = ts.get_stock_basics()
cls = cls.set_index('code')
tcls = cls[['c_name']]
tstk = stk[['pe', 'pb', 'esp', 'bvps']]
df = tcls.join(tstk, how='inner')

clist = [df.ix[i]['c_name'] for i in xrange(3)]

def neq(a, b, eps=1e-6):
    return abs(a - b) > eps

tdf = df.loc[df['c_name'].isin(clist)
             & neq(df['pe'], 0.0) & neq(df['pb'], 0.0)
             & neq(df['esp'], 0.0) & neq(df['bvps'], 0.0)]

col_mapping = {'pe': u'P/E', 'pb': u'P/BV', 'esp': u'EPS', 'bvps': u'BVPS'}
tdf.rename(columns=col_mapping, inplace=True)

sns.pairplot(tdf, hue='c_name', size=2.5)
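
As with the earlier seaborn example, a plain script will not display or save the pair plot by itself; one option (the file name is only an example) is to keep the PairGrid returned by pairplot and save it:

grid = sns.pairplot(tdf, hue='c_name', size=2.5)
grid.savefig("industry_indicators_pairplot.png")  # example output file name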




