# A regression analysis of stocks and indices
## 1.1 Data
Load the Python libraries required for the analysis.
import statsmodels.api as sm
import statsmodels.formula.api as smf
import statsmodels.graphics.api as smg
import patsy
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
from scipy import stats
import seaborn as sns
Set the analysis period from January 1, 2015 to December 31, 2015.
import datetime
start = datetime.datetime(2015, 1, 1)
end = datetime.datetime(2015, 12, 31)
Download the 2015 price data for the Shanghai Composite Index (ticker 000001.SS), stored as datass, and for the robot company (ticker 300024.SZ), stored as datajqr.
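# pandas.io.data was later removed from pandas; pandas_datareader.data.DataReader replaces it (the "yahoo" source may no longer work)
from pandas.io.data import DataReader
datass = DataReader("000001.SS", "yahoo", start, end)
datajqr = DataReader("300024.SZ", "yahoo", start, end)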
D:\software\New Folder (4)\lib\site-packages\pandas\io\data.py:33: FutureWarning: The pandas.io.data module is moved to a separate package (pandas-datareader) and will be removed from pandas in a future version. After installing the pandas-datareader package (https://github.com/pydata/pandas-datareader), you can change the import 'from pandas.io import data, wb' to 'from pandas_datareader import data, wb'.
datass.head()
| Date | Open | High | Low | Close | Volume | Adj Close |
|---|---|---|---|---|---|---|
| 2015-01-05 | 3350.52 | 3350.52 | 3350.52 | 3350.52 | 0 | 3350.52 |
| 2015-01-06 | 3351.45 | 3351.45 | 3351.45 | 3351.45 | 0 | 3351.45 |
| 2015-01-07 | 3373.95 | 3373.95 | 3373.95 | 3373.95 | 0 | 3373.95 |
| 2015-01-08 | 3293.46 | 3293.46 | 3293.46 | 3293.46 | 0 | 3293.46 |
| 2015-01-09 | 3285.41 | 3285.41 | 3285.41 | 3285.41 | 0 | 3285.41 |
datajqr.head()
| Date | Open | High | Low | Close | Volume | Adj Close |
|---|---|---|---|---|---|---|
| 2015-01-01 | 39.39 | 39.39 | 39.39 | 39.39 | 0 | 39.37083 |
| 2015-01-02 | 39.39 | 39.39 | 39.39 | 39.39 | 0 | 39.37083 |
| 2015-01-05 | 38.83 | 39.33 | 37.30 | 39.01 | 20750100 | 38.99101 |
| 2015-01-06 | 38.76 | 41.29 | 38.50 | 41.22 | 24357600 | 41.19994 |
| 2015-01-07 | 41.21 | 41.60 | 40.05 | 40.18 | 16364700 | 40.16044 |
## 1.2 Exploratory analysis of the stock and Shanghai Composite data
close_ss = datass["Close"]
close_jqr = datajqr["Close"]
Summary statistics of the Shanghai Composite's closing prices on 2015 trading days are shown below. There are 233 observations, with a mean of 3739.79, a minimum of 2927.29 and a maximum of 5166.35.
close_ss.describe()
| | Close |
|---|---|
| count | 233.000000 |
| mean | 3739.794893 |
| std | 538.105387 |
| min | 2927.290000 |
| 25% | 3320.680000 |
| 50% | 3617.060000 |
| 75% | 4034.310000 |
| max | 5166.350000 |

Name: Close, dtype: float64
Summary statistics of the robot company's closing prices on 2015 trading days are shown below. There are 261 observations, with a mean of 67.32, a minimum of 39.01 and a maximum of 128.00.
close_jqr.describe()
| | Close |
|---|---|
| count | 261.000000 |
| mean | 67.317433 |
| std | 20.643055 |
| min | 39.010000 |
| 25% | 51.800000 |
| 50% | 68.500000 |
| 75% | 82.550000 |
| max | 128.000000 |

Name: Close, dtype: float64
The price charts below show that the Shanghai Composite and the robot company's share price follow a broadly similar trend. The stock fluctuates more than the index.
fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(15, 6))
close_ss.plot(ax=ax[0])
ax[0].set_title("SZZZ")
close_jqr.plot(ax=ax[1])
ax[1].set_title("JQR")
(Figure: 2015 closing prices; left panel "SZZZ", right panel "JQR")
Merge the two datasets on their common trading days, so that only dates present in both the Shanghai Composite's and the robot company's 2015 data are kept. The result is shown below.
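stock = pd.merge(datass, datajqr, left_index=True, right_index=True)  # inner join on the shared Date index
stock = stock[["Close_x", "Close_y"]]
stock.columns = ["SZZZ", "JQR"]
stock.head()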
| Date | SZZZ | JQR |
|---|---|---|
| 2015-01-05 | 3350.52 | 39.01 |
| 2015-01-06 | 3351.45 | 41.22 |
| 2015-01-07 | 3373.95 | 40.18 |
| 2015-01-08 | 3293.46 | 40.15 |
| 2015-01-09 | 3285.41 | 39.36 |
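The merge step can be reproduced on a small scale. The `Close_x`/`Close_y` columns suggest that the step calls `pd.merge` with `left_index=True, right_index=True`. That call joins on the date index and, with the default `how="inner"`, keeps only dates present in both frames. The minimal sketch below uses the closing prices from the `head()` outputs above and omits the other columns for brevity:

```python
import pandas as pd

# Closing prices copied from the datass.head() and datajqr.head() outputs above
datass = pd.DataFrame(
    {"Close": [3350.52, 3351.45, 3373.95, 3293.46, 3285.41]},
    index=pd.to_datetime(["2015-01-05", "2015-01-06", "2015-01-07",
                          "2015-01-08", "2015-01-09"]),
)
datajqr = pd.DataFrame(
    {"Close": [39.39, 39.39, 39.01, 41.22, 40.18]},
    index=pd.to_datetime(["2015-01-01", "2015-01-02", "2015-01-05",
                          "2015-01-06", "2015-01-07"]),
)

# Inner join on the DatetimeIndex: only dates present in both frames survive.
# The overlapping "Close" column gets the default "_x"/"_y" suffixes.
stock = pd.merge(datass, datajqr, left_index=True, right_index=True)
stock = stock[["Close_x", "Close_y"]]
stock.columns = ["SZZZ", "JQR"]
print(stock)
```

In this small extract, 2015-01-01 and 2015-01-02 appear only in the robot company's rows, and 2015-01-08 and 2015-01-09 appear only in the index rows, so all four are dropped.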
Compute the daily return series of the Shanghai Composite and the robot company from the closing prices, as shown below.
daily_return = (stock.diff() / stock.shift(1)).dropna()  # simple daily return: P_t / P_(t-1) - 1
daily_return.head()
| Date | SZZZ | JQR |
|---|---|---|
| 2015-01-06 | 0.000278 | 0.056652 |
| 2015-01-07 | 0.006714 | -0.025230 |
| 2015-01-08 | -0.023856 | -0.000747 |
| 2015-01-09 | -0.002444 | -0.019676 |
| 2015-01-12 | -0.017072 | 0.004827 |
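The sketch below shows the simple-return calculation, r_t = P_t / P_(t-1) - 1, applied to the merged prices shown earlier. The written-out form `stock / stock.shift(1) - 1` gives the same numbers. Log returns are included only for comparison; they are never larger than simple returns for the same price move:

```python
import numpy as np
import pandas as pd

# Merged closing prices copied from the stock.head() output above
stock = pd.DataFrame(
    {"SZZZ": [3350.52, 3351.45, 3373.95, 3293.46, 3285.41],
     "JQR": [39.01, 41.22, 40.18, 40.15, 39.36]},
    index=pd.to_datetime(["2015-01-05", "2015-01-06", "2015-01-07",
                          "2015-01-08", "2015-01-09"]),
)

# Simple return; the first row has no previous price, so it is dropped
daily_return = (stock.diff() / stock.shift(1)).dropna()
# Equivalent written-out form
alt = (stock / stock.shift(1) - 1).dropna()
# Log return ln(P_t / P_(t-1)), close to the simple return when moves are small
log_return = np.log(stock).diff().dropna()

print(daily_return.round(6))
```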
The summary statistics of the daily return series are shown below. The Shanghai Composite's mean daily return is 0.000556, with a minimum of -0.0849 and a maximum of 0.0769. The robot company's mean daily return is 0.003665, with a minimum of -0.1000 (a 10% one-day fall). Its maximum of 0.2095 is an outlier.
daily_return.describe()
| | SZZZ | JQR |
|---|---|---|
| count | 232.000000 | 232.000000 |
| mean | 0.000556 | 0.003665 |
| std | 0.025194 | 0.050061 |
| min | -0.084907 | -0.100017 |
| 25% | -0.011398 | -0.021297 |
| 50% | 0.002583 | -0.000724 |
| 75% | 0.016720 | 0.026968 |
| max | 0.076940 | 0.209524 |
Inspect the outlier:
daily_return[daily_return["JQR"] > 0.105]
| Date | SZZZ | JQR |
|---|---|---|
| 2015-10-12 | 0.07694 | 0.209524 |
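On inspection, the outlier arises because the robot company's prices for the two trading days of October 8 and October 9 are missing. As a result, the return recorded for October 12 is computed from the September 30, 2015 close and spans three trading days. Because the two series were merged on common trading days, the Shanghai Composite's return for October 12 (0.07694, its maximum for the year) is also measured from September 30.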
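One way to catch this kind of artifact before it reaches the regression is to count how many trading days of a reference calendar each return spans. The calendar could be, for example, the index's own trading dates. The helper `flag_gap_returns` below is hypothetical, not part of pandas. The prices are synthetic and were chosen only to mimic the September 30 to October 12 gap:

```python
import pandas as pd

def flag_gap_returns(prices: pd.Series, calendar: pd.DatetimeIndex) -> pd.DataFrame:
    """Hypothetical helper: simple returns plus a flag for returns spanning more than one trading day.

    `calendar` is assumed to be a sorted, unique DatetimeIndex containing every date in `prices`.
    """
    prices = prices.sort_index()
    ret = prices / prices.shift(1) - 1
    # Position of each observation date within the reference calendar (-1 means "not found")
    pos = pd.Series(calendar.get_indexer(prices.index), index=prices.index)
    if (pos < 0).any():
        raise ValueError("prices contain dates missing from the calendar")
    days_spanned = pos.diff()  # 1 for consecutive trading days, >1 across a gap
    out = pd.DataFrame({"return": ret, "days_spanned": days_spanned}).dropna()
    out["gap"] = out["days_spanned"] > 1
    return out

# Synthetic example: no stock prices on 2015-10-08 and 2015-10-09
calendar = pd.to_datetime(["2015-09-30", "2015-10-08", "2015-10-09",
                           "2015-10-12", "2015-10-13"])
prices = pd.Series([10.0, 12.0, 12.3],
                   index=pd.to_datetime(["2015-09-30", "2015-10-12", "2015-10-13"]))
flags = flag_gap_returns(prices, calendar)
print(flags)
```

Rows flagged as gaps could then be dropped or recomputed over matching dates before fitting the model. Which option is appropriate depends on the analysis.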
Plot the daily returns of the Shanghai Composite and the robot company.
fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(15, 6))
daily_return["SZZZ"].plot(ax=ax[0])
ax[0].set_title("SZZZ")
daily_return["JQR"].plot(ax=ax[1])
ax[1].set_title("JQR")
(Figure: 2015 daily returns; left panel "SZZZ", right panel "JQR")
The histograms and density curves of the daily returns are shown below. Overall, both series look roughly bell-shaped and approximately normal. Compared with the Shanghai Composite, the robot company's daily returns are more widely dispersed (a flatter, wider distribution). This is consistent with its larger standard deviation (0.050 versus 0.025).
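fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(15, 6))
# distplot is deprecated in newer seaborn releases; histplot(..., kde=True) is the suggested axes-level replacement
sns.distplot(daily_return["SZZZ"], ax=ax[0])
ax[0].set_title("SZZZ")
sns.distplot(daily_return["JQR"], ax=ax[1])
ax[1].set_title("JQR")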
(Figure: histograms with density curves of daily returns; left panel "SZZZ", right panel "JQR")
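The visual impression of approximate normality can be checked formally with a Jarque-Bera test. The test combines sample skewness S and kurtosis K as JB = n/6 * (S^2 + (K - 3)^2 / 4) and compares the result with a chi-square distribution with 2 degrees of freedom. The minimal sketch below uses `scipy.stats.jarque_bera` and recomputes the statistic by hand. The data are synthetic, heavy-tailed draws standing in for a return column such as `daily_return["JQR"]`:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic, fat-tailed stand-in for a daily return series (scaled Student t, 3 d.o.f.)
x = 0.02 * rng.standard_t(df=3, size=2000)

jb_stat, jb_p = stats.jarque_bera(x)

# Same statistic by hand, using biased sample skewness and non-excess kurtosis
n = x.size
s = stats.skew(x)                    # bias=True by default
k = stats.kurtosis(x, fisher=False)  # Pearson kurtosis: 3 for a normal distribution
jb_manual = n / 6 * (s**2 + (k - 3) ** 2 / 4)
p_manual = stats.chi2.sf(jb_manual, df=2)

print(f"JB = {jb_stat:.2f}, p = {jb_p:.3g}")
```

A small p-value rejects normality. With only a few hundred daily returns, a histogram can look bell-shaped even when the test rejects.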
Draw a scatter plot of the daily returns of the Shanghai Composite and the robot company, as shown below.
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(12, 6))
plt.scatter(daily_return["JQR"], daily_return["SZZZ"])
plt.xlabel("JQR daily return")
plt.ylabel("SZZZ daily return")
plt.title("Scatter Plot of daily return between JQR and SZZZ")
(Figure: scatter plot of JQR versus SZZZ daily returns)
The scatter plot suggests a positive linear relationship between the daily returns of the Shanghai Composite and the robot company's stock.
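The Pearson correlation coefficient quantifies the strength of this linear relationship. The sketch below uses `scipy.stats.pearsonr` on synthetic data; with the real data one would pass `daily_return["SZZZ"]` and `daily_return["JQR"]`. It also shows that in a simple regression with an intercept, the squared correlation equals the regression R-squared:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Synthetic returns: a "stock" that moves with a synthetic "index" plus independent noise
index_ret = 0.025 * rng.standard_normal(500)
stock_ret = 1.2 * index_ret + 0.04 * rng.standard_normal(500)

r, p = stats.pearsonr(index_ret, stock_ret)

# R^2 of the least-squares line stock_ret = a + b * index_ret
b, a = np.polyfit(index_ret, stock_ret, 1)
resid = stock_ret - (a + b * index_ret)
r_squared = 1 - resid.var() / stock_ret.var()

print(f"r = {r:.3f}, p = {p:.2g}, r^2 = {r**2:.3f}, R^2 = {r_squared:.3f}")
```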
## 1.3 Regression analysis of the stock and the Shanghai Composite Index
import statsmodels.api as sm
Add an intercept term (a column of ones), because statsmodels' OLS does not include a constant by default.
daily_return["intercept"]=1.0
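With the intercept column, the model becomes r_JQR = alpha + beta * r_SZZZ + e. statsmodels also provides `sm.add_constant` for adding this column. The sketch below shows what least squares computes with such a design matrix. It uses numpy on synthetic stand-ins for the two return columns and checks the result against the one-regressor closed form beta = cov(x, y) / var(x):

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic stand-ins for daily_return["SZZZ"] (x) and daily_return["JQR"] (y)
x = 0.025 * rng.standard_normal(232)
y = 0.003 + 1.2 * x + 0.04 * rng.standard_normal(232)

# Design matrix [x, 1], mirroring daily_return[["SZZZ", "intercept"]]
X = np.column_stack([x, np.ones_like(x)])
(beta, alpha), *_ = np.linalg.lstsq(X, y, rcond=None)

# Closed form for a single regressor
beta_cf = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
alpha_cf = y.mean() - beta_cf * x.mean()

print(f"beta = {beta:.4f}, alpha = {alpha:.4f}")
```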
Fit an OLS regression with the robot company's daily return as the dependent variable and the Shanghai Composite's daily return as the independent variable. The regression results are shown below.
model = sm.OLS(daily_return["JQR"], daily_return[["SZZZ", "intercept"]])
results = model.fit()
results.summary()
OLS Regression Results

| | | | |
|---|---|---|---|
| Dep. Variable: | JQR | R-squared: | 0.382 |
| Model: | OLS | Adj. R-squared: | 0.379 |
| Method: | Least Squares | F-statistic: | 142.0 |
| Date: | Fri, April | Prob (F-statistic): | 8.29e-26 |
| Time: | 22:16:56 | Log-Likelihood: | 421.79 |
| No. Observations: | 232 | AIC: | -839.6 |
| Df Residuals: | 230 | BIC: | -832.7 |
| Df Model: | 1 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [95.0% Conf. Int.] |
|---|---|---|---|---|---|
| SZZZ | 1.2275 | 0.103 | 11.915 | 0.000 | 1.025 1.431 |
| intercept | 0.0030 | 0.003 | 1.151 | 0.251 | -0.002 0.008 |

| | | | |
|---|---|---|---|
| Omnibus: | 8.703 | Durbin-Watson: | 1.824 |
| Prob(Omnibus): | 0.013 | Jarque-Bera (JB): | 9.653 |
| Skew: | 0.350 | Prob(JB): | 0.00801 |
| Kurtosis: | 3.714 | Cond. No. | 39.8 |
The simple (one-regressor) OLS results show a significant positive relationship between the robot company's daily return and the Shanghai Composite's daily return. R-squared is 0.382, so the index's daily return explains about 38% of the variation in the stock's daily return, a moderate fit. The F-statistic is 142.0 with a p-value of 8.29e-26, so the model as a whole is highly significant. With a single regressor this is the same test as the slope's t-test (t = 11.915, p < 0.001). The intercept (0.0030, p = 0.251) is not significantly different from zero.

The slope coefficient (the stock's beta) is 1.2275, which is greater than 1. The robot company's daily return therefore moves more than the index's, so the stock carries more market risk, with larger potential gains and losses. On average, a 1 percentage-point change in the Shanghai Composite's daily return is associated with a change of about 1.23 percentage points in the stock's daily return.

The Durbin-Watson statistic is 1.824, close to 2, indicating no strong first-order autocorrelation in the residuals. Prob(JB) is 0.00801 and Prob(Omnibus) is 0.013. Both are below 0.05, so normality of the residuals is rejected at the 5% level. The skew of 0.350 and kurtosis of 3.714 point to a mild right skew and somewhat fatter tails than a normal distribution.
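Several entries in the summary table above can be reproduced from the others with standard formulas, which makes a quick sanity check. The worked check below uses only printed numbers:

* With one regressor, the F-statistic is the square of the slope's t-statistic.
* R-squared follows from F and the residual degrees of freedom (n - 2 = 230).
* JB follows from the skew and kurtosis. Its p-value comes from a chi-square distribution with 2 degrees of freedom, whose survival function is exp(-x/2).
* AIC and BIC follow from the log-likelihood and the two estimated parameters.

```python
import math

# Values copied from the OLS summary table above
n, t_slope, skew, kurt, llf = 232, 11.915, 0.350, 3.714, 421.79
k_params = 2              # slope and intercept
df_resid = n - k_params   # 230

# With a single regressor, F = t^2 for the slope
f_stat = t_slope ** 2
# Invert F = R^2 / ((1 - R^2) / df_resid) to recover R^2
r2 = f_stat / (f_stat + df_resid)
# Jarque-Bera from skewness and (non-excess) kurtosis
jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
# Chi-square with 2 d.o.f. has survival function exp(-x / 2)
p_jb = math.exp(-jb / 2)
# Information criteria from the log-likelihood
aic = -2 * llf + 2 * k_params
bic = -2 * llf + k_params * math.log(n)

print(f"F ~ {f_stat:.1f}, R^2 ~ {r2:.3f}, JB ~ {jb:.2f}, Prob(JB) ~ {p_jb:.4f}, AIC ~ {aic:.1f}, BIC ~ {bic:.1f}")
```

The small differences from the printed values (for example, about 9.66 versus 9.653 for JB) most likely come from the rounding of skew and kurtosis in the table, since statsmodels computes them from unrounded residual moments.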