Concept
Time series
A time series (or dynamic series) is the sequence of values of the same statistical indicator, arranged in the chronological order in which they occurred. The main purpose of time series analysis is to predict future values from historical data.
Time series Analysis
Time series analysis starts from the time series data observed from a system and builds a mathematical model of it through curve fitting and parameter estimation. It is widely used in national macroeconomic regulation, market potential forecasting, weather forecasting, crop pest and disaster forecasting, and similar areas.
Constituent Elements
Elements: long-term trend, seasonal variation, cyclical variation, and irregular variation.
Long-term trend (T): the general direction of change over a long period, driven by some fundamental factors.
Seasonal variation (S): the regular change that occurs within a year as the seasons change.
Cyclical variation (C): a regular, wave-like change that recurs over a period of several years.
Irregular variation (I): change with no regular pattern. It comes in two types: strictly random fluctuations and irregular effects of sudden shocks.
Combination Models
Additive model: Y = T + S + C + I, where Y and T are total-quantity indicators in the same unit of measurement, and S, C, and I are positive or negative deviations from the long-term trend.
Multiplicative model (the more commonly used): Y = T × S × C × I, where Y and T are total-quantity indicators in the same unit of measurement, and S, C, and I are relative factors (percentages) by which the original values are raised or lowered.
ARIMA Basic Steps
1. Obtain the time series data observed from the system.
2. Plot the data and check whether the series is stationary. A non-stationary series must first be differenced d times to turn it into a stationary series.
3. For the stationary series obtained in step 2, compute the autocorrelation function (ACF) and the partial autocorrelation function (PACF), and choose the best orders p and q by analyzing the ACF and PACF plots.
4. With d, q, and p determined, build the ARIMA(p, d, q) model.
5. Finally, test the fitted model.
Python ARIMA Practice
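The two combination models can be checked numerically. The sketch below builds a synthetic monthly series from a hypothetical trend T, seasonal factor S, and irregular factor I (the cyclical factor C is left out for brevity), combines them multiplicatively, and confirms that taking logarithms turns the multiplicative model into an additive one. That is the reason the walkthrough works with np.log(ts). All component values are made up for illustration.

```python
import numpy as np

# hypothetical components for 48 months (illustrative values only)
months = np.arange(48)
trend = 100.0 * 1.02 ** months                           # T: steady 2% growth per month
seasonal = 1.0 + 0.2 * np.sin(2 * np.pi * months / 12)   # S: relative factor around 1
rng = np.random.default_rng(0)
irregular = 1.0 + 0.01 * rng.standard_normal(48)         # I: small relative noise around 1

# multiplicative model: Y = T x S x I (C omitted)
y_mult = trend * seasonal * irregular

# additive model on the log scale: log Y = log T + log S + log I
log_sum = np.log(trend) + np.log(seasonal) + np.log(irregular)

# in the multiplicative series the seasonal swing grows with the trend
first_year_range = y_mult[:12].max() - y_mult[:12].min()
last_year_range = y_mult[-12:].max() - y_mult[-12:].min()

print(np.allclose(np.log(y_mult), log_sum))   # True
print(last_year_range > first_year_range)     # True
```

The growing seasonal swing is the visual signature of a multiplicative series; after the log transform the swing becomes roughly constant, which suits the additive tools used later.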
Data: Airpassengers.csv
Libraries used: pandas, numpy, scipy, matplotlib, statsmodels.
import pandas as pd
import numpy as np
import matplotlib.pylab as plt
from matplotlib.pylab import rcParams
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.stattools import acf, pacf
# statsmodels.tsa.arima_model.ARIMA is no longer available in statsmodels 0.13+; the current class lives here
from statsmodels.tsa.arima.model import ARIMA
Read data:
# data = pd.read_csv('/users/wangtuntun/desktop/airpassengers.csv')
# index_col='Month' makes the date column the index; its year-month strings
# are then converted to timestamps with an explicit '%Y-%m' format
data = pd.read_csv('d:\\competition\\airpassengers.csv', index_col='Month')
data.index = pd.to_datetime(data.index, format='%Y-%m')
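To see what the index conversion produces without the real file, the sketch below parses a small in-memory CSV. The three rows and their values are made up; only the column names Month and #Passengers and the year-month date format are taken from the code above.

```python
import io

import pandas as pd

# a tiny stand-in for the real CSV file (values are illustrative only)
csv_text = "Month,#Passengers\n1949-01,100\n1949-02,110\n1949-03,120\n"

data = pd.read_csv(io.StringIO(csv_text), index_col="Month")
data.index = pd.to_datetime(data.index, format="%Y-%m")  # '1949-01' -> 1949-01-01

ts = data["#Passengers"]
print(type(data.index).__name__)  # DatetimeIndex
print(ts.index[0].month)          # 1
```

A DatetimeIndex is what lets the later rolling-window and decomposition steps treat the data as a monthly series.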
Analyze Data
def test_stationarity(timeseries):
    # judge stationarity from rolling statistics
    rolmean = timeseries.rolling(window=12).mean()  # 12-month moving average
    rol_weighted_mean = timeseries.ewm(span=12).mean()  # exponentially weighted moving average
    rolstd = timeseries.rolling(window=12).std()  # 12-month moving standard deviation
    # plot the rolling statistics against the original series
    plt.plot(timeseries, color='blue', label='Original')
    plt.plot(rolmean, color='red', label='Rolling Mean')
    plt.plot(rol_weighted_mean, color='green', label='Weighted Mean')
    plt.plot(rolstd, color='black', label='Rolling Std')
    plt.legend(loc='best')
    plt.title('Rolling Mean & Standard Deviation')
    plt.show(block=False)
    # augmented Dickey-Fuller test
    print('Results of Dickey-Fuller Test:')
    dftest = adfuller(timeseries, autolag='AIC')
    dfoutput = pd.Series(dftest[0:4], index=['Test Statistic', 'p-value', '#Lags Used', 'Number of Observations Used'])
    for key, value in dftest[4].items():
        dfoutput['Critical Value (%s)' % key] = value
    print(dfoutput)

ts = data['#Passengers']
plt.plot(ts)
plt.show()
test_stationarity(ts)
plt.show()
Both a long-term trend and cyclical variation are visible, so the raw series is not stationary.
# estimate and remove the trend
ts_log = np.log(ts)  # the log damps the growing amplitude of the fluctuations
# plt.plot(ts_log)
# plt.show()
moving_avg = ts_log.rolling(window=12).mean()
# plt.plot(moving_avg)
# plt.plot(moving_avg, color='red')
# plt.show()
ts_log_moving_avg_diff = ts_log - moving_avg
# print(ts_log_moving_avg_diff.head())
ts_log_moving_avg_diff.dropna(inplace=True)  # the first 11 values have no full 12-month window
test_stationarity(ts_log_moving_avg_diff)
plt.show()
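Subtracting a 12-month moving average leaves the first 11 values undefined, which is why dropna is needed. The sketch below reproduces that on a short hypothetical log series, using pandas only.

```python
import numpy as np
import pandas as pd

# hypothetical monthly log series: linear trend plus a 12-month cycle
idx = pd.date_range("2000-01-01", periods=36, freq="MS")
t = np.arange(36)
ts_log = pd.Series(0.01 * t + 0.1 * np.sin(2 * np.pi * t / 12), index=idx)

moving_avg = ts_log.rolling(window=12).mean()   # trailing 12-month mean
detrended = ts_log - moving_avg

print(detrended.isna().sum())     # 11: no full window yet
print(len(detrended.dropna()))    # 25
```

Because the window is trailing rather than centered, the moving average lags a rising trend: here the detrended values carry a constant offset of 0.055 (0.01 per month times a 5.5-month lag) on top of the seasonal wave.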
Differencing the Time Series (d)
# first-order differencing
ts_log_diff = ts_log.diff(1)
ts_log_diff.dropna(inplace=True)
test_stationarity(ts_log_diff)
plt.show()
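Differencing the log series has a concrete meaning: ts_log.diff(1) equals log(y_t / y_{t-1}), roughly the period-over-period growth rate. The sketch below uses a hypothetical series growing exactly 5% per step; its log difference is the constant log(1.05), i.e. stationary even though the series itself is not.

```python
import numpy as np
import pandas as pd

y = pd.Series(100.0 * 1.05 ** np.arange(10))   # hypothetical: 5% growth each step
log_diff = np.log(y).diff(1).dropna()          # log(y_t) - log(y_{t-1})

print(log_diff.round(6).unique())              # a single value, log(1.05)
```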
The stationarity plot of ts_log_diff shows that the first-order difference is already roughly stationary. For comparison, plot it next to a higher-order difference:
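ts_log_diff1 = ts_log.diff(1)
ts_log_diff2 = ts_log.diff(2)  # lag-2 difference y(t) - y(t-2); a true second-order difference is ts_log.diff(1).diff(1)
ts_log_diff1.plot(label='diff(1)')
ts_log_diff2.plot(label='diff(2)')
plt.legend(loc='best')
plt.show()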
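pandas distinguishes two things that are easy to conflate: diff(2) is the lag-2 difference y_t - y_{t-2}, while a second-order difference is the difference of the first difference, y_t - 2y_{t-1} + y_{t-2}, written diff(1).diff(1). The sketch below applies both to a quadratic sequence, where only the true second-order difference becomes constant.

```python
import pandas as pd

y = pd.Series([t ** 2 for t in range(8)], dtype=float)  # 0, 1, 4, 9, ...

lag2 = y.diff(2).dropna()                  # y_t - y_{t-2} = 4t - 4
second_order = y.diff(1).diff(1).dropna()  # y_t - 2y_{t-1} + y_{t-2} = 2

print(lag2.tolist())          # [4.0, 8.0, 12.0, 16.0, 20.0, 24.0]
print(second_order.tolist())  # [2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
```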
The two difference plots look essentially the same, so the first-order difference (d = 1) is used.
Decomposition
# decomposition
decomposition = seasonal_decompose(ts_log, period=12)  # 12 months per seasonal cycle
trend = decomposition.trend  # trend
seasonal = decomposition.seasonal  # seasonality
residual = decomposition.resid  # residual
plt.subplot(411)
plt.plot(ts_log, label='Original')
plt.legend(loc='best')
plt.subplot(412)
plt.plot(trend, label='Trend')
plt.legend(loc='best')
plt.subplot(413)
plt.plot(seasonal, label='Seasonality')
plt.legend(loc='best')
plt.subplot(414)
plt.plot(residual, label='Residuals')
plt.legend(loc='best')
plt.tight_layout()
plt.show()
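seasonal_decompose performs a classical moving-average decomposition. The sketch below is a simplified hand-written version of the additive case in pandas, for illustration only (it is not the statsmodels source): a centered 2x12 moving average estimates the trend, and the seasonal component is the average detrended value at each position in the cycle, shifted to sum to zero. The helper name classical_additive is hypothetical. On a noise-free hypothetical series with a linear trend and a 12-month sine wave, it recovers the seasonal pattern exactly.

```python
import numpy as np
import pandas as pd

def classical_additive(ts, period=12):
    """Simplified classical additive decomposition (even period assumed)."""
    half = period // 2
    # centered 2 x period moving average: average of two adjacent trailing means, re-centered
    trend = ts.rolling(period).mean().rolling(2).mean().shift(-half)
    detrended = ts - trend
    # average detrended value at each position within the cycle (NaNs are skipped)
    position = np.arange(len(ts)) % period
    means = detrended.groupby(position).mean()
    means -= means.mean()  # seasonal effects sum to zero over one cycle
    seasonal = pd.Series(means.to_numpy()[position], index=ts.index)
    resid = ts - trend - seasonal
    return trend, seasonal, resid

t = np.arange(48)
true_seasonal = 0.1 * np.sin(2 * np.pi * t / 12)
ts = pd.Series(0.01 * t + true_seasonal)  # hypothetical noise-free series

trend, seasonal, resid = classical_additive(ts)
print(np.allclose(seasonal, true_seasonal))   # True
print(np.allclose(resid.dropna(), 0.0))       # True
```

The trend is undefined for the first and last six months, which is why the Trend and Residuals panels in the real plot start and end with gaps.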
Forecasting: Determining the Parameters
# determine the parameters
lag_acf = acf(ts_log_diff, nlags=20)
lag_pacf = pacf(ts_log_diff, nlags=20, method='ols')
# q: the lag at which the ACF curve first crosses the upper confidence bound. Here q = 2
plt.subplot(121)
plt.plot(lag_acf)
plt.axhline(y=0, linestyle='--', color='gray')
plt.axhline(y=-1.96/np.sqrt(len(ts_log_diff)), linestyle='--', color='gray')  # lower confidence bound
plt.axhline(y=1.96/np.sqrt(len(ts_log_diff)), linestyle='--', color='gray')  # upper confidence bound
plt.title('Autocorrelation Function')
# p: the lag at which the PACF curve first crosses the upper confidence bound. Here p = 2
plt.subplot(122)
plt.plot(lag_pacf)
plt.axhline(y=0, linestyle='--', color='gray')
plt.axhline(y=-1.96/np.sqrt(len(ts_log_diff)), linestyle='--', color='gray')
plt.axhline(y=1.96/np.sqrt(len(ts_log_diff)), linestyle='--', color='gray')
plt.title('Partial Autocorrelation Function')
plt.tight_layout()
plt.show()
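The dashed lines are the approximate 95% band ±1.96/√N for autocorrelations of white noise. The sketch below computes the sample ACF by hand, with the same normalisation as the default acf estimator (lag-k cross-products divided by the total sum of squares), and adds a hypothetical helper first_lag_inside that returns the first lag whose value falls inside the band, one way of reading the "first crossing" rule in the comments above. The alternating test series is made up so the numbers are easy to check: with N = 10 the band is about 0.62, and |r_k| = (10 - k)/10.

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelation r_k for k = 0..nlags."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    denom = np.sum(d * d)
    return np.array([np.sum(d[: len(d) - k] * d[k:]) / denom for k in range(nlags + 1)])

def first_lag_inside(values, bound):
    """Hypothetical helper: first lag k >= 1 whose |value| lies inside the band."""
    for k in range(1, len(values)):
        if abs(values[k]) < bound:
            return k
    return None

x = np.array([1.0, -1.0] * 5)            # alternating series, N = 10
r = sample_acf(x, nlags=5)               # 1, -0.9, 0.8, -0.7, 0.6, -0.5
bound = 1.96 / np.sqrt(len(x))           # approximate 95% band

print(first_lag_inside(r, bound))        # 4
```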
AR Model
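# The current ARIMA class returns fitted values on the undifferenced scale when d = 1.
# To keep them on the differenced scale used below, ARIMA(2, 1, 0) on ts_log is fitted
# as an AR(2) with a constant on ts_log_diff.
model = ARIMA(ts_log_diff, order=(2, 0, 0))
result_ar = model.fit()
plt.plot(ts_log_diff)
plt.plot(result_ar.fittedvalues, color='red')
plt.title('AR model RSS: %.4f' % ((result_ar.fittedvalues - ts_log_diff) ** 2).sum())
plt.show()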
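An AR(p) model regresses the current value on its own p previous values. As a rough illustration of what the AR(2) fit estimates, the sketch below simulates a hypothetical AR(2) process and recovers its coefficients by ordinary least squares with numpy. statsmodels' ARIMA estimates by maximum likelihood instead, so this is a simplified stand-in, not its method. The RSS is computed as the sum of squared residuals.

```python
import numpy as np

rng = np.random.default_rng(1)
phi1, phi2 = 0.5, -0.3                   # hypothetical true coefficients
n = 5000
e = rng.standard_normal(n)
x = np.zeros(n)
for t in range(2, n):
    x[t] = phi1 * x[t - 1] + phi2 * x[t - 2] + e[t]

# design matrix [x_{t-1}, x_{t-2}] and target x_t
X = np.column_stack([x[1:-1], x[:-2]])
y = x[2:]
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

fitted = X @ coef
rss = np.sum((fitted - y) ** 2)          # sum of squared residuals
print(np.round(coef, 2))                 # close to [0.5, -0.3]
```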
MA Model
# MA model: ARIMA(0, 1, 2) on ts_log, fitted as an MA(2) with a constant on ts_log_diff
model = ARIMA(ts_log_diff, order=(0, 0, 2))
result_ma = model.fit()
plt.plot(ts_log_diff)
plt.plot(result_ma.fittedvalues, color='red')
plt.title('MA model RSS: %.4f' % ((result_ma.fittedvalues - ts_log_diff) ** 2).sum())
plt.show()
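The rule used for q relies on the fact that an MA(q) process has autocorrelation exactly zero beyond lag q. The sketch below computes the theoretical ACF of an MA(q) process from hypothetical coefficients, using rho_k = sum_j theta_j theta_{j+k} / sum_j theta_j^2 with theta_0 = 1, and shows the cut-off for q = 2.

```python
import numpy as np

def ma_theoretical_acf(thetas, nlags):
    """Theoretical ACF of x_t = e_t + theta_1 e_{t-1} + ... + theta_q e_{t-q}."""
    psi = np.r_[1.0, np.asarray(thetas, dtype=float)]  # theta_0 = 1
    q = len(psi) - 1
    denom = np.sum(psi ** 2)
    acf = np.zeros(nlags + 1)
    for k in range(min(q, nlags) + 1):
        acf[k] = np.sum(psi[: q + 1 - k] * psi[k:]) / denom
    return acf  # lags beyond q stay exactly 0

rho = ma_theoretical_acf([0.6, 0.3], nlags=5)
print(np.round(rho, 4))   # lags 3..5 are exactly 0
```

For comparison, the PACF of an MA process decays gradually instead of cutting off, which is why q is read from the ACF plot and p from the PACF plot.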
ARIMA Model
# ARIMA combines the AR and MA effects and fits better:
# ARIMA(2, 1, 2) on ts_log, fitted as an ARMA(2, 2) with a constant on ts_log_diff
model = ARIMA(ts_log_diff, order=(2, 0, 2))
result_arima = model.fit()
plt.plot(ts_log_diff)
plt.plot(result_arima.fittedvalues, color='red')
plt.title('ARIMA RSS: %.4f' % ((result_arima.fittedvalues - ts_log_diff) ** 2).sum())
plt.show()
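The three models are compared by their residual sum of squares (RSS), which must be written as the sum of the squared differences. The form sum(a - b) ** 2 squares the sum instead, because ** is applied to the result of sum(), so positive and negative errors cancel. A small pandas sketch with made-up values shows the difference.

```python
import pandas as pd

fitted = pd.Series([2.0, 1.0, 5.0])   # made-up fitted values
actual = pd.Series([1.0, 2.0, 3.0])   # made-up observations

wrong = sum(fitted - actual) ** 2       # (1 - 1 + 2) ** 2 = 4.0
rss = ((fitted - actual) ** 2).sum()    # 1 + 1 + 4 = 6.0

print(wrong, rss)   # 4.0 6.0
```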
Predictions_arima
predictions_arima_diff = pd.Series(result_arima.fittedvalues, copy=True)
# print(predictions_arima_diff.head())  # the first month is missing because differencing introduces a 1-step lag
predictions_arima_diff_cumsum = predictions_arima_diff.cumsum()
# print(predictions_arima_diff_cumsum.head())
predictions_arima_log = pd.Series(ts_log.iloc[0], index=ts_log.index)
predictions_arima_log = predictions_arima_log.add(predictions_arima_diff_cumsum, fill_value=0)
# print(predictions_arima_log.head())
predictions_arima = np.exp(predictions_arima_log)
plt.plot(ts)
plt.plot(predictions_arima)
plt.title('Predictions_ARIMA RMSE: %.4f' % np.sqrt(((predictions_arima - ts) ** 2).sum() / len(ts)))
plt.show()
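The reconstruction undoes the two transformations in reverse order: a cumulative sum anchored at the first log value undoes the first difference, and np.exp undoes the log. The sketch below checks the round trip on a short hypothetical series: feeding in the exact differences recovers the original values, so in the real plot the gap between the curves comes from the model's fitted differences rather than from the inversion itself.

```python
import numpy as np
import pandas as pd

ts = pd.Series([100.0, 104.0, 110.0, 107.0, 115.0])  # hypothetical values
ts_log = np.log(ts)
diffs = ts_log.diff(1).dropna()        # stands in for the model's fitted differences

restored_log = pd.Series(ts_log.iloc[0], index=ts_log.index)
restored_log = restored_log.add(diffs.cumsum(), fill_value=0)
restored = np.exp(restored_log)

rmse = np.sqrt(((restored - ts) ** 2).sum() / len(ts))
print(np.allclose(restored, ts))   # True
print(rmse < 1e-9)                 # True
```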
Reference
Complete code: https://github.com/InsaneLife/MyPicture/blob/master/ARIMA_primer_test.py
Python Time series Analysis