Time Series Analysis with ARIMA: Hands-on in Python

Source: Internet
Author: User
Tags: statsmodels
Concept

Time series

A time series (or dynamic series) is a sequence of values of the same statistical indicator, arranged in the chronological order in which they occur. The main purpose of time series analysis is to predict the future from historical data.

Time series Analysis

Time series analysis builds mathematical models from a system's observed time series data through curve fitting and parameter estimation. It is widely used in macroeconomic regulation, market potential forecasting, weather forecasting, crop pest and disaster prediction, and similar applications.

Constituent Elements

A time series has four components: the long-term trend (T), seasonal variation (S), cyclical variation (C), and irregular variation (I).

Long-term trend (T): the general tendency of change over a long period, shaped by fundamental factors acting over time.

Seasonal variation (S): changes that recur within a year as the seasons change.

Cyclical variation (C): wave-like, fairly regular changes that occur over periods of several years.

Irregular variation (I): irregular changes, comprising two kinds: purely random fluctuations and abrupt changes caused by sudden events.

Combination Models

The components can be combined into an overall series in two ways.

Additive model: Y = T + S + C + I, where Y and T are in the same units as the original indicator, and S, C and I are positive or negative deviations from the long-term trend.

Multiplicative model: Y = T × S × C × I (the more commonly used form), where Y and T are in the same units as the original indicator, and S, C and I are percentages by which the indicator is scaled up or down. A short illustration appears after the ARIMA steps below.

ARIMA: Basic Steps

1. Obtain the observed time series data and plot it to check whether the series is stationary. A non-stationary series must first be turned into a stationary one by d-order differencing.
2. For the stationary series obtained in step 1, compute the autocorrelation function (ACF) and partial autocorrelation function (PACF), and choose the orders p and q by inspecting the ACF and PACF plots.
3. With d, p and q determined, build the ARIMA(p, d, q) model, then validate the fitted model.
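Returning to the two combination models: as an illustration (this code is not part of the original article), statsmodels' seasonal_decompose can decompose the same series under either assumption via its model argument. The sketch below assumes the AirPassengers data has already been loaded into data as in the practice section later in this article.

# Illustrative sketch (not from the original article): classical decomposition
# under the additive assumption Y = T + S + I versus the multiplicative
# assumption Y = T * S * I (the cyclical part is absorbed into trend/residual).
# Assumes `data` has been loaded from AirPassengers.csv as shown later.
from statsmodels.tsa.seasonal import seasonal_decompose

ts_raw = data['#Passengers']
add_result = seasonal_decompose(ts_raw, model='additive')
mul_result = seasonal_decompose(ts_raw, model='multiplicative')
# Adding the additive components (or multiplying the multiplicative ones)
# reconstructs the original series wherever all components are defined.
print((add_result.trend + add_result.seasonal + add_result.resid).dropna().head())
print((mul_result.trend * mul_result.seasonal * mul_result.resid).dropna().head())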

Python ARIMA Practice

Data: AirPassengers.csv

Base libraries used: pandas, numpy, scipy, matplotlib, statsmodels.

import pandas as pd
import numpy as np
import matplotlib.pylab as plt
from matplotlib.pylab import rcParams
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.stattools import acf, pacf
from statsmodels.tsa.arima_model import ARIMA  # legacy statsmodels ARIMA API
from datetime import datetime

Read data:

# data = pd.read_csv('/users/wangtuntun/desktop/airpassengers.csv')
dateparse = lambda dates: datetime.strptime(dates, '%Y-%m')
# parse_dates specifies the date column; index_col makes that column the index;
# date_parser converts the date strings into datetime objects
data = pd.read_csv('d:\\competition\\airpassengers.csv', parse_dates=['Month'],
                   index_col='Month', date_parser=dateparse)
Analyze Data
def test_stationarity(timeseries):
    # Rolling statistics
    rolmean = timeseries.rolling(window=12).mean()       # moving average
    rol_weighted_mean = timeseries.ewm(span=12).mean()   # exponentially weighted moving average
    rolstd = timeseries.rolling(window=12).std()         # rolling standard deviation

    # Plot the rolling statistics
    orig = plt.plot(timeseries, color='blue', label='Original')
    mean = plt.plot(rolmean, color='red', label='Rolling Mean')
    weighted_mean = plt.plot(rol_weighted_mean, color='green', label='Weighted Mean')
    std = plt.plot(rolstd, color='black', label='Rolling Std')
    plt.legend(loc='best')
    plt.title('Rolling Mean & Standard Deviation')
    plt.show(block=False)

    # Dickey-Fuller test
    print('Results of Dickey-Fuller Test:')
    dftest = adfuller(timeseries, autolag='AIC')
    dfoutput = pd.Series(dftest[0:4], index=['Test Statistic', 'p-value',
                                             '#Lags Used', 'Number of Observations Used'])
    for key, value in dftest[4].items():
        dfoutput['Critical Value (%s)' % key] = value
    print(dfoutput)

ts = data['#Passengers']
plt.plot(ts)
plt.show()
test_stationarity(ts)
plt.show()
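How to read the Dickey-Fuller output: if the p-value is below the chosen significance level (commonly 0.05), the unit-root hypothesis is rejected and the series can be treated as stationary. A small helper in that spirit (an illustrative addition, not part of the original code):

# Illustrative helper (not from the original article): reduce the ADF test to a
# single stationary / non-stationary decision at significance level alpha.
def is_stationary(timeseries, alpha=0.05):
    p_value = adfuller(timeseries, autolag='AIC')[1]  # second element of the result is the p-value
    return p_value < alpha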
 

A long-term trend and periodic variation are clearly visible, so the series is not stationary.

# estimating the trend: log transform and 12-month moving average
ts_log = np.log(ts)
# plt.plot(ts_log)
# plt.show()
moving_avg = ts_log.rolling(window=12).mean()
# plt.plot(moving_avg)
plt.plot(moving_avg, color='red')
# plt.show()
ts_log_moving_avg_diff = ts_log - moving_avg
# print(ts_log_moving_avg_diff.head(12))
ts_log_moving_avg_diff.dropna(inplace=True)
test_stationarity(ts_log_moving_avg_diff)
plt.show()

Differencing (d) of the time series

# differencing
ts_log_diff = ts_log.diff(1)
ts_log_diff.dropna(inplace=True)
test_stationarity(ts_log_diff)
plt.show()

The figure above shows that the first-order differenced series already fluctuates around a roughly constant level. For comparison, we can also plot the series differenced at lag 2:

ts_log_diff1 = ts_log.diff(1)
ts_log_diff2 = ts_log.diff(2)
ts_log_diff1.plot()
ts_log_diff2.plot()
plt.show()

The result is basically unchanged, so the first-order difference is used.

Decomposition

# decomposition
decomposition = seasonal_decompose(ts_log)

trend = decomposition.trend        # trend component
seasonal = decomposition.seasonal  # seasonal component
residual = decomposition.resid     # residual component

plt.subplot(411)
plt.plot(ts_log, label='Original')
plt.legend(loc='best')
plt.subplot(412)
plt.plot(trend, label='Trend')
plt.legend(loc='best')
plt.subplot(413)
plt.plot(seasonal, label='Seasonality')
plt.legend(loc='best')
plt.subplot(414)
plt.plot(residual, label='Residuals')
plt.legend(loc='best')
plt.tight_layout()
plt.show()
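A natural follow-up (an addition not shown in the original article) is to check whether the residual component by itself is stationary, reusing the test_stationarity helper defined above:

# Follow-up sketch (not from the original): the residuals have NaN values at
# both ends of the series, so drop them before testing.
ts_log_decompose = residual.dropna()
test_stationarity(ts_log_decompose)
plt.show()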

Forecasting: determining the parameters p and q

# determine the parameters p and q
lag_acf = acf(ts_log_diff, nlags=20)
lag_pacf = pacf(ts_log_diff, nlags=20, method='ols')
# q: the lag at which the ACF curve first crosses into the confidence interval. Here q = 2.
plt.subplot(121)
plt.plot(lag_acf)
plt.axhline(y=0, linestyle='--', color='gray')
plt.axhline(y=-1.96 / np.sqrt(len(ts_log_diff)), linestyle='--', color='gray')  # lower confidence bound
plt.axhline(y=1.96 / np.sqrt(len(ts_log_diff)), linestyle='--', color='gray')   # upper confidence bound
plt.title('Autocorrelation Function')
# p: the lag at which the PACF curve first crosses into the confidence interval. Here p = 2.
plt.subplot(122)
plt.plot(lag_pacf)
plt.axhline(y=0, linestyle='--', color='gray')
plt.axhline(y=-1.96 / np.sqrt(len(ts_log_diff)), linestyle='--', color='gray')
plt.axhline(y=1.96 / np.sqrt(len(ts_log_diff)), linestyle='--', color='gray')
plt.title('Partial Autocorrelation Function')
plt.tight_layout()
plt.show()
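As an illustrative aside (not part of the original article), the same rule of thumb can be applied programmatically: take the first lag at which the ACF (for q) or the PACF (for p) falls inside the ±1.96/√n confidence band.

# Illustrative sketch (assumption, not from the original): read p and q off the
# ACF/PACF arrays as the first lag whose value lies inside the confidence band.
def first_lag_inside_band(values, n_obs):
    bound = 1.96 / np.sqrt(n_obs)
    for lag in range(1, len(values)):  # skip lag 0, which is always 1
        if abs(values[lag]) < bound:
            return lag
    return None

q_guess = first_lag_inside_band(lag_acf, len(ts_log_diff))   # from the ACF
p_guess = first_lag_inside_band(lag_pacf, len(ts_log_diff))  # from the PACF
print('suggested p = %s, q = %s' % (p_guess, q_guess))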

AR Model

# AR model
model = ARIMA(ts_log, order=(2, 1, 0))
result_ar = model.fit(disp=-1)
plt.plot(ts_log_diff)
plt.plot(result_ar.fittedvalues, color='red')
plt.title('AR model RSS: %.4f' % sum((result_ar.fittedvalues - ts_log_diff) ** 2))
plt.show()

MA Model

# MA model
model = ARIMA(ts_log, order=(0, 1, 2))
result_ma = model.fit(disp=-1)
plt.plot(ts_log_diff)
plt.plot(result_ma.fittedvalues, color='red')
plt.title('MA model RSS: %.4f' % sum((result_ma.fittedvalues - ts_log_diff) ** 2))
plt.show()

ARIMA Model

# ARIMA combines both effects and fits better
model = ARIMA(ts_log, order=(2, 1, 2))
result_arima = model.fit(disp=-1)
plt.plot(ts_log_diff)
plt.plot(result_arima.fittedvalues, color='red')
plt.title('ARIMA RSS: %.4f' % sum((result_arima.fittedvalues - ts_log_diff) ** 2))
plt.show()
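Besides comparing the RSS values in the plot titles, the three fitted models can also be compared by information criteria; a small illustrative addition (not in the original article), using the result objects fitted above:

# Illustrative comparison (not from the original): lower AIC/BIC generally
# indicates a better trade-off between goodness of fit and model complexity.
for name, res in [('AR (2,1,0)', result_ar),
                  ('MA (0,1,2)', result_ma),
                  ('ARIMA (2,1,2)', result_arima)]:
    print('%s: AIC = %.2f, BIC = %.2f' % (name, res.aic, res.bic))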

Predictions: back-transforming to the original scale

predictions_arima_diff = pd.Series(result_arima.fittedvalues, copy=True)
# print(predictions_arima_diff.head())  # the first month is missing because of the lag-1 difference

predictions_arima_diff_cumsum = predictions_arima_diff.cumsum()
# print(predictions_arima_diff_cumsum.head())

predictions_arima_log = pd.Series(ts_log.iloc[0], index=ts_log.index)
predictions_arima_log = predictions_arima_log.add(predictions_arima_diff_cumsum, fill_value=0)
# print(predictions_arima_log.head())

predictions_arima = np.exp(predictions_arima_log)
plt.plot(ts)
plt.plot(predictions_arima)
plt.title('Predictions ARIMA RMSE: %.4f' % np.sqrt(sum((predictions_arima - ts) ** 2) / len(ts)))
plt.show()
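As a possible extension (not part of the original article), the fitted model can also forecast future periods. A minimal sketch, assuming the legacy ARIMAResults.forecast API used above; the forecast is produced on the log scale the model was fitted on, so it is exponentiated back:

# Illustrative extension (assumption, not from the original article): forecast
# the next 24 months; the legacy API returns (forecast, standard errors,
# confidence intervals) on the fitted (log) scale, so undo the log with np.exp.
forecast_log, stderr, conf_int = result_arima.forecast(steps=24)
forecast = np.exp(forecast_log)
print(forecast)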

Reference

Complete code: https://github.com/InsaneLife/MyPicture/blob/master/ARIMA_primer_test.py

Python Time series Analysis
