ARIMA Model Forecasting in Time Series Analysis (Data Mining)


Reprinted from http://blog.sina.com.cn/s/blog_70f632090101bnd8.html#cmt_3111974

Today we study time series forecasting with ARIMA.

Exponential smoothing is very useful for forecasting, and it makes no assumptions about the correlation between successive values of the time series. However, if you want prediction intervals for forecasts made with exponential smoothing, the forecast errors must be uncorrelated and normally distributed with mean zero and constant variance. Although exponential smoothing does not require any correlation between successive values, in some cases we can build a better predictive model by taking the correlation in the data into account. The Autoregressive Integrated Moving Average (ARIMA) model includes an explicit statistical model for the irregular component of a time series, and it allows that irregular component to be autocorrelated.

The first step is to determine the order of differencing

ARIMA models are defined for stationary time series. So if you start with a non-stationary time series, you first need to difference it until you obtain a stationary one. If you have to difference the series d times to get a stationary series, then you use an ARIMA(p,d,q) model, where d is the order of differencing.
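To make the effect of differencing concrete, here is a minimal sketch in base R. It uses a simulated random walk (an assumption for illustration, not the skirt data) and shows that each round of differencing drops one observation and that `differences=2` gives the same result as differencing twice.

```r
set.seed(42)

# A random walk is a textbook non-stationary series (simulated, illustrative only).
x <- ts(cumsum(rnorm(200)))

# First-order and second-order differences.
d1 <- diff(x, differences = 1)
d2 <- diff(x, differences = 2)

# Second-order differencing is first-order differencing applied twice.
same_result <- isTRUE(all.equal(as.numeric(d2), as.numeric(diff(diff(x)))))
print(same_result)

# Each round of differencing drops one observation.
print(c(original = length(x), d1 = length(d1), d2 = length(d2)))
```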

As an example, we use the annual diameter of women's skirts at the hem from 1866 to 1911. The series is not stationary in mean: its level changes a lot over time.

> skirts <- scan("http://robjhyndman.com/tsdldata/roberts/skirts.dat", skip=5)

Read 46 items

> skirtsts <- ts(skirts, start=c(1866))

> plot.ts(skirtsts)


We can compute the first-order difference of the time series (stored in "skirtsts") and plot the differenced series by typing:

> skirtstsdiff <- diff(skirtsts, differences=1)

> plot.ts(skirtstsdiff)


The plot of the first differences shows that the series is still not stationary, so we difference it again.

> skirtstsdiff2 <- diff(skirtsts, differences=2)

> plot.ts(skirtstsdiff2)

The series of second differences (above) does appear to be stationary in both mean and variance: its level and its variance stay roughly constant over time. So it looks like we need to difference the skirt diameter series twice to get a stationary series.

The second step is to find an appropriate ARIMA model

If your time series is stationary, or if you have made it stationary by differencing it d times, the next step is to select an appropriate ARIMA model, which means finding suitable values of p and q for ARIMA(p,d,q). To do this, you usually examine the correlogram (autocorrelation plot) and the partial correlogram (partial autocorrelation plot) of the stationary series.
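As background for reading these plots, a common rule of thumb (not spelled out in the original post) is that the theoretical ACF of a pure MA(q) process is zero beyond lag q, while the theoretical PACF of a pure AR(p) process is zero beyond lag p. The sketch below illustrates this with `ARMAacf()` from base R's stats package; the coefficients 0.8 and 0.5 are arbitrary illustrative values.

```r
# Theoretical ACF of an MA(1) process with coefficient 0.8 (illustrative value).
# The first element is lag 0; everything beyond lag 1 is zero.
ma1_acf <- ARMAacf(ma = 0.8, lag.max = 5)

# Theoretical PACF of an AR(1) process with coefficient 0.5 (illustrative value).
# The first element is lag 1; everything beyond lag 1 is zero.
ar1_pacf <- ARMAacf(ar = 0.5, lag.max = 5, pacf = TRUE)

print(round(ma1_acf, 3))
print(round(ar1_pacf, 3))
```

Real sample correlograms are noisier than these theoretical curves, so the cut-off is judged against the significance bounds rather than against exact zeros.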

We use the "acf()" and "pacf()" functions in R to plot the correlogram and the partial correlogram. Setting "plot=FALSE" in "acf()" or "pacf()" returns the actual autocorrelation and partial autocorrelation values instead of a plot.

> acf(skirtstsdiff2, lag.max=20)

> acf(skirtstsdiff2, lag.max=20, plot=FALSE)

Autocorrelations of series 'skirtstsdiff2', by lag

     0      1      2      3      4      5      6      7      8      9     10
 1.000 -0.303  0.096  0.009  0.102 -0.453  0.173 -0.025 -0.039  0.073 -0.094
    11     12     13     14     15     16     17     18     19     20
 0.133 -0.089 -0.027 -0.102  0.207 -0.260  0.114  0.101  0.011 -0.090

The correlogram shows that the autocorrelation at lag 1 (-0.303) sits right at the significance bounds rather than clearly beyond them. The autocorrelation at lag 5 (-0.453) does exceed the bounds, but this may well be due to chance: none of the other autocorrelations for lags 1-20 exceeds the bounds, and we would expect about one in 20 to fall outside the 95% bounds by chance alone.
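The reading of the correlogram can be checked numerically. The sketch below takes the printed autocorrelations for lags 1-20 and compares them with the approximate 95% bounds that `acf()` draws, plus or minus qnorm(0.975)/sqrt(n). The value n = 44 is an assumption: 46 annual observations (1866 to 1911) minus the two lost to second-order differencing.

```r
# Autocorrelations at lags 1..20, copied from the acf() output above.
acf_vals <- c(-0.303, 0.096, 0.009, 0.102, -0.453, 0.173, -0.025, -0.039,
              0.073, -0.094, 0.133, -0.089, -0.027, -0.102, 0.207, -0.260,
              0.114, 0.101, 0.011, -0.090)

# Assumed length of the twice-differenced series: 46 - 2 = 44.
n <- 44

# Approximate 95% significance bound used by the correlogram.
bound <- qnorm(0.975) / sqrt(n)
print(round(bound, 4))

# Lags whose autocorrelation falls outside the bounds.
outside <- which(abs(acf_vals) > bound)
print(outside)
```

Under this assumption the bound is about 0.295, so the lag 1 value of -0.303 lands only about 0.01 outside it, which is why it reads as borderline, while lag 5 is clearly outside.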

> pacf(skirtstsdiff2, lag.max=20)

> pacf(skirtstsdiff2, lag.max=20, plot=FALSE)

Partial autocorrelations of series 'skirtstsdiff2', by lag

     1      2      3      4      5      6      7      8      9     10     11
-0.303  0.005  0.043  0.128 -0.439 -0.110  0.073  0.028  0.128 -0.355  0.095
    12     13     14     15     16     17     18     19     20
 0.052 -0.094 -0.103 -0.034 -0.021 -0.002  0.074  0.020 -0.034

Based on the partial correlogram, we choose order 5.

So our ARIMA model is ARIMA(1,2,5).

> skirtsarima <- arima(skirtsts, order=c(1,2,5))

> skirtsarima

Series: skirtsts
ARIMA(1,2,5)

Coefficients:
          ar1     ma1     ma2     ma3     ma4      ma5
      -0.4345  0.2762  0.1033  0.1472  0.0267  -0.8384
s.e.   0.1837  0.2171  0.2198  0.2716  0.1904   0.2888

sigma^2 estimated as 206.1:  log likelihood=-183.8
AIC=381.6   AICc=384.71   BIC=394.09
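As a sanity check on this output, the information criteria can be recomputed from the log likelihood, and each coefficient can be compared with its standard error. This is a sketch under two assumptions that are not stated in the post: the effective sample size is n = 44 (46 annual values minus 2 lost to differencing), and the parameter count is 7 (six coefficients plus the innovation variance).

```r
# Values copied from the model summary above.
loglik <- -183.8
est <- c(ar1 = -0.4345, ma1 = 0.2762, ma2 = 0.1033,
         ma3 = 0.1472, ma4 = 0.0267, ma5 = -0.8384)
se  <- c(0.1837, 0.2171, 0.2198, 0.2716, 0.1904, 0.2888)

k <- length(est) + 1  # assumed: 6 coefficients plus sigma^2
n <- 44               # assumed: 46 observations minus 2 differences

# Standard definitions of AIC, small-sample corrected AIC, and BIC.
aic  <- -2 * loglik + 2 * k
aicc <- aic + 2 * k * (k + 1) / (n - k - 1)
bic  <- -2 * loglik + k * log(n)
print(round(c(AIC = aic, AICc = aicc, BIC = bic), 2))

# Rough z-ratios: estimate divided by standard error.
z <- est / se
print(round(z, 2))

# Coefficients more than about 1.96 standard errors away from zero.
large <- names(z)[abs(z) > qnorm(0.975)]
print(large)
```

With these assumptions the recomputed criteria match the printed ones, and only ar1 and ma5 are more than about two standard errors from zero.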

Now we forecast the skirt hem diameter for the next 5 years:

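> library(forecast)

> skirtsarimaforecast <- forecast(skirtsarima, h=5, level=c(99.5))  # older versions of the forecast package also exported this method as forecast.Arima()

> skirtsarimaforecast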

     Point Forecast  Lo 99.5  Hi 99.5
1912       548.5762 507.1167 590.0357
1913       545.1793 459.3292 631.0295
1914       540.9354 396.3768 685.4940
1915       531.8838 316.2785 747.4892
1916       529.1296 233.2625 824.9968

> plot(skirtsarimaforecast)  # thanks to @Recalling the water as smoke
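For readers without the forecast package, the same kind of interval can be built by hand with base R's `predict()`, which returns point forecasts and their standard errors. The sketch below is illustrative only: it fits an ARIMA(1,1,0) to a simulated series (not the skirt data), and the names `y`, `fit` and `intervals` are made up for this example.

```r
set.seed(123)

# Simulated ARIMA(1,1,0) series standing in for real data (illustrative only).
y <- arima.sim(model = list(order = c(1, 1, 0), ar = 0.6), n = 300)
fit <- arima(y, order = c(1, 1, 0))

# predict() returns a list with point forecasts ($pred) and standard errors ($se).
pr <- predict(fit, n.ahead = 5)

# Normal quantile for a two-sided 99.5% interval.
zcrit <- qnorm(1 - (1 - 0.995) / 2)

intervals <- data.frame(
  point = as.numeric(pr$pred),
  lo    = as.numeric(pr$pred - zcrit * pr$se),
  hi    = as.numeric(pr$pred + zcrit * pr$se)
)
print(round(intervals, 2))

# The interval widens as the forecast horizon grows.
widths <- intervals$hi - intervals$lo
print(round(widths, 2))
```

The widening of the interval with the horizon is the same pattern visible in the skirt forecast table, where the 99.5% interval grows every year from 1912 to 1916.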

The third step is to check the model

As with exponential smoothing models, it is a good idea to check whether the forecast errors of the ARIMA model are normally distributed with mean zero and constant variance, and whether successive forecast errors are correlated.

> acf(skirtsarimaforecast$residuals, lag.max=20)

> Box.test(skirtsarimaforecast$residuals, lag=20, type="Ljung-Box")


Box-Ljung test

data:  skirtsarimaforecast$residuals
X-squared = 8.5974, df = 20, p-value = 0.9871

Since the correlogram shows that none of the sample autocorrelations for lags 1-20 exceeds the significance bounds, and the p-value of the Ljung-Box test is 0.99, we conclude that there is very little evidence of non-zero autocorrelation in the forecast errors at lags 1-20.
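The reported p-value can be reproduced from the test statistic alone. This sketch assumes 20 degrees of freedom, which is what `Box.test()` uses for `lag=20` when no adjustment for fitted parameters is requested.

```r
# Test statistic copied from the Box.test() output above.
x_squared <- 8.5974

# Assumed degrees of freedom: lag = 20, no fitdf adjustment.
df <- 20

# Upper-tail chi-squared probability, which is the p-value of the test.
p_value <- pchisq(x_squared, df = df, lower.tail = FALSE)
print(round(p_value, 4))
```

A p-value this close to 1 means the residual autocorrelations are, if anything, smaller than white noise would typically produce.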

To check whether the forecast errors are normally distributed with mean zero and constant variance, we can make a time plot of the forecast errors and a histogram with an overlaid normal curve:

> plot.ts(skirtsarimaforecast$residuals)

> plotForecastErrors(skirtsarimaforecast$residuals)  # plotForecastErrors() is not a built-in R function; it must be defined before this call
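The post does not show the definition of `plotForecastErrors()`. The following is a hypothetical minimal version, written only to illustrate what such a helper might do: draw a histogram of the errors on the density scale and overlay a normal curve with mean zero and the same standard deviation. The author's actual helper may differ in every detail.

```r
# Hypothetical helper: histogram of forecast errors with a normal overlay.
plotForecastErrors <- function(forecasterrors) {
  errs <- as.numeric(forecasterrors)
  errs <- errs[!is.na(errs)]
  sdev <- sd(errs)

  # Symmetric x-range wide enough for both the data and the normal curve.
  lim <- max(abs(errs), 3 * sdev)

  # Density-scale histogram so it is comparable with dnorm().
  h <- hist(errs, freq = FALSE, breaks = 20, xlim = c(-lim, lim),
            col = "red", main = "Forecast errors", xlab = "Forecast error")

  # Normal density with mean zero and the errors' standard deviation.
  grid <- seq(-lim, lim, length.out = 200)
  lines(grid, dnorm(grid, mean = 0, sd = sdev), col = "blue", lwd = 2)

  invisible(h)
}

# Usage on simulated errors (illustrative only, not the skirt residuals).
set.seed(1)
demo_hist <- plotForecastErrors(rnorm(100, sd = 14))
```

If the histogram bars roughly follow the blue curve and are centered near zero, the normality assumption looks plausible.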

The time plot of the forecast errors above shows that their variance is roughly constant over time (although it looks slightly higher in part of the series). The histogram shows that the forecast errors are approximately normally distributed, with a mean close to 0. Therefore, it is plausible that the forecast errors are normally distributed with mean zero and constant variance.

Since successive forecast errors do not appear to be correlated, and the forecast errors appear to be normally distributed with mean zero and constant variance, ARIMA(1,2,5) seems to provide an adequate model for the skirt diameter data. This concludes our study of time series.
