Time series algorithms and analysis steps

From an analysis point of view, time series fall into three broad categories:

1. Purely random series (white noise). Analysis can stop here, because a white noise series has no structure to model; predicting it is like predicting which side the next coin toss will land on.

2. Stationary non-white-noise series, whose mean and variance are constant. Mature models exist for fitting such series and forecasting their future values, such as AR, MA, and ARMA (the models and their implementation are covered later).

3. Non-stationary series. The usual practice is to transform them into a stationary series and then fit them with the methods for stationary series. If the series becomes stationary after differencing, the ARIMA model should be used for fitting.
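To make the three categories concrete, here is a minimal sketch that simulates one series of each kind using only the standard library. The function names, the AR coefficient 0.6, and the seed are illustrative assumptions, not part of the restaurant example.

```python
import random

def simulate(n=500, phi=0.6, seed=0):
    """Return (white_noise, stationary_ar1, random_walk), all driven by the same shocks."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ar1, walk = [], []
    prev_ar, prev_walk = 0.0, 0.0
    for e in noise:
        prev_ar = phi * prev_ar + e      # |phi| < 1: stationary, but not white noise
        prev_walk = prev_walk + e        # phi = 1: random walk, non-stationary
        ar1.append(prev_ar)
        walk.append(prev_walk)
    return noise, ar1, walk

def first_difference(series):
    """x_t - x_{t-1} for t = 1 .. n-1."""
    return [b - a for a, b in zip(series, series[1:])]

noise, ar1, walk = simulate()
# Differencing the random walk recovers the underlying shocks,
# which is why differencing can turn a non-stationary series into a stationary one.
recovered = first_difference(walk)
```

The random walk is the simplest case of category 3: one round of differencing brings it back to category 1.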

Note: The example uses one month of sales data for a restaurant, with two fields: time and sales.

Q1: How is the stationarity of a series assessed?

Method 1:

Make a subjective judgment based on the characteristics of the time series plot and the autocorrelation plot, as in the following figures:
Time series plot:

Autocorrelation plot:

The plot shows that the absolute value of the autocorrelation coefficient stays large over many lags, so the series can be judged to be autocorrelated.

For a stationary series, the autocorrelation and partial autocorrelation plots either tail off or cut off.

Cutting off means that beyond a certain order the coefficients are 0.
Tailing off means that the coefficients decay but are not all 0.
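As an illustration of the two patterns, the sketch below evaluates the textbook theoretical autocorrelations of an AR(1) process (rho_k = phi^k, which tails off) and of an MA(1) process x_t = e_t + theta * e_{t-1} (rho_1 = theta / (1 + theta^2), zero beyond lag 1, which cuts off). The function names and parameter values are assumptions chosen for the example.

```python
def ar1_acf(phi, max_lag):
    """Theoretical ACF of AR(1): decays geometrically and never reaches exactly 0."""
    return [phi ** k for k in range(max_lag + 1)]

def ma1_acf(theta, max_lag):
    """Theoretical ACF of MA(1): non-zero at lag 1 only, then exactly 0."""
    acf = [1.0]
    for k in range(1, max_lag + 1):
        acf.append(theta / (1.0 + theta ** 2) if k == 1 else 0.0)
    return acf

print(ar1_acf(0.5, 3))   # tails off
print(ma1_acf(1.0, 3))   # cuts off after lag 1
```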

Here the autocorrelation plot has a symmetric triangular shape, with neither cut-off nor tail-off behavior. This is typical of a monotonic series, so the original data form a non-stationary series.

Note:

If the autocorrelation tails off and the partial autocorrelation cuts off, use the AR model.

If the autocorrelation cuts off and the partial autocorrelation tails off, use the MA model.

If both the autocorrelation and the partial autocorrelation tail off, use the ARMA model. ARIMA is an extension of ARMA and is used in a similar way.
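The three rules above can be written as a small lookup. This is only a sketch of the rule of thumb: the function name and the 'tails'/'cuts' labels are hypothetical, and deciding which label applies to a real plot remains a judgment call.

```python
def suggest_model(acf_pattern, pacf_pattern):
    """Map the observed ACF/PACF behavior ('tails' or 'cuts') to a model family."""
    rules = {
        ("tails", "cuts"): "AR",
        ("cuts", "tails"): "MA",
        ("tails", "tails"): "ARMA",
    }
    # Any other combination is not covered by the rules in the text.
    return rules.get((acf_pattern, pacf_pattern), "undetermined")

print(suggest_model("cuts", "tails"))
```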

Calculation of the autocorrelation coefficient:

Var denotes the variance.
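For reference, the usual definition of the lag-k autocorrelation coefficient is rho_k = Cov(x_t, x_{t+k}) / sqrt(Var(x_t) * Var(x_{t+k})); whether the original post used exactly this form is an assumption. A minimal sketch of the standard sample estimate, with a hypothetical function name, could look like this:

```python
def sample_acf(x, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag of the sequence x."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)          # n times the sample variance
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]

# A steadily increasing toy series
print(sample_acf([1, 2, 3, 4, 5], 2))
```

Plotting these values against the lag k gives the autocorrelation plot discussed above.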
Method 2:

Use a unit root test.

If a unit root exists, the series is a non-stationary random series.

Q2: How should a stationary series be analyzed?

At present, the most commonly used model for fitting stationary series is the ARMA (autoregressive moving average) model, which can be subdivided into the AR model, the MA model, and the ARMA model.

1. Autoregressive AR(p) model

The autoregressive model describes the relationship between the current value and historical values.
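In the usual textbook notation, which is assumed here rather than taken from the original post, an AR(p) process is written as follows, where the \phi_i are the autoregressive coefficients and \varepsilon_t is a white noise error term:

```latex
x_t = \phi_0 + \phi_1 x_{t-1} + \phi_2 x_{t-2} + \cdots + \phi_p x_{t-p} + \varepsilon_t
```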
2. Moving average MA(q) model

The moving average model describes the accumulation of the error terms of the autoregressive part.
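In the same assumed textbook notation, an MA(q) process expresses the current value as the mean \mu plus the current and the q most recent white noise errors. The plus signs follow the convention used by statsmodels; some textbooks write the \theta_j terms with minus signs instead.

```latex
x_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \cdots + \theta_q \varepsilon_{t-q}
```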
3. ARMA(p, q) model

The ARMA(p, q) model contains p autoregressive terms and q moving average terms.

When q = 0, it is the AR(p) model.
When p = 0, it is the MA(q) model.
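Written out in standard notation (an assumption, with the moving average terms given plus signs as in statsmodels), the ARMA(p, q) model combines the autoregressive and moving average parts in one equation, and the two special cases follow by dropping one of the sums:

```latex
x_t = \phi_0 + \sum_{i=1}^{p} \phi_i x_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}

% q = 0: x_t = \phi_0 + \sum_{i=1}^{p} \phi_i x_{t-i} + \varepsilon_t                 (AR(p))
% p = 0: x_t = \phi_0 + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}     (MA(q), with \mu = \phi_0)
```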

General analysis steps:

Q3: How should a non-stationary series be analyzed?

As the models above show, a non-stationary series must first be converted into a stationary series before it can be analyzed.

In general, we use the ARIMA (autoregressive integrated moving average) model for this.

In ARIMA(p, d, q), AR stands for "autoregressive" and p is the number of autoregressive terms; MA stands for "moving average" and q is the number of moving average terms; d is the number of differences (the order of differencing) needed to make the series stationary.
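The role of d can be illustrated with a short sketch of d-th order differencing in plain Python. The function name and the toy series are assumptions for the example; with pandas the equivalent of one round is `diff().dropna()`.

```python
def difference(series, d=1):
    """Apply first-order differencing d times: each round replaces x_t with x_t - x_{t-1}."""
    result = list(series)
    for _ in range(d):
        result = [b - a for a, b in zip(result, result[1:])]
    return result

squares = [1, 4, 9, 16, 25]        # quadratic trend, clearly non-stationary
print(difference(squares, d=1))    # still trending
print(difference(squares, d=2))    # constant: the trend is gone
```

Each round of differencing shortens the series by one observation, so d should be kept as small as possible.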

Although the word "difference" does not appear in the English name of ARIMA (it is what "integrated" refers to), differencing is a key step.

Q4: Let's look at an example.

Reading the data

# -*- coding: utf-8 -*-
# ARIMA time series model

import pandas as pd

# Parameter initialization
discfile = './data/arima_data.xls'
forecastnum = 5

# Read the data, using the date column as the index; pandas automatically parses it as datetime.
# The column name must match the header in the spreadsheet.
data = pd.read_excel(discfile, index_col=u'Date')

Autocorrelation check

# Time series plot
import matplotlib.pyplot as plt
plt.rcParams['font.sans-serif'] = ['SimHei']  # display Chinese labels correctly
plt.rcParams['axes.unicode_minus'] = False  # display the minus sign correctly
data.plot()
plt.show()

# Autocorrelation plot
from statsmodels.graphics.tsaplots import plot_acf
plot_acf(data).show()

# Stationarity test
from statsmodels.tsa.stattools import adfuller as ADF
print(u'ADF test result for the original series:', ADF(data[u'sales']))
# Return values, in order: adf, pvalue, usedlag, nobs, critical values, icbest (and regresults if requested)

Autocorrelation plot

The absolute value of the autocorrelation coefficient remains large over many lags, which indicates that the series is autocorrelated.

The p-value of the ADF test is well above 0.05 (p=0.9983), so the series is judged to be non-stationary.

Re-testing after first-order differencing

# First-order difference
d_data = data.diff().dropna()
d_data.columns = [u'sales difference']
d_data.plot()  # time series plot
plt.show()
plot_acf(d_data).show()  # autocorrelation plot
from statsmodels.graphics.tsaplots import plot_pacf
plot_pacf(d_data).show()  # partial autocorrelation plot
print(u'ADF test result for the differenced series:', ADF(d_data[u'sales difference']))  # stationarity test

# White noise test
from statsmodels.stats.diagnostic import acorr_ljungbox
print(u'White noise test result for the differenced series:', acorr_ljungbox(d_data, lags=1))  # returns the statistic and p-value


The figure above shows the differenced sales series.


The autocorrelation plot cuts off after lag 1.

The partial autocorrelation plot tails off from lag 1.

The ADF result (p=0.0226), together with the autocorrelation and partial autocorrelation plots, shows that the first-order differenced series is a stationary non-white-noise series.
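The white noise check in the code above relies on the Ljung-Box test. As a rough sketch of what that test computes at a single lag, the statistic is Q = n(n+2) * r_1^2 / (n-1), compared against a chi-square distribution with 1 degree of freedom. The function name and the toy series below are assumptions for illustration; in practice `acorr_ljungbox` from statsmodels does this work.

```python
import math

def ljung_box_lag1(x):
    """Return (Q, p_value) of the Ljung-Box test using only the lag-1 autocorrelation."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    r1 = sum(dev[t] * dev[t + 1] for t in range(n - 1)) / denom
    q = n * (n + 2) * r1 ** 2 / (n - 1)
    # Survival function of chi-square with 1 degree of freedom: P(Z^2 > q) = erfc(sqrt(q / 2))
    p_value = math.erfc(math.sqrt(q / 2.0))
    return q, p_value

# A strongly alternating series is clearly not white noise
q, p = ljung_box_lag1([1, -1] * 10)
print(q, p)
```

A small p-value (below 0.05) rejects the white noise hypothesis, which is the condition for the series to be worth modeling.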

Determining the order of the ARIMA model
Because the first-order differenced series is a stationary non-white-noise series, d=1 for the ARIMA model.

Order selection methods:
1. Manual judgment: the autocorrelation plot cuts off after lag 1 and the partial autocorrelation plot tails off from lag 1, so manual judgment selects the MA(1) model for the differenced series, that is, ARIMA(0,1,1) for the original series.
2. Relatively optimal model identification: compute the BIC for every combination of p and q in which both are no greater than 3, and take the model orders at which the BIC is smallest.
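For the second method, the standard definition of the criterion is BIC = k * ln(n) - 2 * ln(L), where k is the number of estimated parameters, n the number of observations, and L the maximized likelihood. The sketch below shows that formula and how the smallest entry of a (p, q) grid is picked while skipping fits that failed. The function names and the toy grid values are hypothetical; in practice the fitted model reports its own BIC.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: lower is better, extra parameters are penalized."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

def best_order(bic_grid):
    """bic_grid[p][q] holds a BIC value, or None where the fit failed. Return the best (p, q)."""
    candidates = [
        (value, p, q)
        for p, row in enumerate(bic_grid)
        for q, value in enumerate(row)
        if value is not None
    ]
    _, p, q = min(candidates)
    return p, q

toy_grid = [[432.1, 422.5], [423.6, None]]   # made-up values, not results from the data
print(best_order(toy_grid))
```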

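# Order selection
# Legacy ARIMA class, which the call style used here assumes; it was removed in
# statsmodels 0.13, where statsmodels.tsa.arima.model.ARIMA replaces it.
from statsmodels.tsa.arima_model import ARIMA

pmax = int(len(d_data)/10)  # the order generally does not exceed length/10
qmax = int(len(d_data)/10)  # the order generally does not exceed length/10
bic_matrix = []  # BIC matrix
for p in range(pmax+1):
  tmp = []
  for q in range(qmax+1):
    try:  # some combinations raise errors, so skip them with try
      tmp.append(ARIMA(data, (p,1,q)).fit().bic)
    except:
      tmp.append(None)
  bic_matrix.append(tmp)

bic_matrix = pd.DataFrame(bic_matrix)  # the minimum can be found from this matrix

p,q = bic_matrix.stack().idxmin()  # flatten with stack, then use idxmin to find the position of the minimum
print(u'The p and q values with the smallest BIC are: %s, %s' %(p,q))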

BIC matrix
Take the model orders at which the BIC is smallest.

This gives p=0, q=1.

Fitting the model

model = ARIMA(data, (p,1,q)).fit()  # build the ARIMA(0, 1, 1) model
model.summary2()  # produce a model report
model.forecast(5)  # forecast 5 days ahead; returns the forecasts, standard errors, and confidence intervals

Finally, this yields the model's forecast.

The data and complete code are available on request; leave your email address in the comments.

Copyright notice: reprints are welcome; please credit the source: Potatoes Potato potato: http://blog.csdn.net/qq_33414271 https://blog.csdn.net/qq_33414271/article/details/79588126
