Time series model (ARIMA): Python implementation

Source: Internet
Author: User
Tags: wrapper, statsmodels

The main purpose of time series analysis is to predict the future from historical data. For example, restaurant sales forecasting can be treated as a short-term forecast based on time series data, where the object being predicted is the sales volume of a specific dish.

1. Time series algorithms:

Common time series models:

[Figure: common time series models]

2. Preprocessing of time series

1. A purely random sequence, also called a white noise sequence, has no relationship between its terms: the sequence fluctuates in a completely disordered, random way, so its analysis can be terminated.

2. A stationary non-white-noise sequence has a constant mean and variance. The ARMA model is the most commonly used model for fitting stationary sequences.

3. A non-stationary sequence has an unstable mean and variance, so it is generally converted to a stationary sequence first. The ARIMA model can be used for this kind of analysis.
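
The white noise check in item 1 can be sketched with a hand-written Ljung-Box statistic, Q = n(n+2) Σ ρ_k² / (n-k), compared against a chi-square distribution. This is only an illustration on synthetic data, not the article's dataset; the function name `ljung_box`, the seed, and the coefficient 0.8 are assumptions chosen for the example.

```python
import numpy as np
from scipy.stats import chi2


def ljung_box(x, lags=5):
    """Return (Q statistic, p-value) of the Ljung-Box test over lags 1..lags."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    denom = np.sum(xc ** 2)
    q = 0.0
    for k in range(1, lags + 1):
        rho_k = np.sum(xc[k:] * xc[:-k]) / denom  # sample autocorrelation at lag k
        q += rho_k ** 2 / (n - k)
    q *= n * (n + 2)
    return q, chi2.sf(q, df=lags)  # small p-value: reject "white noise"


rng = np.random.default_rng(0)
white_noise = rng.normal(size=500)

# A correlated series: each value keeps 80% of the previous one (hypothetical coefficient).
correlated = np.empty(500)
correlated[0] = white_noise[0]
for t in range(1, 500):
    correlated[t] = 0.8 * correlated[t - 1] + white_noise[t]

q_wn, p_wn = ljung_box(white_noise)
q_ar, p_ar = ljung_box(correlated)
print("white noise: Q=%.2f, p=%.4f" % (q_wn, p_wn))
print("correlated:  Q=%.2f, p=%.4g" % (q_ar, p_ar))
```

A large p-value means the series cannot be distinguished from white noise, so there is nothing left to model; a very small p-value means the series carries usable correlation.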

Tests for stationarity:

1. Time series plot test: because the mean and variance of a stationary time series are constant, the plot of a stationary sequence shows values that always fluctuate randomly around a constant, within a bounded range. A sequence with a clear trend or periodicity is usually not stationary.

2. Autocorrelation plot test: a stationary sequence has only short-term correlation. That is, usually only the most recent values have an obvious influence on the current value, and the further back a past value lies, the smaller its influence. The autocorrelation coefficients of a non-stationary sequence decay more slowly.

3. Unit root test: this tests whether the sequence has a unit root. If a unit root exists, the time series is non-stationary. The unit root test is currently the most commonly used method.

The null hypothesis is that the sequence is non-stationary; the alternative hypothesis is that the sequence is stationary or trend-stationary.

Reference for the above: Baidu Library

3. Time series analysis:

Stationarity:

Stationarity requires that the curve fitted to the sample time series continues, by "inertia", along its existing form for some period into the future.

Stationarity requires that the mean and variance of the sequence do not change significantly.

Weak stationarity: the expectation and the correlation coefficients (the dependence structure) do not change over time, so the value x_t at a future time t depends on its past information.

Differencing: take the difference between the values of the time series at time t and time t-1 (differencing is used to achieve stationarity). A first-order difference is usually enough.
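
As a small worked example of differencing, the sketch below uses a made-up series with an accelerating upward trend. The values are invented for illustration only.

```python
import pandas as pd

# Hypothetical values with an upward trend (not from the article's dataset).
series = pd.Series([10, 12, 15, 19, 24, 30], name="y")

first_diff = series.diff().dropna()       # y_t - y_{t-1}
second_diff = first_diff.diff().dropna()  # difference of the differences

print(first_diff.tolist())   # [2.0, 3.0, 4.0, 5.0, 6.0]
print(second_diff.tolist())  # [1.0, 1.0, 1.0, 1.0]
```

The first difference still trends upward, while the second difference is constant. The number of differences needed before the series looks stationary is the d in ARIMA(p, d, q).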

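AR (autoregressive model):

Describes the relationship between the current value and historical values, using the variable's own historical data to predict itself. The autoregressive model requires the series to be stationary.

Formula definition (autoregressive process of order p):

$$y_t = \mu + \sum_{i=1}^{p} \gamma_i y_{t-i} + \epsilon_t$$

where $y_t$ is the current value, $\mu$ is a constant term, $p$ is the order, $\gamma_i$ are the autoregressive coefficients, and $\epsilon_t$ is the error term.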

Limitations of the autoregressive model:

1. An autoregressive model makes predictions using the series' own data.

2. The series must be stationary.

3. The series must be autocorrelated; if the autocorrelation coefficient is less than 0.5, the model is not appropriate.

4. Autoregression is only suitable for predicting phenomena that are related to their own earlier values.
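
The autoregressive model and the autocorrelation requirement in item 3 can be illustrated with a simulated first-order process. This is a minimal sketch using NumPy only; the parameter values (constant 1.0, coefficient 0.7), the seed, and the use of ordinary least squares for estimation are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n, mu, gamma = 2000, 1.0, 0.7  # hypothetical constant term and AR(1) coefficient

# Simulate y_t = mu + gamma * y_{t-1} + eps_t
eps = rng.normal(size=n)
y = np.empty(n)
y[0] = mu / (1 - gamma)  # start at the long-run mean of the process
for t in range(1, n):
    y[t] = mu + gamma * y[t - 1] + eps[t]

# Lag-1 autocorrelation: correlation between the series and itself shifted by one step.
lag1_corr = np.corrcoef(y[:-1], y[1:])[0, 1]

# Estimate mu and gamma by regressing y_t on [1, y_{t-1}].
X = np.column_stack([np.ones(n - 1), y[:-1]])
coef, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
mu_hat, gamma_hat = coef

print("lag-1 autocorrelation:", round(float(lag1_corr), 3))
print("estimated gamma:", round(float(gamma_hat), 3))
```

For an AR(1) process the lag-1 autocorrelation is close to the coefficient itself, so a weakly autocorrelated series gives the model very little to predict from.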

MA (moving average model):

The moving average model focuses on the accumulation of the error terms of the autoregressive model.

The moving average method can effectively eliminate random fluctuations in the prediction.

Formula definition (moving average process of order q):

$$y_t = \mu + \epsilon_t + \sum_{i=1}^{q} \theta_i \epsilon_{t-i}$$
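
A first-order moving average process can be simulated to see its characteristic signature: the autocorrelation is nonzero at lag 1 and close to zero afterwards. The sketch below uses NumPy only; the coefficient 0.6, the seed, and the helper `sample_acf` are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
n, mu, theta = 5000, 0.0, 0.6  # hypothetical MA(1) parameters

# Simulate y_t = mu + eps_t + theta * eps_{t-1}
eps = rng.normal(size=n + 1)
y = mu + eps[1:] + theta * eps[:-1]


def sample_acf(x, k):
    """Sample autocorrelation of x at lag k."""
    xc = x - x.mean()
    return np.sum(xc[k:] * xc[:-k]) / np.sum(xc ** 2)


rho1 = sample_acf(y, 1)
rho2 = sample_acf(y, 2)
rho1_theory = theta / (1 + theta ** 2)  # theoretical lag-1 autocorrelation of MA(1)

print("lag 1: sample %.3f, theory %.3f" % (rho1, rho1_theory))
print("lag 2: sample %.3f, theory 0" % rho2)
```

This cut-off after lag q is what the ACF plot is used for when choosing q.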

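ARMA (autoregressive moving average model):

The combination of autoregression and moving average:

$$y_t = \mu + \sum_{i=1}^{p} \gamma_i y_{t-i} + \epsilon_t + \sum_{i=1}^{q} \theta_i \epsilon_{t-i}$$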

ARIMA(p, d, q): the autoregressive integrated moving average model (ARIMA for short)

AR stands for autoregressive and p is the number of autoregressive terms; MA stands for moving average and q is the number of moving average terms; d is the number of times the time series must be differenced to become stationary.

Principle: the non-stationary time series is converted into a stationary one, and then the dependent variable is regressed only on its own lagged values (order p) and on the present and lagged values of the random error term (order q).

ARIMA modeling process:

1. Make the sequence stationary (differencing determines d)

2. Determine the orders p and q (ACF and PACF)

3. Build the ARIMA(p, d, q) model

Example: using an ARIMA model to forecast sales data for a restaurant

The ARIMA model is used here to model a non-stationary time series. The differencing operation has a strong ability to extract deterministic information: many non-stationary sequences show the properties of a stationary sequence after differencing, and such a sequence is called a difference-stationary sequence. A difference-stationary sequence can be fitted with an ARMA model; the ARIMA model is in essence the combination of the differencing operation and the ARMA model.

    # Load the data
    import pandas as pd
    import matplotlib.pyplot as plt

    # Path and column names are as given in the source; adjust them to match your file.
    filename = r'D:\datasets\arima_data.xls'
    data = pd.read_excel(filename, index_col='date')

    # Plot the time series
    plt.rcParams['font.sans-serif'] = ['SimHei']  # SimHei font so that Chinese labels display correctly
    plt.rcParams['axes.unicode_minus'] = False    # display the minus sign correctly
    # data.plot()
    # plt.show()

    # Autocorrelation plot
    from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
    # plot_acf(data)
    # plt.show()

    # Stationarity test
    from statsmodels.tsa.stattools import adfuller
    print('ADF test result for the original series:', adfuller(data['Sales']))
    # Result reported for the original series:
    # (1.8137710150945268, 0.9983759421514264, 10, 26,
    #  {'1%': -3.7112123008648155, '10%': -2.6300945562130176, '5%': -2.981246804733728},
    #  299.46989866024177)
    # The returned values are, in order: adf, pvalue, usedlag, nobs, critical values, icbest
    # (the source also lists regresults and resstore, which do not appear in the output above).
    # The ADF statistic is greater than the critical values at all three significance levels,
    # and the p-value is far greater than 0.05, so the series is judged to be non-stationary.

    # Difference the data, then draw the autocorrelation and partial autocorrelation plots
    D_data = data.diff().dropna()
    D_data.columns = ['Sales difference']
    D_data.plot()      # time series plot after differencing
    # plt.show()
    plot_acf(D_data)   # autocorrelation plot
    # plt.show()
    plot_pacf(D_data)  # partial autocorrelation plot
    # plt.show()

    # Stationarity test on the differenced series
    print('ADF test result for the differenced series:', adfuller(D_data['Sales difference']))
    # Result reported for the differenced series:
    # (-3.1560562366723537, 0.022673435440048798, 0, 35,
    #  {'1%': -3.6327426647230316, '10%': -2.6130173469387756, '5%': -2.9485102040816327},
    #  287.5909090780334)
    # After first-order differencing the time series plot fluctuates fairly steadily around
    # the mean, the autocorrelation shows strong short-term correlation, and the unit root
    # test p-value is below 0.05, so the first-order differenced series is stationary.

    # White noise test on the first-order differenced series
    from statsmodels.stats.diagnostic import acorr_ljungbox
    print('White noise test result for the differenced series:',
          acorr_ljungbox(D_data['Sales difference'], lags=1))
    # Returns the test statistic and the p-value. Result reported in the source:
    # (array([11.30402222]), array([0.00077339]))
    # The p-value (second item) is far less than 0.05, so the series is not white noise.
    # Recent statsmodels releases return a DataFrame (columns lb_stat, lb_pvalue) instead.

    # Determine the order of the model
    import numpy as np
    # The source imports ARIMA from statsmodels.tsa.arima_model, which was removed in
    # statsmodels 0.13; the current class is used here, so results may differ slightly.
    from statsmodels.tsa.arima.model import ARIMA

    pmax = int(len(D_data) / 10)  # the order generally does not exceed length/10
    qmax = int(len(D_data) / 10)
    bic_matrix = []
    for p in range(pmax + 1):
        temp = []
        for q in range(qmax + 1):
            try:
                temp.append(ARIMA(data, order=(p, 1, q)).fit().bic)
            except Exception:
                temp.append(np.nan)  # the fit failed for this (p, q)
        bic_matrix.append(temp)

    bic_matrix = pd.DataFrame(bic_matrix)  # convert to a DataFrame
    p, q = bic_matrix.stack().idxmin()     # stack() flattens, idxmin() locates the minimum
    print('p and q with the smallest BIC: %s, %s' % (p, q))
    # The source reports p=0 and q=1, so an ARIMA(0, 1, 1) model is built.

    model = ARIMA(data, order=(p, 1, q)).fit()
    print(model.summary())  # model report
    # Forecast the next 5 days: point forecasts, standard errors, and confidence intervals
    print(model.get_forecast(5).summary_frame())
The further ahead the model forecasts, the larger the prediction error becomes; this is a typical characteristic of time series forecasting.
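
The growth of the error with the forecast horizon can be illustrated with a small simulation. It rests on a simplifying assumption: the series is a random walk, i.e. ARIMA(0, 1, 0), for which the best forecast of every future step is the last observed value. The number of simulated paths, the horizon, and the seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)
n_paths, horizon, sigma = 20000, 10, 1.0  # hypothetical simulation settings

# Future shocks for each simulated path.
shocks = rng.normal(scale=sigma, size=(n_paths, horizon))

# For a random walk forecast by its last value, the error at step h
# is the sum of the first h future shocks.
errors = np.cumsum(shocks, axis=1)

empirical_std = errors.std(axis=0)
theoretical_std = sigma * np.sqrt(np.arange(1, horizon + 1))  # sigma * sqrt(h)

for h in range(horizon):
    print("h=%2d  empirical %.3f  theoretical %.3f"
          % (h + 1, empirical_std[h], theoretical_std[h]))
```

The standard deviation of the forecast error grows with the square root of the horizon in this case, which is why the confidence intervals widen for later forecast steps.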
