Time series prediction (data: passengers.csv, algorithm: ARIMA)

Source: Internet
Author: User
Tags statsmodels

To view data structures:

import pandas as pd
data = pd.read_csv('/users/liailan/tmp/airpassengers.csv')
print(data.head())

Output:

   Unnamed: 0     time  value
0           1  1949-01    112
1           2  1949-02    118
2           3  1949-03    132
3           4  1949-04    129
4           5  1949-05    121

Load data:

data = pd.read_csv('/users/liailan/tmp/airpassengers.csv')
# Parse the 'YYYY-MM' strings into timestamps (first day of each month)
data['time'] = pd.to_datetime(data['time'], format='%Y-%m')
data = data.set_index('time')

Determine whether the series is stationary:

Visual inspection:

import matplotlib.pyplot as plt
%matplotlib inline
plt.plot(data.value)

As you can see, the series has an obvious trend and seasonality, so it first has to be converted into a stationary series. Differencing is the commonly used method; here we use another one and decompose the data into trend, seasonal, and residual series.

import numpy as np
from statsmodels.tsa.seasonal import seasonal_decompose
ts_log = np.log(data['value'])
# 12 observations per seasonal cycle (the argument is named freq in older statsmodels releases)
decomposition = seasonal_decompose(ts_log, period=12)
trend = decomposition.trend        # trend
seasonal = decomposition.seasonal  # seasonality
residual = decomposition.resid     # residual series
residual = residual.dropna()
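For comparison, the differencing approach mentioned above can be sketched roughly as follows. This is only an illustration on a synthetic series (the names `series`, `first_diff`, and `seasonal_diff` are made up here, and the data is not the passenger data): a first difference removes a linear trend, and a difference at lag 12 removes a 12-month seasonal pattern.

```python
import numpy as np
import pandas as pd

# Hypothetical log-scale series: linear trend plus a 12-month seasonal pattern.
index = pd.date_range('1949-01-01', periods=48, freq='MS')
t = np.arange(48)
series = pd.Series(4.8 + 0.01 * t + 0.2 * np.sin(2 * np.pi * t / 12), index=index)

# First difference: the linear trend becomes a constant, the seasonality remains.
first_diff = series.diff().dropna()

# Difference at lag 12: the seasonal pattern cancels out. What remains is the
# increase of the trend over 12 months, which is constant for this series.
seasonal_diff = series.diff(12).dropna()
print(seasonal_diff.round(6).unique())
```

Each differencing step shortens the series (by 1 and by 12 values here), which is one reason the decomposition route is used in this article instead.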

Test the stationarity of the residual series:

import statsmodels.tsa.stattools as ts
# Augmented Dickey-Fuller test on the residual series
dftest = ts.adfuller(residual)
dfoutput = pd.Series(dftest[0:4], index=['Test Statistic', 'p-value', '#Lags Used', 'Number of Observations Used'])
for key, value in dftest[4].items():
    dfoutput['Critical Value (%s)' % key] = value
print(dfoutput)

Output:

Test Statistic                -6.332387e+00
p-value                        2.885059e-08
#Lags Used                     9.000000e+00
Number of Observations Used    1.220000e+02
Critical Value (1%)           -3.485122e+00
Critical Value (5%)           -2.885538e+00
Critical Value (10%)          -2.579569e+00
dtype: float64
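The ADF output can be read with a simple decision rule, sketched below with the numbers copied from the output above. The null hypothesis of the test is that the series has a unit root (is not stationary); it is rejected when the test statistic lies below the critical value, or equivalently when the p-value is small. The variable names are only for illustration.

```python
# Values copied from the ADF output above.
test_statistic = -6.332387
p_value = 2.885059e-08
critical_values = {'1%': -3.485122, '5%': -2.885538, '10%': -2.579569}

# Reject the unit-root null hypothesis at a given level when the
# test statistic is more negative than that level's critical value.
rejected = {level: test_statistic < value for level, value in critical_values.items()}
print(rejected)

# The same conclusion expressed through the p-value at the 5% level.
print(p_value < 0.05)
```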

The p-value is far below 0.05, so the residual series can be considered stationary. Next, use ARIMA to predict. ARIMA has three core parameters (p, d, q); for their meaning and how to choose them, see the relevant articles on ARIMA.

# Legacy statsmodels API; recent releases provide statsmodels.tsa.arima.model.ARIMA instead
from statsmodels.tsa.arima_model import ARIMA
model_arima = ARIMA(residual, (2, 0, 2)).fit(disp=-1, method='css')
predictions_arima = model_arima.predict(start='1950-01', end='1962-04')
plt.plot(residual)
plt.plot(predictions_arima)

Output:

The fit does not look very good. Next, convert the predictions back to the original data space:

# Add trend and seasonality back (log space), then undo the log transform
predictions_arima = predictions_arima.add(trend, fill_value=0).add(seasonal, fill_value=0)
predictions_arima = np.exp(predictions_arima)
plt.plot(data['value'], color='blue')
plt.plot(predictions_arima, color='red')

Look at the plot:

The middle section matches reasonably well, but the predictions at the beginning and the end are abnormal. This is because trend and seasonal values are missing there.
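A small sketch of why missing components distort the result, assuming made-up numbers (`pred`, `trend_demo`, and `combined` are illustrative names, not the article's variables). With `fill_value=0`, a missing trend value is treated as 0 in log space, so after `np.exp` the prediction drops to about 1 instead of several hundred.

```python
import numpy as np
import pandas as pd

idx = pd.date_range('1960-05-01', periods=4, freq='MS')
# Hypothetical residual predictions (log space).
pred = pd.Series([0.01, -0.02, 0.00, 0.01], index=idx)
# Hypothetical trend (log space) that is missing for the last two months.
trend_demo = pd.Series([6.14, 6.15, np.nan, np.nan], index=idx)

# Missing trend values are replaced by 0 before the addition.
combined = pred.add(trend_demo, fill_value=0)

# Back in the original space the last two values are close to exp(0) = 1.
print(np.exp(combined))
```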

Look at trend:

print(trend)

Results:

time
1949-01-01         NaN
1949-02-01         NaN
1949-03-01         NaN
1949-04-01         NaN
1949-05-01         NaN
1949-06-01         NaN
1949-07-01    4.837280
1949-08-01    4.841114
1949-09-01    4.846596
1949-10-01    4.851238
1949-11-01    4.854488
1949-12-01    4.859954
1950-01-01    4.869840
1950-02-01    4.881389
1950-03-01    4.893411
1950-04-01    4.904293
1950-05-01    4.912752
1950-06-01    4.923701
1950-07-01    4.940483
1950-08-01    4.957406
1950-09-01    4.974380
1950-10-01    4.991942
1950-11-01    5.013095
1950-12-01    5.033804
1951-01-01    5.047776
1951-02-01    5.060902
1951-03-01    5.073812
1951-04-01    5.088378
1951-05-01    5.106906
1951-06-01    5.124312
...
1958-07-01    5.932964
1958-08-01    5.938377
1958-09-01    5.946188
1958-10-01    5.956352
1958-11-01    5.967813
1958-12-01    5.977291
1959-01-01    5.985269
1959-02-01    5.994078
1959-03-01    6.003991
1959-04-01    6.014899
1959-05-01    6.026589
1959-06-01    6.040709
1959-07-01    6.054492
1959-08-01    6.066195
1959-09-01    6.073088
1959-10-01    6.080733
1959-11-01    6.091930
1959-12-01    6.102013
1960-01-01    6.112511
1960-02-01    6.121153
1960-03-01    6.128381
1960-04-01    6.137437
1960-05-01    6.1457
1960-06-01    6.151526
1960-07-01         NaN
1960-08-01         NaN
1960-09-01         NaN
1960-10-01         NaN
1960-11-01         NaN
1960-12-01         NaN
Name: value, Length: 144, dtype: float64

Seasonal data:

print(seasonal)

Results:

time
1949-01-01   -0.085815
1949-02-01   -0.114413
1949-03-01    0.018113
1949-04-01   -0.013046
1949-05-01   -0.008966
1949-06-01    0.115393
1949-07-01    0.210816
1949-08-01    0.204512
1949-09-01    0.064836
1949-10-01   -0.075271
1949-11-01   -0.215846
1949-12-01   -0.100315
1950-01-01   -0.085815
1950-02-01   -0.114413
1950-03-01    0.018113
1950-04-01   -0.013046
1950-05-01   -0.008966
1950-06-01    0.115393
1950-07-01    0.210816
1950-08-01    0.204512
1950-09-01    0.064836
1950-10-01   -0.075271
1950-11-01   -0.215846
1950-12-01   -0.100315
1951-01-01   -0.085815
1951-02-01   -0.114413
1951-03-01    0.018113
1951-04-01   -0.013046
1951-05-01   -0.008966
1951-06-01    0.115393
...
1958-07-01    0.210816
1958-08-01    0.204512
1958-09-01    0.064836
1958-10-01   -0.075271
1958-11-01   -0.215846
1958-12-01   -0.100315
1959-01-01   -0.085815
1959-02-01   -0.114413
1959-03-01    0.018113
1959-04-01   -0.013046
1959-05-01   -0.008966
1959-06-01    0.115393
1959-07-01    0.210816
1959-08-01    0.204512
1959-09-01    0.064836
1959-10-01   -0.075271
1959-11-01   -0.215846
1959-12-01   -0.100315
1960-01-01   -0.085815
1960-02-01   -0.114413
1960-03-01    0.018113
1960-04-01   -0.013046
1960-05-01   -0.008966
1960-06-01    0.115393
1960-07-01    0.210816
1960-08-01    0.204512
1960-09-01    0.064836
1960-10-01   -0.075271
1960-11-01   -0.215846
1960-12-01   -0.100315
Name: value, Length: 144, dtype: float64

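As you can see, trend has no data after 1960-06 and seasonal ends at 1960-12. Because the missing values are filled with 0 (fill_value=0) in log space, the reconstructed values after 1960-06 drop to about exp(0) = 1, which looks like 0 on the scale of the plot.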
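The gaps at both ends of trend come from the centered moving average used to estimate it. The sketch below imitates this with pandas on a synthetic linear series; as far as I know, seasonal_decompose uses an equivalent 2x12 filter for an even period of 12, but treat that as an assumption. The names `series` and `trend_demo` are made up for the illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly series with a purely linear shape: y = 2*t + 1.
index = pd.date_range('1949-01-01', periods=36, freq='MS')
series = pd.Series(2.0 * np.arange(36) + 1.0, index=index)

# 2x12 centered moving average: a 12-month mean, averaged over two adjacent
# windows, then shifted back so each value sits at the centre of its window.
trend_demo = series.rolling(12).mean().rolling(2).mean().shift(-6)

# Count the positions without a trend value and show the first and last valid dates.
print(trend_demo.isna().sum())
print(trend_demo.dropna().index[[0, -1]])
```

The window needs six months on each side, so the first six and the last six months have no trend value, which matches the NaN entries for 1949-01 to 1949-06 and 1960-07 to 1960-12 above.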

Let's look at the shape of the trend:

plt.plot(trend)

The trend is approximately linear, so we can use a linear fit to predict the values after 1960-06:

from sklearn.linear_model import LinearRegression
trend = trend.dropna()
# Fit on the positions 0..n-1 of the known trend values
x = pd.Series(range(trend.size), index=trend.index)
x = x.to_frame()
linreg = LinearRegression()
linreg = linreg.fit(x, trend)
# Predict for 154 months starting at 1949-07, which reaches 1962-04
x = pd.Series(range(0, 154), index=pd.period_range('1949-07', periods=154, freq='M'))
x = x.to_frame()
res_predict = linreg.predict(x)
trend2 = pd.Series(res_predict, index=x.index).to_timestamp()
plt.plot(trend, color='blue')
plt.plot(trend2, color='red')

Fitting Result:


Next, extend the values of seasonal to 1962-04.

Since the seasonal component is periodic, the missing months can be filled directly with shift.

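index1 = pd.period_range('1949-01', periods=160, freq='M')
index1 = index1.to_timestamp()
seasonal = seasonal.reindex(index1)
# The argument of shift is missing in the source. Assumption: fill the new
# months with the values from 12 and 24 months earlier (whole seasonal cycles).
seasonal = seasonal.fillna(seasonal.shift(12)).fillna(seasonal.shift(24))
plt.plot(seasonal)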
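Another way to extend a periodic component, shown here only as a sketch and not as the method used in this article, is to look up the value for each calendar month. The 12 values are taken from the seasonal output above; `pattern`, `seasonal_demo`, `monthly`, and `seasonal_extended` are illustrative names.

```python
import pandas as pd

# The 12 monthly seasonal values (January to December) from the output above.
pattern = [-0.085815, -0.114413, 0.018113, -0.013046, -0.008966, 0.115393,
           0.210816, 0.204512, 0.064836, -0.075271, -0.215846, -0.100315]

# A two-year seasonal component built from that pattern.
seasonal_demo = pd.Series(pattern * 2,
                          index=pd.date_range('1959-01-01', periods=24, freq='MS'))

# One value per calendar month (1..12); identical within a month for a periodic series.
monthly = seasonal_demo.groupby(seasonal_demo.index.month).mean()

# Build the longer index and look up the value of each month.
extended_index = pd.date_range('1959-01-01', '1962-04-01', freq='MS')
seasonal_extended = pd.Series(monthly.loc[extended_index.month].to_numpy(),
                              index=extended_index)
print(seasonal_extended.tail(4))
```

This does not depend on how far the series has to be extended, whereas a shift-based fill needs one shift per additional year.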


Use the new trend2 and the extended seasonal to convert back to the original data space:

model_arima = ARIMA(residual, (2, 0, 2)).fit(disp=-1, method='css')
predictions_arima = model_arima.predict(start='1950-01', end='1962-04')
# Add the extrapolated trend and extended seasonality (log space), then undo the log
predictions_arima = predictions_arima.add(trend2, fill_value=0).add(seasonal, fill_value=0)
predictions_arima = np.exp(predictions_arima)
plt.plot(data['value'], color='blue')
plt.plot(predictions_arima, color='red')

Looking at the results, the prediction is acceptable.

That is the whole process of predicting with ARIMA, recorded here for reference.

