Forecast combinations in R: better point estimates and better prediction intervals


A hybrid forecast (the average of several single-model forecasts) typically gives better point estimates than any of the contributing models. Here I show how to build a prediction interval for such a combined forecast whose coverage is more accurate than the most commonly used prediction intervals (that is, roughly 80% of the actual observations really do fall inside the 80% interval), testing the approach on the 3,003 series of the M3 forecasting competition.

Prediction intervals

The forecaster's problem here is what prediction interval to use for a forecast combination. A prediction interval is related to, but not the same as, a confidence interval. A prediction interval is an estimate of the range of a value that is not yet known but will be observed at some point in the future, whereas a confidence interval is an estimate of the range of an inherently unobservable parameter. A prediction interval has to account for uncertainty in the model itself, uncertainty in the estimates of the model's parameters (i.e. their confidence intervals), and the individual randomness attached to the particular point being predicted.
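To make that concrete, here is a minimal sketch (using the forecast package and R's built-in USAccDeaths series, which also features later in this post) of asking for 80% and 95% prediction intervals; the lower and upper components each have one column per requested level:

library(forecast)

# fit a simple model and ask for 80% and 95% prediction intervals
fit <- ets(USAccDeaths)
fc  <- forecast(fit, h = 12, level = c(80, 95))

head(fc$lower)   # one column per interval level
head(fc$upper)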

For example, one study found that prediction intervals calculated to contain the true result 95% of the time actually did so only 71% to 87% of the time (thanks again to Hyndman's blog for making these results easy to find). There are many reasons for this, but the main one is that uncertainty in the model-building and model-selection process is not fully taken into account. Most methods for constructing a prediction interval estimate the range of future values on the assumption that the model is correct. Since our models are only simplifications of reality, the intervals fail more often than they would if the model really were exactly right.

At a glance, I could not find any discussion of how to generate a prediction interval from a forecast combination. Hyndman avoids the issue in the article linked above and only makes a statement about point estimates: "if you only want point forecasts, then [the average of ets and auto.arima] is the best method available in the forecast package."

Forecasts make me nervous, and in my day job problems arise when reality turns out to be materially different from the point forecast. So I care at least as much about the forecast range as about the point estimate, and I want that range to be accurate or even a little conservative (for example, I would be happier with 83% of observations falling inside my 80% prediction interval than with 77%).

Introducing hybridf()

I like being able to combine auto.arima() and ets() quickly and efficiently into a mixed forecast that is as good as, or better than, either one on its own for a single univariate series. In fact it is the sort of thing one might easily want to do thousands of times in a day, so to make it convenient I wrote a hybridf() function in R that does the combination and returns an object of class forecast. That means I can fit the combined forecast with a single line of code, and the other tools Hyndman has developed for that class, such as the standard plot method, work on the resulting object. Here it is applied to the monthly accidental deaths in the United States from 1973 to 1978, a series from Brockwell and Davis (1991), Time Series: Theory and Methods, and one of R's built-in datasets.

# development version needed sorry

The dark grey area is the 80% prediction interval and the light grey area the 95% prediction interval. The top panel shows the hybrid forecast: its dark blue line is simply the average of the point forecasts of the other two methods, and its prediction interval takes the conservative view of showing the widest range covered by the combination of the two source intervals. The hybrid prediction interval is therefore at least as wide as the interval of either contributing model.
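A minimal sketch of that combination rule follows. This is not the actual hybridf() (which is not reproduced in full here), just the same idea under the assumption that both component forecasts use the default 80% and 95% levels: average the point forecasts and take the element-wise widest interval.

library(forecast)

# a stand-in illustrating the combination rule described above, not the real hybridf()
hybrid_sketch <- function(y, h = 12){
   fc_ets <- forecast(ets(y), h = h, level = c(80, 95))
   fc_aa  <- forecast(auto.arima(y), h = h, level = c(80, 95))

   out <- fc_ets                                  # reuse the forecast-class structure
   out$method <- "Hybrid of ets and auto.arima (sketch)"
   out$mean   <- (fc_ets$mean + fc_aa$mean) / 2   # average of the two point forecasts
   out$lower  <- pmin(fc_ets$lower, fc_aa$lower)  # widest range: the lowest lower bound...
   out$upper  <- pmax(fc_ets$upper, fc_aa$upper)  # ...and the highest upper bound
   out$fc_ets <- fc_ets                           # keep the component forecasts too
   out$fc_aa  <- fc_aa
   out
}

fc <- hybrid_sketch(USAccDeaths, h = 24)
par(mfrow = c(3, 1), bty = "l")
plot(fc)
plot(fc$fc_ets)
plot(fc$fc_aa)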

Testing on the M3 competition series

In these days of plentiful computing power and data, we do not have to take the claimed success rates of different prediction intervals on faith; we can test the methods against actual data. I used the 3,003 series of the M3 forecasting competition to compare the 80% and 95% prediction intervals produced by ets(), auto.arima() and my hybridf(). After fitting each model to the given historical data and forecasting over the required horizon, I counted how many of the actual results fell within the prediction intervals. The results were as follows:

variable           Success
ets_p80              0.75
ets_p95              0.90
auto.arima_p80       0.74
auto.arima_p95       0.88
hybrid_p80           0.83
hybrid_p95           0.94

My hybrid approach has success rates close to the advertised levels, while the ets() and auto.arima() prediction intervals both fall short. For example, the hybrid 80% prediction interval contained 83% of the actual results and the hybrid 95% interval contained 94%; for auto.arima() the corresponding rates were 74% and 88%.

Here is how I tested the methods on the M3 data. I wrote a small helper function, pi_accuracy(), which takes advantage of the fact that an object of class forecast contains a matrix named lower and another named upper, each with one column per prediction-interval level. Since it is just a throwaway function for this post, I kept it simple, so it only works with forecast objects created with the default settings, which produce 80% and 95% intervals:

#------------------setup------------------------
library(showtext)
library(ggplot2)
library(scales)
library(forecast)
library(Mcomp)   # M3 competition data
library(tidyr)
library(dplyr)

font_add_google("Poppins", "myfont")   # the exact font is a guess; any Google font works
showtext_auto()
theme_set(theme_light(base_family = "myfont"))

pi_accuracy <- function(fc, yobs){
   # checks the success of prediction intervals of an object of class forecast
   # against the values actually observed over the forecast period
   if(length(yobs) != length(fc$mean)){
      stop("yobs needs to be the same length as the forecast period")
   }
   n <- length(yobs)
   yobsm <- cbind(yobs, yobs)
   In <- (yobsm > fc$lower & yobsm < fc$upper)   # TRUE when inside the 80% / 95% interval
   colnames(In) <- c("In80", "In95")
   return(list(In = In, Success = colSums(In) / n))
}
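For example (the M3 object from the Mcomp package is a list in which each series carries its history in $x and the true out-of-sample values in $xx), checking a single series with a plain ets() forecast looks like this:

# one series: fit to the history, forecast the test period, check interval coverage
series <- M3[[1]]
fc <- forecast(ets(series$x), h = length(series$xx))   # default levels are 80% and 95%
pi_accuracy(fc, series$xx)$Success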

Actually fitting all the forecasts is relatively straightforward; it took about an hour on my laptop. Because hybridf() returns the underlying ets() and auto.arima() objects along with the combination, they do not need to be refitted, which gives a modest efficiency gain.

#============forecasting with default values===============
num_series <- length(M3)   # ie 3003
results <- matrix(0, nrow = num_series, ncol = 7)

for(i in 1:num_series){
   cat(i, " ")        # let me know how it's going as it loops through...
   series <- M3[[i]]
   x <- series$x      # the given historical data
   xx <- series$xx    # the actual future values

   fc3 <- hybridf(x, h = length(xx))   # the h argument is assumed here
   fc1 <- fc3$fc_ets
   fc2 <- fc3$fc_aa                    # the auto.arima component; this name is assumed

   results[i, 1:2] <- pi_accuracy(fc1, xx)$Success
   results[i, 3:4] <- pi_accuracy(fc2, xx)$Success
   results[i, 5:6] <- pi_accuracy(fc3, xx)$Success
   results[i, 7]   <- length(xx)
}

results <- as.data.frame(results)
names(results) <- c("ets_p80", "ets_p95", "auto.arima_p80", "auto.arima_p95",
                    "hybrid_p80", "hybrid_p95", "h")

# overall success rates, weighting each series by the length of its test period
results %>%
   gather(variable, value, -h) %>%
   mutate(weighted_value = value * h) %>%
   group_by(variable) %>%
   summarise(Success = round(sum(weighted_value) / sum(h), 2))

# success rate of each individual forecast, plotted against forecast horizon
results %>%
   gather(variable, value, -h) %>%
   ggplot(aes(x = h, y = value)) +
   facet_wrap(~variable) +
   geom_point(alpha = 0.2) +
   geom_smooth(se = FALSE, method = "lm") +
   theme(panel.grid.minor = element_blank())

An interesting pattern appears when we look at the success rates of the individual forecasts, shown in the plot produced above. A small number of unlucky series have 0% of the actual data inside the prediction intervals: things went wrong and stayed wrong. In general, the longer the forecast period, the better the coverage of the prediction intervals. The intervals get wider the further ahead they forecast, and the randomness explicitly allowed for in the interval starts to dominate the sunk cost of the initial error in specifying the model. For longer forecast periods the standard prediction intervals tend to perform as advertised; for shorter periods they are too optimistic.
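The same pattern can be checked numerically as well as visually. Here is a short summary of average coverage by length of forecast period, assuming the results data frame built in the loop above:

# average success rate for each forecast length h, across all series of that length
results %>%
   gather(variable, value, -h) %>%
   group_by(variable, h) %>%
   summarise(coverage = round(mean(value), 2)) %>%
   arrange(variable, h)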

Bootstrapping

Both ets() and auto.arima() have options to estimate prediction intervals by simulating and bootstrapping the residuals rather than using the analytic formulae, and hybridf() inherits those options. I checked the coverage of those prediction intervals too. The results are very similar to the non-bootstrapped ones; if anything, the simulation- and bootstrap-based intervals are slightly less accurate, but the difference is immaterial.

variable           Success
ets_p80              0.72
ets_p95              0.88
auto.arima_p80       0.70
auto.arima_p95       0.86
hybrid_p80           0.80
hybrid_p95           0.92
#=====with bootstrapping instead of formulae for the prediction intervals=====
num_series <- length(M3)
resultsb <- matrix(0, nrow = num_series, ncol = 7)

for(i in 1:num_series){
   cat(i, " ")
   series <- M3[[i]]
   x <- series$x
   xx <- series$xx

   # simulation/bootstrap options as in forecast::forecast.ets; passing them
   # through hybridf() like this is assumed
   fc3 <- hybridf(x, h = length(xx), simulate = TRUE, bootstrap = TRUE)
   fc1 <- fc3$fc_ets
   fc2 <- fc3$fc_aa

   resultsb[i, 1:2] <- pi_accuracy(fc1, xx)$Success
   resultsb[i, 3:4] <- pi_accuracy(fc2, xx)$Success
   resultsb[i, 5:6] <- pi_accuracy(fc3, xx)$Success
   resultsb[i, 7]   <- length(xx)
}

resultsb <- as.data.frame(resultsb)
names(resultsb) <- c("ets_p80", "ets_p95", "auto.arima_p80", "auto.arima_p95",
                     "hybrid_p80", "hybrid_p95", "h")

# overall success rates, weighting each series by the length of its test period
resultsb %>%
   gather(variable, value, -h) %>%
   mutate(weighted_value = value * h) %>%
   group_by(variable) %>%
   summarise(Success = round(sum(weighted_value) / sum(h), 2))
Conclusion
    • Based on the M3 competition data, combining the ets() and auto.arima() prediction intervals in a conservative way (taking the widest range covered by the two source intervals), as hybridf() does, gives coverage close to the advertised level: the 80% prediction interval contained the true value slightly more than 80% of the time, and the 95% interval slightly less than 95% of the time.
