Sequence Prediction Using ODM and OLAP (1)

Last Update:2018-12-05 Source: Internet

Author: User

Tags svm stock prices


Part 1 Overview

Preface

The Oracle Data Mining (ODM) component in the Oracle Database supports time series prediction (see http://oracledmt.blogspot.com/2006/01/time-series-forecasting-part-1_23.html). Prediction is also supported by the FORECAST command in OLAP. The FORECAST command can predict data in three ways: straight-line trend, exponential growth, and Holt-Winters exponential smoothing. FORECAST performs the computation with the selected method, and you can optionally store the results in variables in the analytic workspace.

The first two methods are relatively simple extrapolation methods. The Holt-Winters method is more complex. It is a kind of exponential smoothing, or moving average, technique. It builds three statistically related series that are used to make the actual forecast:

1. The smoothed data series: the original data with seasonal effects removed.
2. The seasonal index series: the seasonal effect for each period.
3. The trend series.

The methods supported by the FORECAST command are univariate time series methods: each can model only a single time series. A time series consists of sequential observations of a single (scalar) variable over increasing time. The FORECAST methods are also linear and cannot capture complex relationships between inputs and outputs. ODM offers more powerful, non-linear support for time series prediction, including complex relationships and additional variables, through its support vector machine (SVM) regression.

The rest of this article describes the data mining approach to time series modeling. This article is part of a series. The next article gives an example of using ODM for time series prediction.

Data mining methods

ODM SVM regression supports time series modeling through the time-delay, or lag-space, embedding method. This method is also called "state-space reconstruction" in the physics community and "tapped delay line" in the engineering community.
In its simplest form, previous values of the target (that is, the time series to be predicted) are used as model inputs, known as lagged variables. These are easy to compute with the SQL LAG analytic function. Other attributes related to the series being predicted can be lagged in the same way. For example, suppose we need to predict the daily peak power load from past load values and the average daily temperature. The following table shows the values computed with the LAG function for 10 days of data, where Y is the peak load and X is the average temperature.

Day	Y	Lag (Y, 1)	Lag (Y, 2)	X	Lag (x, 1)	Lag (x, 2)
1	797	.	.	-7.6	.	.
2	777	797	.	-6.3	-7.6	.
3	797	777	797	-3.0	-6.3	-7.6
4	757	797	777	0.7	-3.0	-6.3
5	707	757	797	-1.9	0.7	-3.0
6	730	707	757	-6.0	-1.9	0.7
7	818	730	707	-6.2	-6.0	-1.9
8	818	818	730	-3.9	-6.2	-6.0
9	803	818	818	-6.3	-3.9	-6.2
10	804	803	818	-1.1	-6.3	-3.9
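The lag columns in the table can be reproduced with a short sketch. In Oracle they would normally come from the SQL LAG analytic function; the Python below only mirrors that row shift for illustration, and the helper names (lag, build_rows) and column names are assumptions, not part of ODM.

```python
# Daily peak load (Y) and average temperature (X) from the table above.
Y = [797, 777, 797, 757, 707, 730, 818, 818, 803, 804]
X = [-7.6, -6.3, -3.0, 0.7, -1.9, -6.0, -6.2, -3.9, -6.3, -1.1]


def lag(series, k):
    """Shift a series down by k rows; the first k rows have no value (None)."""
    return [None] * k + series[:len(series) - k]


def build_rows(y, x, window=2):
    """Return one dict per day with Y, X, and their lagged values."""
    columns = {"Y": y, "X": x}
    for k in range(1, window + 1):
        columns[f"LAG_Y_{k}"] = lag(y, k)
        columns[f"LAG_X_{k}"] = lag(x, k)
    return [{"DAY": i + 1, **{name: col[i] for name, col in columns.items()}}
            for i in range(len(y))]


rows = build_rows(Y, X)
# Only rows with every lag present can be used for training (day 3 onward).
complete_rows = [r for r in rows if None not in r.values()]
```

With a window of 2, the first two days have missing lags, so 8 of the 10 rows remain usable.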

In some cases a secondary attribute X, such as the average temperature in the example above, is not yet available at the time we predict the target Y, so its current value cannot be used as an input. We can still use the lagged values of X, however. Once the attributes are selected, we train the SVM model with Y as the target and the selected attributes as predictors. In the preceding example, the predictors are LAG(Y, 1), LAG(Y, 2), X, LAG(X, 1), and LAG(X, 2).

The data is divided into training and test sets. Generally, the model is trained on earlier dates and tested on later dates. For one-step prediction tests, training and test rows can instead be selected at random from all available data. An SVM regression model whose only inputs are lagged values of the target Y is called an autoregressive model. The input space made up of all the lagged variables is called the embedding space.

If the rows of the time series are not equally spaced in time, things become a little more difficult. One approach is to use a smoothing technique to compute attribute values at equal time intervals and then train on the computed values instead of the original ones.

Method

When modeling a time series with the approach above, you must make the following decisions:

1. Trend removal
2. Target transformation
3. Lagged attribute selection

Decisions of this kind are required by most time series prediction techniques.

Trend removal

A key assumption of the time-delay method is that the time series is stationary. This means that the values of the series over different time intervals follow the same statistical distribution. In practice, it means that the series has no trend. In reality, many time series do show a trend. For example, in many financial indexes, stock prices tend to increase over time. The trend component of a time series is the tendency of its values to rise or fall over a period of time.
The simplest remedy is differencing, a standard statistical method for handling stochastic trends. In the example above, instead of using Y (the time series value) as the target, we use the difference D = Y - LAG(Y, 1). The same applies to the lagged values of the target: for example, LAG(Y, 1) - LAG(Y, 2) is used as a predictor instead of LAG(Y, 1). Sometimes it is necessary to apply differencing more than once. Predictions of the differenced target can be transformed back to obtain predictions of the original series.

Target transformation

Normalizing the target of an SVM regression helps the algorithm converge. For time series problems, the target should be normalized before the lagged variables are created.

Lagged attribute selection

You can select the lags by analyzing the data (computing the correlogram or the cross-correlogram) or by choosing a window size. For example, with a window size of 2, the predictors are LAG(Y, 1) and LAG(Y, 2), and Y is the target. The window size directly affects what the SVM can learn, because it limits the size of the patterns that can be recognized. If the window is too small, there may not be enough information to capture the dynamics of the system that generates the series. If the window is too large, the extra lagged attributes add noise and make the problem harder to solve.

Computing and Prediction

There are several ways to compute predictions. The two most common strategies are one-step-ahead (open-loop) prediction and multi-step (closed-loop) prediction.

One-step-ahead (open-loop) prediction

This strategy requires that all input values of the model be available. Because the model inputs include earlier target values, it can predict only the next interval. In the earlier example, we can forecast only one day ahead (day 11).
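The three preparation decisions described above (trend removal, target transformation, lag selection) can be sketched together on the load series from the earlier table. This is a minimal illustration, not ODM's implementation: the function names are hypothetical, and z-score scaling is an assumed choice of normalization.

```python
from statistics import mean, pstdev

Y = [797, 777, 797, 757, 707, 730, 818, 818, 803, 804]


def difference(series):
    """First difference: d[t] = y[t] - y[t-1] (one value shorter than y)."""
    return [b - a for a, b in zip(series, series[1:])]


def undifference(last_actual, predicted_diffs):
    """Rebuild level forecasts by cumulatively adding predicted differences."""
    levels, current = [], last_actual
    for d in predicted_diffs:
        current += d
        levels.append(current)
    return levels


def normalize(series):
    """Z-score scaling; also returns (mean, std) so the scaling can be undone."""
    mu, sigma = mean(series), pstdev(series)
    return [(v - mu) / sigma for v in series], (mu, sigma)


def windowed(series, window):
    """Pair each target with its `window` previous values, most recent first."""
    return [(series[t - window:t][::-1], series[t])
            for t in range(window, len(series))]


d = difference(Y)                      # 1. trend removal
d_scaled, (mu, sigma) = normalize(d)   # 2. target transformation, before lagging
samples = windowed(d_scaled, 2)        # 3. lag selection with window size 2
```

Each element of samples is a pair ([lag 1, lag 2], target) on the differenced, scaled series. A model's predicted differences would first be unscaled with mu and sigma and then passed to undifference together with the last actual value to recover forecasts of the original series.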
To forecast day 12 (y_12), we must first wait for the actual value of day 11 (y_11). One-step-ahead prediction can be performed directly with the Apply and Test mining activities in Oracle Data Miner or with the SQL prediction functions. Part 2 of this series covers this. Writing P for the model's prediction from the two most recent values, the forecasts are:

Forecast y_11 as P (y_10, y_9)
Forecast y_12 as P (y_11, y_10)
And so on
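The open-loop scheme can be sketched as follows. The function P here is a hypothetical stand-in (the mean of the two lags) for the trained SVM regression model, used only so the example runs; the data is the load series from the earlier table.

```python
Y = [797, 777, 797, 757, 707, 730, 818, 818, 803, 804]  # actual y_1..y_10


def P(lag1, lag2):
    """Hypothetical stand-in for the trained model (NOT an SVM): mean of the lags."""
    return (lag1 + lag2) / 2


def one_step_forecasts(actuals, first_day, last_day):
    """Open-loop forecasts for 1-based days first_day..last_day.

    Each forecast uses only actual values, so the furthest day that can be
    forecast is one past the last observed day.
    """
    if first_day < 3 or last_day > len(actuals) + 1:
        raise ValueError("both lagged actual values must be available")
    return {day: P(actuals[day - 2], actuals[day - 3])
            for day in range(first_day, last_day + 1)}


forecasts = one_step_forecasts(Y, first_day=9, last_day=11)
# Asking for day 12 would raise ValueError: y_11 has not been observed yet.
```

Days 9 and 10 act as a small one-step test, because their actual values are known, while day 11 is the only genuine future forecast this strategy can produce from 10 days of data.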

Multi-step (closed-loop) prediction

This strategy uses actual values as inputs when they are available and uses predicted values when they are not.

Forecast y_11 as p_11 = P (y_10, y_9)
Forecast y_12 as p_12 = P (p_11, y_10), where p_11 is the predicted value of y_11
And so on
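A minimal sketch of the closed-loop scheme, again on the load series from the earlier table. As before, P is a hypothetical stand-in (the mean of the two lags) for the trained SVM regression model, and the article itself computes multi-step predictions in PL/SQL rather than Python.

```python
Y = [797, 777, 797, 757, 707, 730, 818, 818, 803, 804]  # actual y_1..y_10


def P(lag1, lag2):
    """Hypothetical stand-in for the trained model (NOT an SVM): mean of the lags."""
    return (lag1 + lag2) / 2


def multi_step_forecast(actuals, steps):
    """Closed-loop forecasts: each prediction is fed back as a lag input."""
    values = list(actuals)          # actual values first, predictions appended
    forecasts = []
    for _ in range(steps):
        p = P(values[-1], values[-2])
        forecasts.append(p)
        values.append(p)            # the prediction stands in for the missing actual
    return forecasts


p_11, p_12, p_13 = multi_step_forecast(Y, 3)
# p_11 = P(y_10, y_9), p_12 = P(p_11, y_10), p_13 = P(p_12, p_11)
```

Because later forecasts depend on earlier ones, any error in p_11 carries into p_12 and beyond, which is the usual trade-off of this strategy against the one-step approach.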

In this example, a forecast for y_12 can be generated even though no actual value of y_11 exists, because the strategy feeds predicted values back as inputs. Multi-step prediction can be computed with a simple PL/SQL procedure, which is covered in Part 3 of this series.

Comparison with traditional time series technology

Like neural networks, SVM regression can be used for time series prediction, but training is simpler. Advantages of this approach:

- The ability to model very complex functions
- The ability to use a large number of variables in the model, including other data in addition to the lagged time series values

The method described here creates a non-linear autoregressive model. Such models are widely used in many fields, such as financial forecasting, power load forecasting, modeling of chaotic systems, and sunspot prediction. By comparison, ARIMA, a popular time series forecasting technique, supports both autoregressive and moving average components. However, ARIMA models are linear, whereas SVM regression models can capture non-linear relationships.

