Multivariate Adaptive Regression Splines (MARSplines)


    • Introductory overview
      • Regression problems
      • Multivariate Adaptive Regression Splines
      • Model selection and pruning
      • Applications
    • Technical notes: the MARSplines algorithm
    • Technical notes: the MARSplines model
Introductory overview

Multivariate Adaptive Regression Splines (MARSplines) is an implementation of techniques popularized by Friedman (1991) for solving regression-type problems (see also Multiple Regression), with the main purpose of predicting the values of a continuous dependent or outcome variable from a set of independent or predictor variables. There is a large number of methods available for fitting models to continuous variables, such as linear regression [e.g., Multiple Regression, General Linear Models (GLM)], nonlinear regression (Generalized Linear/Nonlinear Models), regression trees (see Classification and Regression Trees), CHAID, Neural Networks, etc. (see also Hastie, Tibshirani, and Friedman, 2001, for an overview).

MARSplines is a nonparametric regression procedure that makes no assumption about the underlying functional relationship between the dependent and independent variables. Instead, MARSplines constructs this relationship from a set of coefficients and basis functions that are entirely "driven" from the regression data. In a sense, the method is based on the "divide and conquer" strategy, which partitions the input space into regions, each with its own regression equation. This makes MARSplines particularly suitable for problems with higher input dimensions (i.e., with more than 2 variables), where the curse of dimensionality would likely create problems for other techniques.

The MARSplines technique has become particularly popular in the area of data mining because it does not assume or impose any particular type or class of relationship (e.g., linear, logistic, etc.) between the predictor variables and the dependent (outcome) variable of interest. Instead, useful models (i.e., models that yield accurate predictions) can be derived even in situations where the relationship between the predictors and the dependent variables is non-monotone and difficult to approximate with parametric models. For more information about this technique and how it compares to other methods for nonlinear regression (or regression trees), see Hastie, Tibshirani, and Friedman (2001).

Regression problems

Regression problems involve determining the relationship between a set of dependent variables (also called output, outcome, or response variables) and one or more independent variables (also known as input or predictor variables). The dependent variable is the one whose values you want to predict, based on the values of the independent (predictor) variables. For instance, one might be interested in the number of car accidents on the roads, which can be caused by (1) bad weather and (2) drunk driving. In that case one might write, for example,

Number_of_Accidents = Some_Constant + 0.5 * Bad_Weather + 2.0 * Drunk_Driving

The variable Number_of_Accidents is the dependent variable, which is thought to be caused by (among other things) Bad_Weather and Drunk_Driving (hence the name dependent variable). Note that the independent variables are multiplied by factors, i.e., 0.5 and 2.0. These are known as regression coefficients. The larger these coefficients, the stronger the influence of the independent variables on the dependent variable. If both predictors in this simple (fictitious) example were measured on the same scale (e.g., if the variables were standardized to a mean of 0.0 and standard deviation of 1.0), Drunk_Driving could be inferred to contribute 4 times more to car accidents than Bad_Weather. (If the variables are not measured on the same scale, then direct comparisons between these coefficients are not meaningful, and, usually, some other standardized measure of predictor "importance" is included in the results.)
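As a minimal sketch of how such coefficients can be estimated in practice, the following Python fragment fits an ordinary least squares model to synthetic data; the variable names mirror the fictitious example above, and the simulated numbers are purely illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200
    bad_weather = rng.normal(size=n)      # illustrative standardized weather-severity score
    drunk_driving = rng.normal(size=n)    # illustrative standardized drunk-driving index
    accidents = 10 + 0.5 * bad_weather + 2.0 * drunk_driving + rng.normal(scale=0.5, size=n)

    # Design matrix with an intercept column, solved by ordinary least squares
    X = np.column_stack([np.ones(n), bad_weather, drunk_driving])
    coef, *_ = np.linalg.lstsq(X, accidents, rcond=None)
    print(coef)   # approximately [10.0, 0.5, 2.0]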

For additional details regarding these types of statistical models, refer to Multiple Regression or General Linear Models (GLM), as well as General Regression Models (GRM). In general, regression procedures are widely used in the social and natural sciences. Regression allows the researcher to ask (and hopefully answer) the general question "What is the best predictor of ...?" For example, educational researchers might want to learn what the best predictors of success in high school are. Psychologists may want to determine which personality variable best predicts social adjustment. Sociologists may want to find out which of multiple social indicators best predict whether a new immigrant group will adapt and be absorbed into society.

Multivariate Adaptive Regression Splines

The car accident example we considered previously is a typical application for linear regression, where the response variable is hypothesized to depend linearly on the predictor variables. Linear regression also falls into the category of so-called parametric regression, which assumes that the nature of the relationships (but not the specific parameters) between the dependent and independent variables is known a priori (e.g., is linear). By contrast, nonparametric regression (see Nonparametrics) does not make any such assumption as to how the dependent variables are related to the predictors. Instead, it allows the regression function to be "driven" directly from data.

Multivariate Adaptive Regression Splines is a nonparametric regression procedure that makes no assumption about the underlying functional relationship between the dependent and independent variables. Instead, MARSplines constructs this relationship from a set of coefficients and so-called basis functions that are entirely determined from the regression data. You can think of the general "mechanism" by which the MARSplines algorithm operates as multiple piecewise linear regression (see Nonlinear Estimation), where each breakpoint (estimated from the data) defines the "region of application" for a particular (very simple) linear regression equation.

Basis functions. Specifically, MARSplines uses two-sided truncated functions of the following form as basis functions for linear or nonlinear expansion, which approximates the relationships between the response and predictor variables:

(x - t)_+ = \begin{cases} x - t, & \text{if } x > t \\ 0, & \text{otherwise} \end{cases}
\qquad
(t - x)_+ = \begin{cases} t - x, & \text{if } x < t \\ 0, & \text{otherwise} \end{cases}

These basis functions (t - x)_+ and (x - t)_+ are mirror images of each other about the knot (adapted from Hastie et al., 2001, Figure 9.9). The parameter t is the knot of the basis functions (defining the "pieces" of the piecewise linear regression); these knots (parameters) are also determined from the data. The "+" signs next to the terms (t - x) and (x - t) simply denote that only positive results of the respective expressions are considered; otherwise the respective functions evaluate to zero.
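As a minimal sketch, the two truncated ("hinge") basis functions can be written directly in Python using NumPy; the knot value t = 0.5 below is arbitrary and chosen only for illustration.

    import numpy as np

    def hinge_right(x, t):
        # (x - t)+ : non-zero only to the right of the knot t
        return np.maximum(x - t, 0.0)

    def hinge_left(x, t):
        # (t - x)+ : non-zero only to the left of the knot t
        return np.maximum(t - x, 0.0)

    x = np.linspace(0.0, 1.0, 5)
    print(hinge_right(x, 0.5))   # [0.   0.   0.   0.25 0.5 ]
    print(hinge_left(x, 0.5))    # [0.5  0.25 0.   0.   0.  ]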

The MARSplines model. The basis functions together with the model parameters (estimated via least squares estimation) are combined to produce the predictions given the inputs. The general MARSplines model equation (see Hastie et al., 2001, Equation 9.19) is given as:

f(X) = \beta_0 + \sum_{m=1}^{M} \beta_m h_m(X)

where the summation is over the M nonconstant terms in the model (further details regarding the model are provided in the Technical Notes). To summarize, y is predicted as a function of the predictor variables X (and their interactions); this function consists of an intercept parameter (\beta_0) and the weighted (by \beta_m) sum of one or more basis functions h_m(X), of the kind illustrated earlier. You can also think of this model as "selecting" a weighted sum of basis functions from the set of (a large number of) basis functions that span all values of each predictor (i.e., that set would consist of one basis function, and parameter t, for each distinct value of each predictor variable). The MARSplines algorithm then searches over the space of all inputs and predictor values (knot locations t) as well as interactions between variables. During this search, an increasingly larger number of basis functions is added to the model (selected from the set of possible basis functions) to maximize an overall least squares goodness-of-fit criterion. As a result of these operations, MARSplines automatically determines the most important independent variables as well as the most significant interactions among them. The details of this algorithm are further described in the Technical Notes, as well as in Hastie et al. (2001).
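To make the model equation concrete, the sketch below evaluates a MARSplines-style expansion for a single predictor with hand-picked knots and estimates the coefficients by least squares. This is only an illustration of the model form: in an actual MARSplines run, the knots and basis functions would be searched for by the algorithm rather than fixed in advance.

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.uniform(0.0, 1.0, size=300)
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=x.size)   # illustrative target

    knots = [0.25, 0.5, 0.75]                       # assumed knots, not searched for here
    basis = [np.ones_like(x)]                       # constant term (beta_0)
    for t in knots:
        basis.append(np.maximum(x - t, 0.0))        # (x - t)+
        basis.append(np.maximum(t - x, 0.0))        # (t - x)+
    H = np.column_stack(basis)

    beta, *_ = np.linalg.lstsq(H, y, rcond=None)    # least-squares estimates of beta_0 .. beta_M
    y_hat = H @ beta                                # beta_0 + sum_m beta_m * h_m(x)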

Categorical predictors. In practice, both continuous and categorical predictors can be used and will often yield useful results. However, the basic MARSplines algorithm assumes that the predictor variables are continuous in nature, and, for example, the knots computed by the program will usually not coincide with actual class codes found in the categorical predictors. For a detailed discussion of categorical predictor variables in MARSplines, see Friedman (1993).

Multiple dependent (outcome) variables. The MARSplines algorithm can be applied to multiple dependent (outcome) variables. In this case, the algorithm will determine a common set of basis functions in the predictors, but estimate different coefficients for each dependent variable. This method of treating multiple outcome variables is not unlike some neural network architectures, where multiple outcome variables can be predicted from common neurons and hidden layers; in the case of MARSplines, multiple outcome variables are predicted from common basis functions, with different coefficients.
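A brief sketch of the multiple-outcome case follows; it is a hypothetical illustration in which two outcome columns share one basis matrix and np.linalg.lstsq returns a separate coefficient column for each outcome.

    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.uniform(0.0, 1.0, size=300)
    y1 = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=x.size)
    y2 = np.cos(2 * np.pi * x) + rng.normal(scale=0.1, size=x.size)
    Y = np.column_stack([y1, y2])                   # two outcome variables

    cols = [np.ones_like(x)]                        # common set of basis functions
    for t in (0.25, 0.5, 0.75):
        cols += [np.maximum(x - t, 0.0), np.maximum(t - x, 0.0)]
    H = np.column_stack(cols)

    B, *_ = np.linalg.lstsq(H, Y, rcond=None)       # one coefficient column per outcome
    Y_hat = H @ B                                   # shared basis functions, different coefficients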

MARSplines and classification problems. Because MARSplines can handle multiple dependent variables, it is easy to apply the algorithm to classification problems as well. First, code the classes in the categorical response variable into multiple indicator variables (e.g., 1 = observation belongs to class k, 0 = observation does not belong to class k); then apply the MARSplines algorithm to fit a model and compute predicted (continuous) values or scores; finally, for prediction, assign each observation to the class for which the highest score is predicted (see also Hastie, Tibshirani, and Friedman, 2001, for a description of this procedure). Note that this type of application will yield heuristic classifications that may work very well in practice, but it is not based on a statistical model for deriving classification probabilities.
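The following sketch illustrates the indicator-coding procedure just described, using plain least squares on hinge basis functions as a stand-in for a full MARSplines fit; the knots, data, and helper names are purely illustrative.

    import numpy as np

    rng = np.random.default_rng(3)
    X = rng.normal(size=(300, 2))                      # two predictors
    labels = (X[:, 0] + X[:, 1] > 0).astype(int)       # two classes, 0 and 1 (illustrative)

    classes = np.unique(labels)
    Y = (labels[:, None] == classes[None, :]).astype(float)   # one indicator (0/1) column per class

    cols = [np.ones(len(X))]                           # hinge basis functions at assumed knots
    for j in range(X.shape[1]):
        for t in (-0.5, 0.0, 0.5):
            cols += [np.maximum(X[:, j] - t, 0.0), np.maximum(t - X[:, j], 0.0)]
    H = np.column_stack(cols)

    B, *_ = np.linalg.lstsq(H, Y, rcond=None)          # regression scores for each class
    scores = H @ B
    predicted = classes[np.argmax(scores, axis=1)]     # assign the class with the highest score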

Model selection and pruning

In general, nonparametric models are adaptive and can exhibit a high degree of flexibility that may ultimately result in overfitting if no measures are taken to counteract it. Although such models can achieve zero error on training data, they have the tendency to perform poorly when presented with new observations or instances (i.e., they do not generalize well to the prediction of "new" cases). MARSplines, like most methods of this kind, tends to overfit the data as well. To combat this problem, MARSplines uses a pruning technique (similar to pruning in classification trees) to limit the complexity of the model by reducing the number of its basis functions.

MARSplines as a predictor (feature) selection method. This feature (the selection and pruning of basis functions) makes the method a very powerful tool for predictor selection. The MARSplines algorithm will pick up only those basis functions (and those predictor variables) that make a "sizeable" contribution to the prediction (refer to the Technical Notes for details).

Applications

Multivariate Adaptive Regression Splines has become very popular recently for finding predictive models for "difficult" data mining problems, i.e., when the predictor variables do not exhibit simple and/or monotone relationships to the dependent variable of interest. Alternative models or approaches that one can consider for such cases are CHAID, Classification and Regression Trees, or any of the many neural network architectures available. Because of the specific manner in which MARSplines selects predictors (basis functions) for the model, it generally does well in situations where regression-tree models are also appropriate, i.e., where hierarchically organized successive splits on the predictor variables yield good (accurate) predictions. In fact, instead of considering this technique as a generalization of multiple regression (as it is presented in this introduction), you may consider MARSplines as a generalization of regression trees, where the "hard" binary splits are replaced by "smooth" basis functions. Refer to Hastie, Tibshirani, and Friedman (2001) for additional details.

Technical notes: the MARSplines algorithm

Implementing MARSplines involves a two-step procedure that is applied successively until a desired model is found. In the first step, we build the model, i.e., increase its complexity by adding basis functions until a preset (user-defined) maximum level of complexity has been reached. Then we begin a backward procedure to remove the least significant basis functions from the model, i.e., those whose removal will lead to the least reduction in the (least-squares) goodness of fit. The algorithm proceeds as follows (a simplified code sketch is given after the steps):

    1. Start with the simplest model involving only the constant basis function.

    2. Search the space of basis functions, for each variable and for all possible knots, and add those which maximize a certain measure of goodness of fit (minimize prediction error).

    3. Step 2 is recursively applied until a model of pre-determined maximum complexity is derived.

    4. Finally, in the last stage, a pruning procedure is applied where those basis functions are removed that contribute least to the overall (least squares) goodness of fit.
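Below is a compact, simplified sketch of this two-stage procedure in Python (additive model only, no interaction terms, exhaustive knot search over observed values). The function names, the penalty value, and the form of the effective-parameter count are illustrative choices, not a reference implementation.

    import numpy as np

    def hinge_pair(x, t):
        # The two mirrored basis functions (x - t)+ and (t - x)+
        return np.maximum(x - t, 0.0), np.maximum(t - x, 0.0)

    def rss(H, y):
        # Residual sum of squares of the least-squares fit on basis matrix H
        beta, *_ = np.linalg.lstsq(H, y, rcond=None)
        r = y - H @ beta
        return float(r @ r)

    def gcv(H, y, c=3.0):
        # Generalized cross-validation: residual error penalized by model size
        n, d = H.shape
        eff = d + c * d                       # effective parameters (illustrative penalty)
        denom = (1.0 - eff / n) ** 2
        return (rss(H, y) / n) / denom if denom > 0 else np.inf

    def fit_mars_additive(X, y, max_terms=9, c=3.0):
        n, p = X.shape
        H = np.ones((n, 1))                                   # step 1: constant basis function
        while H.shape[1] + 2 <= max_terms:                    # steps 2-3: forward pass
            best = None
            for j in range(p):                                # every predictor ...
                for t in np.unique(X[:, j]):                  # ... and every candidate knot
                    a, b = hinge_pair(X[:, j], t)
                    cand = np.column_stack([H, a, b])
                    err = rss(cand, y)
                    if best is None or err < best[0]:
                        best = (err, cand)
            H = best[1]                                       # keep the pair that fits best
        best_H, best_gcv = H, gcv(H, y, c)                    # step 4: backward pruning
        while H.shape[1] > 1:
            trials = [(gcv(np.delete(H, k, axis=1), y, c), k) for k in range(1, H.shape[1])]
            g, k = min(trials)
            H = np.delete(H, k, axis=1)                       # drop the least useful function
            if g < best_gcv:
                best_H, best_gcv = H, g
        beta, *_ = np.linalg.lstsq(best_H, y, rcond=None)
        return best_H, beta

    # Illustrative use on synthetic data
    rng = np.random.default_rng(4)
    X = rng.uniform(0.0, 1.0, size=(200, 2))
    y = np.sin(2 * np.pi * X[:, 0]) + X[:, 1] + rng.normal(scale=0.1, size=200)
    H_final, beta = fit_mars_additive(X, y)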

Technical notes: the Multivariate Adaptive Regression Splines (MARSplines) model

The MARSplines algorithm builds models from two-sided truncated functions of the predictors (x) of the form:

(x - t)_+ = \begin{cases} x - t, & \text{if } x > t \\ 0, & \text{otherwise} \end{cases}
\qquad
(t - x)_+ = \begin{cases} t - x, & \text{if } x < t \\ 0, & \text{otherwise} \end{cases}

These serve as basis functions for a linear or nonlinear expansion that approximates some true underlying function f(x).

The MARSplines model for a dependent (outcome) variable y, and M terms, can be summarized in the following equation:

\hat{f}(X) = \beta_0 + \sum_{m=1}^{M} \beta_m h_m(X)

where the summation is over the M terms in the model, and \beta_0 and \beta_m are parameters of the model (along with the knots t for each basis function, which are also estimated from the data). Each function h_m is defined as a product of the truncated basis functions introduced above:

h_m(X) = \prod_{k=1}^{K_m} \bigl[ s_{k,m} \, ( x_{v(k,m)} - t_{k,m} ) \bigr]_+

where x_{v(k,m)} is the predictor in the k-th component of the m-th product and s_{k,m} = \pm 1 selects the left- or right-sided truncation. For order of interactions K = 1 the model is additive, and for K = 2 the model is pairwise interactive.
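As a small illustration of such a product term (here an order-2 interaction between two predictors), the following hypothetical helper builds h_m from a list of (sign, predictor index, knot) triples; the data and knot values are arbitrary.

    import numpy as np

    def h_m(X, factors):
        # factors: list of (sign, predictor_index, knot) triples; h_m is the product of hinges
        out = np.ones(len(X))
        for s, v, t in factors:
            out *= np.maximum(s * (X[:, v] - t), 0.0)
        return out

    X = np.array([[0.2, 0.9],
                  [0.6, 0.1],
                  [0.8, 0.7]])
    # K = 2: (x_0 - 0.5)+ * (0.8 - x_1)+, an interaction of predictors 0 and 1
    print(h_m(X, [(+1, 0, 0.5), (-1, 1, 0.8)]))   # approximately [0.   0.07 0.03]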

During the forward stepwise phase, basis functions are added to the model up to a pre-determined maximum, which should be considerably larger (at least twice as large) than the optimum (best least-squares fit).

After implementing the forward stepwise selection of basis functions, a backward procedure is applied in which the model is pruned by removing those basis functions that are associated with the smallest increase in the (least squares) goodness-of-fit. A least squares error function (the inverse of goodness-of-fit) is computed. The so-called generalized cross-validation (GCV) error is a measure of the goodness of fit that takes into account not only the residual error but also the model complexity. It is given by

GCV = \frac{ \frac{1}{N} \sum_{i=1}^{N} \bigl( y_i - \hat{f}(x_i) \bigr)^2 }{ \bigl( 1 - C(d)/N \bigr)^2 }

with

C(d) = d + c \, d

where N is the number of cases in the data set and d is the effective degrees of freedom, which is equal to the number of independent basis functions. The quantity c is the penalty for adding a basis function. Experiments have shown that the best value for c can be found somewhere in the range 2 < c < 3 (see Hastie et al., 2001).
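A direct transcription of this criterion into Python might look like the following; the function name is illustrative, and the default penalty c = 3 is simply one value within the range quoted above.

    import numpy as np

    def gcv_error(y, y_hat, d, c=3.0):
        # y, y_hat : observed and fitted values
        # d        : effective degrees of freedom (number of independent basis functions)
        # c        : penalty for adding a basis function (typically 2 < c < 3)
        n = len(y)
        residual = np.mean((y - y_hat) ** 2)       # least squares error term
        penalty = d + c * d                        # C(d) from the equation above
        return residual / (1.0 - penalty / n) ** 2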


Hastie, T., Tibshirani, R., & Friedman, J. (2001). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. New York: Springer.
