Regression
1) Multivariate linear regression
(1) Model creation
Multivariate linear regression studies the relationship between a random variable y and non-random variables x_1, ..., x_m. Assuming the relationship is linear, the model is:
y = b_0 + b_1 x_1 + ... + b_m x_m + e
where e ~ N(0, σ^2), and b_0, ..., b_m and σ^2 are all unknown. In matrix form:
Y = Xb + e
For a set of samples (x_11, ..., x_1m, y_1), ..., (x_n1, ..., x_nm, y_n), each observation satisfies:
y_i = b_0 + b_1 x_{i1} + ... + b_m x_{im} + e_i,  i = 1, ..., n
(2) Model solving
Solving the model means estimating its unknown parameters, usually by the least squares method: find the estimate b~ of b that satisfies
Σ_{i=1}^{n} (y_i - Σ_{j=0}^{m} x_{ij} b~_j)^2 = min_b Σ_{i=1}^{n} (y_i - Σ_{j=0}^{m} x_{ij} b_j)^2   (with x_{i0} = 1)
or, in matrix form:
||y - Xb~||^2 = min_b ||y - Xb||^2
Solving gives the estimate of b:
b~ = (X^T X)^{-1} X^T y
and the estimate of σ^2:
σ~^2 = [y^T y - b~^T (X^T y)] / (n - m - 1)
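As a minimal sketch (not from the original article), these normal-equation estimates can be computed directly with NumPy; the names ols_fit, X, and y are illustrative, and X is assumed to already contain a leading column of ones.

```python
import numpy as np

def ols_fit(X, y):
    """Least-squares estimates b~ = (X^T X)^{-1} X^T y and s2 = [y^T y - b~^T X^T y] / (n - m - 1).
    X is an n x (m+1) design matrix whose first column is all ones; y is a length-n vector."""
    n, p = X.shape                               # p = m + 1
    XtX_inv = np.linalg.inv(X.T @ X)             # assumes X has full column rank
    b = XtX_inv @ X.T @ y
    sigma2 = (y @ y - b @ (X.T @ y)) / (n - p)   # n - p = n - m - 1
    return b, sigma2

# Illustrative usage: X = np.column_stack([np.ones(n), x1, ..., xm])
```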
(3) Significance tests for the regression coefficients and the regression equation
1. Significance test of a regression coefficient
The significance test of a regression coefficient tests the hypothesis H0: b_j = 0 vs. H1: b_j ≠ 0 (j = 1, ..., m). Under H0,
t_j = b~_j / (sqrt(c_jj) σ~) ~ t(n - m - 1)
where c_jj is the (j+1)-th element on the main diagonal of C = (X^T X)^{-1}. For a given significance level α, t_j is computed and compared with the table value to accept or reject H0.
2. Significance test of the regression equation
The significance test of the regression equation tests H0: b_1 = b_2 = ... = b_m = 0 vs. H1: at least one b_j ≠ 0 (j = 1, ..., m). Under H0,
F = (Q_B / m) / (Q_A / (n - m - 1)) ~ F(m, n - m - 1)
where Q_A = Σ_{i=1}^{n} (y_i - y~_i)^2 is the residual sum of squares and Q_B = Σ_{i=1}^{n} (y~_i - ȳ)^2 is the regression sum of squares, with ȳ = (1/n) Σ_{i=1}^{n} y_i. For a given significance level α, F is computed and compared with the table value to accept or reject H0.
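A minimal sketch of both tests, assuming a design matrix X with a leading column of ones and using SciPy quantiles in place of table lookups (the function and variable names are illustrative):

```python
import numpy as np
from scipy import stats

def significance_tests(X, y, alpha=0.05):
    """t statistics for each coefficient and the F statistic for the whole equation."""
    n, p = X.shape                                   # p = m + 1
    m = p - 1
    C = np.linalg.inv(X.T @ X)                       # C = (X^T X)^{-1}
    b = C @ X.T @ y
    y_hat = X @ b
    Q_A = np.sum((y - y_hat) ** 2)                   # residual sum of squares
    Q_B = np.sum((y_hat - y.mean()) ** 2)            # regression sum of squares
    sigma = np.sqrt(Q_A / (n - m - 1))               # estimated standard error
    t = b / (np.sqrt(np.diag(C)) * sigma)            # t_j = b~_j / (sqrt(c_jj) * sigma~)
    F = (Q_B / m) / (Q_A / (n - m - 1))              # ~ F(m, n - m - 1) under H0
    t_crit = stats.t.ppf(1 - alpha / 2, n - m - 1)   # two-sided critical value
    F_crit = stats.f.ppf(1 - alpha, m, n - m - 1)
    return t, t_crit, F, F_crit                      # reject H0 when |t_j| > t_crit or F > F_crit
```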
(4) Selection of the optimal regression equation
The general principle for selecting the optimal regression equation is to find a linear regression equation that contains all regression variables with a significant effect on y and rejects the non-significant ones; such an equation is optimal in the sense that the estimated standard error σ~ is minimal. The following methods are commonly used.
1. Exhaustive method
Fit the linear regression equation of y for every possible combination of the regression variables, and choose the best one among them.
2."Just can't go out" method
This method is based on experience, select a regression variable, and then introduce other regression variables, the advantage is that the computation is small, the disadvantage is that the optimal equation may be omitted.
3."Only out of the way" method
This method is to first introduce all the variables, and then eliminate one by one, select the estimated standard error a~ Minimum, the advantage is that the calculation is small, the disadvantage is that the optimal equation may be omitted.
4."In and Out", stepwise regression method
The basic idea of this method is that for all regression variables, according to the degree of its influence on Y, and the numerical size of Tj statistic, from large to small successive introduction to linear regression equation, after not introducing a regression variable, all the regression coefficients are tested, Once a non-significant regression variable is found, it is removed, so that it goes back and forth until the new argument cannot be entered. This method is much less computationally efficient than the exhaustive method, and does not omit the optimal equation than the "just-in" and "out-of-the-way" methods.
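A simplified sketch of this stepwise procedure, using per-coefficient t statistics as the entry and removal criterion; the thresholds alpha_in/alpha_out, the helper coef_t_stats, and the rest of the naming are illustrative assumptions, not the article's code:

```python
import numpy as np
from scipy import stats

def coef_t_stats(Xs, y):
    """t statistics of the non-constant coefficients when y is regressed on [1, Xs]."""
    n = len(y)
    A = np.column_stack([np.ones(n), Xs])
    C = np.linalg.inv(A.T @ A)
    b = C @ A.T @ y
    resid = y - A @ b
    sigma = np.sqrt(resid @ resid / (n - A.shape[1]))
    return (b / (np.sqrt(np.diag(C)) * sigma))[1:]

def stepwise_select(X, y, alpha_in=0.05, alpha_out=0.10):
    """'In and out' stepwise selection: X holds the m candidate variables as columns."""
    n, m = X.shape
    selected = []
    while True:
        changed = False
        # Forward step: introduce the most significant remaining variable, if significant enough.
        best_j, best_t = None, 0.0
        for j in set(range(m)) - set(selected):
            t_new = abs(coef_t_stats(X[:, selected + [j]], y)[-1])
            if t_new > best_t:
                best_j, best_t = j, t_new
        if best_j is not None and best_t > stats.t.ppf(1 - alpha_in / 2, n - len(selected) - 2):
            selected.append(best_j)
            changed = True
        # Backward step: remove the least significant included variable, if no longer significant.
        if selected:
            t_vals = np.abs(coef_t_stats(X[:, selected], y))
            worst = int(np.argmin(t_vals))
            if t_vals[worst] < stats.t.ppf(1 - alpha_out / 2, n - len(selected) - 1):
                selected.pop(worst)
                changed = True
        if not changed:
            return selected
```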
(5) Robust regression
When we fit the linear regression model by least squares, we assume that the e_i (i = 1, ..., n) are independent, identically distributed normal random variables, and the good properties of the parameter estimates are derived under these assumptions. In practice, however, the assumptions are rarely fully satisfied, and violations can make the least-squares fit deviate greatly from the actual model. It is therefore natural to ask: can we construct a parameter estimation method whose performance changes only slightly when the actual model deviates slightly from the theoretical model, i.e., a method that is insensitive to the assumptions? Such methods are called robust methods. Below is a brief introduction to M estimation.
M estimation is short for maximum-likelihood-type estimation. Assuming the e_i (i = 1, ..., n) are independent and identically distributed, the M estimate b~ of the parameter b in the linear regression model
Y = Xb + e
is given by
Σ_{i=1}^{n} β(y_i - Σ_{j=0}^{m} x_{ij} b~_j) = min_b Σ_{i=1}^{n} β(y_i - Σ_{j=0}^{m} x_{ij} b_j)
or
Σ_{i=1}^{n} π(y_i - Σ_{j=0}^{m} x_{ij} b~_j) x_{ik} = 0,  k = 0, 1, ..., m.
Here β and π are suitably chosen real-valued functions: β is generally a symmetric convex function, or a non-decreasing even function on the positive half-axis, and π is a bounded odd function. If β is convex and π = β' (the derivative of β), the two definitions are equivalent. The parameter estimates in these equations can only be obtained by iterative methods.
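One common iterative scheme is iteratively reweighted least squares. The sketch below uses the Huber function as an illustrative choice of β (so π = β' is bounded); the MAD scale estimate and the tuning constant k = 1.345 are also assumptions, not prescribed by the article:

```python
import numpy as np

def m_estimate(X, y, k=1.345, tol=1e-8, max_iter=100):
    """M estimation of b in Y = Xb + e via iteratively reweighted least squares (Huber weights).
    X is an n x (m+1) design matrix with a leading column of ones."""
    b_new = b = np.linalg.lstsq(X, y, rcond=None)[0]              # start from the ordinary LS fit
    for _ in range(max_iter):
        r = y - X @ b
        s = np.median(np.abs(r - np.median(r))) / 0.6745 + 1e-12  # robust scale estimate (MAD)
        au = np.maximum(np.abs(r / s), 1e-12)
        w = np.where(au <= k, 1.0, k / au)                         # Huber weights pi(u)/u
        b_new = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))   # weighted LS step
        if np.max(np.abs(b_new - b)) < tol:
            break
        b = b_new
    return b_new
```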
(6) Prediction
Once the regression equation has been obtained, predictions are made simply by plugging the test data into the equation.
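For completeness, a tiny illustrative snippet (the names follow the sketches above and are assumptions):

```python
import numpy as np

def predict(X_new, b):
    """Plug new data into the fitted equation: X_new is n x (m+1) with a leading ones column."""
    return X_new @ b
```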
Pros: easy to interpret results, computationally inexpensive
Cons: poorly models nonlinear data
Works with: numeric values, nominal values
2) Tree-based regression (CART)
A previous post briefly introduced CART for classification, where the splitting process is similar to a general decision tree algorithm except that split attributes are chosen by the Gini criterion. Here is a brief introduction to CART for regression (where the target attribute is continuous).
(1) CART regression tree construction
1. Target prediction
Suppose (x_1, y_1), (x_2, y_2), ..., (x_c, y_c) are all the samples in leaf node L. The predicted value of L is
y~ = (1/c) Σ_{i=1}^{c} y_i
that is, the sample mean of the dependent variable within the leaf node.
2. Splitting criterion
The split is chosen as the attribute that minimizes the tree's sum of squared errors S, defined as
S = Σ_{c ∈ leaves(T)} Σ_{i ∈ c} (y_i - m_c)^2
where m_c = (1/n_c) Σ_{i ∈ c} y_i is the predicted value of leaf node c.
3. Basic regression tree construction algorithm
① Start with a node containing all the samples and compute m_c and S.
② Choose the attribute whose split reduces S the most. Stop if the reduction in S is less than a threshold δ, if the number of samples at the node is less than q, or if all condition attributes take the same value (pre-pruning).
③ Otherwise, repeat from step ① on each new node.
4. Splitting algorithm for continuous attributes
For a continuous attribute column, sort its values and remove duplicates, take the mean of each pair of adjacent values as a candidate binary split threshold, split the attribute column at each candidate, and compute the corresponding S; the threshold giving the minimum S becomes the binary split threshold of that attribute column. Repeat this for all continuous attribute columns, and then choose the attribute with the smallest S as the split attribute of the current node (a sketch follows). This splitting algorithm is not optimal; there are many improved algorithms, which are not covered here.
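A minimal sketch of this split search over continuous attributes; the array-based representation and the names sse and best_split are assumptions for illustration:

```python
import numpy as np

def sse(y):
    """Sum of squared errors of a node that predicts the mean of its samples."""
    return np.sum((y - y.mean()) ** 2) if len(y) else 0.0

def best_split(X, y):
    """Choose the (attribute, threshold) pair that minimizes S over all continuous attributes.
    X: n x d array of continuous condition attributes; y: length-n target values."""
    best = (None, None, sse(y))                          # (attribute index, threshold, resulting S)
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])                      # sort and remove duplicates
        thresholds = (values[:-1] + values[1:]) / 2.0    # midpoints of adjacent values
        for t in thresholds:
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            s = sse(left) + sse(right)
            if s < best[2]:
                best = (j, t, s)
    return best
```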
5. Overfitting
Step 3 above already includes a pre-pruning rule; its drawback is obvious, since choosing the thresholds is itself a problem. Better pruning algorithms are post-pruning algorithms (whose drawback is heavier computation), such as minimum expected cost of misclassification (ECM) and minimum description length (MDL). One post-pruning algorithm, which decides whether to merge leaf nodes based on the error on test data, is outlined below:
Split the test data for the given tree:
    If either split is a tree: call prune on that split
    Calculate the error associated with merging the leaf nodes
    Calculate the error without merging
    If merging results in lower error, then merge the leaf nodes
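A sketch of this post-pruning procedure, assuming the tree is stored as nested dicts with keys 'attr', 'thresh', 'left', 'right' for internal nodes and 'leaf' for leaf values (this representation is an assumption, not the article's):

```python
import numpy as np

def is_tree(node):
    """Internal nodes carry a split; leaves carry only a value."""
    return 'leaf' not in node

def tree_mean(node):
    """Collapse a subtree to a single value (mean of its two children, recursively)."""
    if not is_tree(node):
        return node['leaf']
    return (tree_mean(node['left']) + tree_mean(node['right'])) / 2.0

def prune(node, X_test, y_test):
    """Post-pruning against held-out test data, following the pseudocode above."""
    if not is_tree(node):
        return node
    if len(y_test) == 0:                                   # no test data reaches this subtree
        return {'leaf': tree_mean(node)}
    mask = X_test[:, node['attr']] <= node['thresh']       # split the test data for the given tree
    node['left'] = prune(node['left'], X_test[mask], y_test[mask])
    node['right'] = prune(node['right'], X_test[~mask], y_test[~mask])
    if not is_tree(node['left']) and not is_tree(node['right']):
        err_split = (np.sum((y_test[mask] - node['left']['leaf']) ** 2) +
                     np.sum((y_test[~mask] - node['right']['leaf']) ** 2))
        merged = (node['left']['leaf'] + node['right']['leaf']) / 2.0
        err_merged = np.sum((y_test - merged) ** 2)
        if err_merged < err_split:                         # merging lowers the test error: merge
            return {'leaf': merged}
    return node
```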
(2) Model tree
If the constant value at each leaf node of the regression tree is replaced by a piecewise linear function, the tree becomes a model tree: in each leaf node a linear regression model is fitted to the portion of the data that reaches that leaf and is used to predict new data arriving at the leaf, avoiding the error caused by predicting with the mean alone.
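A minimal sketch of what a model-tree leaf stores and how it predicts (the function names are illustrative):

```python
import numpy as np

def fit_leaf_model(X_leaf, y_leaf):
    """Fit a linear model to the samples reaching a leaf, instead of storing only their mean."""
    A = np.column_stack([np.ones(len(y_leaf)), X_leaf])   # add an intercept column
    b, *_ = np.linalg.lstsq(A, y_leaf, rcond=None)        # least-squares coefficients
    return b

def predict_leaf_model(b, x):
    """Predict for a single sample x that reaches this leaf."""
    return b[0] + np.dot(b[1:], x)
```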
Pros: fits complex, nonlinear data
Cons: difficult to interpret results
Works with: numeric values, nominal values