Andrew Ng (Wu Enda) Machine Learning Week 6: summary of key points
- The data should be split into three parts: a training set, a cross-validation set (validation set), and a test set. The training set is used to fit the model parameters; the validation set is used to choose the model's complexity (for example, the number of features or the polynomial degree) and to check whether the model is overfitting; the test set is used only for the final evaluation of the model.
- The fewer parameters (features) the model has, the more likely it is to underfit (high bias): both the training error and the cross-validation error are large. Adding parameters lowers the training error but may cause overfitting (high variance), where the training error is small and the cross-validation error is large.
As the number of training examples grows, the training error increases while the cross-validation error decreases, because the model generalizes better,
and the two error values move closer together and converge.
- Common remedies for overfitting (high variance) and for high error from underfitting (high bias):
  - More training examples: addresses overfitting
  - Fewer features (parameter dimensions): addresses overfitting
  - More features (parameter dimensions): addresses high error (underfitting)
  - Increase lambda: addresses overfitting
  - Decrease lambda: addresses high error (underfitting)
- The data may be skewed (skewed classes). For example, if the cancer incidence rate is 0.5% and a model predicts "no cancer" for every patient,
the model still achieves 99.5% accuracy, even though it never detects a single case. Accuracy alone is clearly not an appropriate metric here, so the following quantities are introduced:
  - Precision = true positive / (true positive + false positive)
  - Recall = true positive / (true positive + false negative)
  - F score = 2 * (P * R) / (P + R), where P is precision and R is recall
Here, a true positive means the patient is actually sick and the model predicts sick. A false positive means the patient is actually healthy but the model predicts sick.
A false negative means the patient is actually sick but the model predicts healthy. A true negative means the patient is actually healthy and the model predicts healthy.
The higher the precision, the more likely it is that a patient predicted to be sick really is sick. Pushing precision up (for example, by predicting sick only when the model is very confident) usually lowers recall, which can lead to missed diagnoses: patients who are sick are predicted to be healthy.
Finally, the F score combines the two into a single number for evaluating a model; the higher the F score, the better.
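The skewed-data argument above can be checked with a few lines of Octave. This is only a sketch: the 1000-patient label vector and both sets of predictions are made-up values chosen to match the 0.5% incidence rate, and the helper names (`accuracy`, `tp`, `fp`, `fn`) are hypothetical.

```octave
% Hypothetical labels: 1000 patients, 5 of whom (0.5%) actually have cancer.
y = [ones(5, 1); zeros(995, 1)];

% Classifier A: always predicts "no cancer".
pred_a = zeros(1000, 1);
% Classifier B (made up): finds 4 of the 5 real cases and raises 6 false alarms.
pred_b = [ones(4, 1); 0; ones(6, 1); zeros(989, 1)];

% Helper functions (illustrative names, not part of the course code).
accuracy = @(pred) mean(pred == y);
tp = @(pred) sum(pred == 1 & y == 1);   % true positives
fp = @(pred) sum(pred == 1 & y == 0);   % false positives
fn = @(pred) sum(pred == 0 & y == 1);   % false negatives

% Classifier A looks excellent on accuracy but has zero recall.
acc_a    = accuracy(pred_a);                        % 0.995
recall_a = tp(pred_a) / (tp(pred_a) + fn(pred_a));  % 0
% Its precision is 0/0, which Octave evaluates to NaN: it never predicts "sick".

% Classifier B has slightly lower accuracy but actually finds patients.
acc_b       = accuracy(pred_b);                         % 0.993
precision_b = tp(pred_b) / (tp(pred_b) + fp(pred_b));   % 0.4
recall_b    = tp(pred_b) / (tp(pred_b) + fn(pred_b));   % 0.8
f_b = 2 * precision_b * recall_b / (precision_b + recall_b);

fprintf('A: accuracy %.3f, recall %.1f\n', acc_a, recall_a);
fprintf('B: accuracy %.3f, precision %.1f, recall %.1f, F %.3f\n', ...
        acc_b, precision_b, recall_b, f_b);
```

Accuracy ranks classifier A above classifier B, while recall and the F score rank them the other way round, which is the reason for introducing these quantities on skewed data.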
Homework code
linearRegCostFunction.m
function [J, grad] = linearRegCostFunction(X, y, theta, lambda)
%LINEARREGCOSTFUNCTION Compute cost and gradient for regularized linear
%regression with multiple variables
%   [J, grad] = LINEARREGCOSTFUNCTION(X, y, theta, lambda) computes the
%   cost of using theta as the parameter for linear regression to fit the
%   data points in X and y. Returns the cost in J and the gradient in grad

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly
J = 0;
grad = zeros(size(theta));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost and gradient of regularized linear
%               regression for a particular choice of theta.
%
%               You should set J to the cost and grad to the gradient.
%

% Cost: squared error plus the penalty on every parameter except the bias
theta_without1 = theta(2:end, :);
J = sum((X * theta - y) .^ 2) / (2 * m) + sum(lambda * theta_without1 .^ 2 / (2 * m));

% Gradient: zero out the bias entry so that it is not regularized
theta_without1 = theta;
theta_without1(1) = 0;
grad = X' * (X * theta - y) / m + lambda * theta_without1 / m;

% =========================================================================

grad = grad(:);

end
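One way to sanity-check a regularized cost and gradient like the one above is to compare the analytic gradient with a finite-difference estimate on a tiny data set. The sketch below is self-contained and does not call the homework function: the data values, the `mask` vector, and the `cost` handle are assumptions introduced for illustration, and only the formulas mirror the solution.

```octave
% Tiny hand-checkable data set (hypothetical values); column 1 of X is the bias term.
X = [1 1; 1 2; 1 3];
y = [1; 2; 3];
theta = [1; 1];
lambda = 1;
m = length(y);

% Mask that zeroes out the bias parameter so that it is not regularized.
mask = [0; ones(numel(theta) - 1, 1)];

% Regularized cost as a function of theta only.
cost = @(t) sum((X * t - y) .^ 2) / (2 * m) ...
            + lambda * sum((mask .* t) .^ 2) / (2 * m);

% Cost and analytic gradient, using the same formulas as the homework solution.
J = cost(theta);
grad = X' * (X * theta - y) / m + lambda * (mask .* theta) / m;

% Central-difference approximation of the gradient, one parameter at a time.
step = 1e-4;
num_grad = zeros(size(theta));
for k = 1:numel(theta)
    e = zeros(size(theta));
    e(k) = step;
    num_grad(k) = (cost(theta + e) - cost(theta - e)) / (2 * step);
end

fprintf('J = %.4f\n', J);   % prints J = 0.6667
fprintf('max |grad - num_grad| = %g\n', max(abs(grad - num_grad)));
```

By hand, every residual is 1, so the squared-error term is 3/6 = 0.5 and the penalty is 1/6, giving J = 2/3 and a gradient of [1; 7/3]. A large gap between `grad` and `num_grad` would point to a bug such as regularizing the bias term.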
learningCurve.m
function [error_train, error_val] = ...
    learningCurve(X, y, Xval, yval, lambda)
%LEARNINGCURVE Generates the train and cross validation set errors needed
%to plot a learning curve
%   [error_train, error_val] = ...
%       LEARNINGCURVE(X, y, Xval, yval, lambda) returns the train and
%       cross validation set errors for a learning curve. In particular,
%       it returns two vectors of the same length: error_train and
%       error_val. Then, error_train(i) contains the training error for
%       i examples (and similarly for error_val(i)).
%
%   In this function, you will compute the train and test errors for
%   dataset sizes from 1 up to m. In practice, when working with larger
%   datasets, you might want to do this in larger intervals.
%

% Number of training examples
m = size(X, 1);

% You need to return these values correctly
error_train = zeros(m, 1);
error_val   = zeros(m, 1);

% ====================== YOUR CODE HERE ======================
% Instructions: Fill in this function to return training errors in
%               error_train and the cross validation errors in error_val.
%               i.e., error_train(i) and
%               error_val(i) should give you the errors
%               obtained after training on i examples.
%

for i = 1:m
    % Train on the first i examples only
    theta = trainLinearReg(X(1:i, :), y(1:i), lambda);
    % Report both errors without the regularization term (lambda = 0)
    error_train(i) = linearRegCostFunction(X(1:i, :), y(1:i), theta, 0);
    error_val(i) = linearRegCostFunction(Xval, yval, theta, 0);
end

% -------------------------------------------------------------

% =========================================================================

end
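To see the three-way split and the learning-curve loop working together without the course files, here is a self-contained sketch. Everything in it is an assumption made for illustration: the synthetic data, the fixed permutation used in place of a random shuffle, the 60/20/20 split, and the closed-form `fit` handle, which stands in for `trainLinearReg` (the course version uses an iterative optimizer).

```octave
% Deterministic synthetic data (hypothetical): linear trend plus a fixed wiggle.
n = 50;
x = linspace(0, 1, n)';
y = 2 * x + 0.3 * sin(40 * x);

% Fixed shuffle: 7 is coprime with 50, so this visits every index exactly once.
idx = mod((0:n-1)' * 7, n) + 1;

% Split 60% / 20% / 20% into training, cross-validation and test sets.
n_train = round(0.6 * n);
n_val   = round(0.2 * n);
train_idx = idx(1:n_train);
val_idx   = idx(n_train+1 : n_train+n_val);
test_idx  = idx(n_train+n_val+1 : end);

add_bias = @(v) [ones(numel(v), 1), v];
X     = add_bias(x(train_idx));  ytr   = y(train_idx);
Xval  = add_bias(x(val_idx));    yval  = y(val_idx);
Xtest = add_bias(x(test_idx));   ytest = y(test_idx);

% Unregularized squared-error cost, and a closed-form regularized fit
% (stand-in for trainLinearReg; the bias term is not penalized).
cost = @(A, b, t) sum((A * t - b) .^ 2) / (2 * numel(b));
fit  = @(A, b, lambda) pinv(A' * A + lambda * diag([0 1])) * (A' * b);

lambda = 0;
m = size(X, 1);
error_train = zeros(m, 1);
error_val   = zeros(m, 1);
for i = 1:m
    theta = fit(X(1:i, :), ytr(1:i), lambda);
    error_train(i) = cost(X(1:i, :), ytr(1:i), theta);
    error_val(i)   = cost(Xval, yval, theta);
end

% The test set is touched only once, with the model trained on all m examples.
test_error = cost(Xtest, ytest, theta);
fprintf('train %.4f, validation %.4f, test %.4f\n', ...
        error_train(end), error_val(end), test_error);
```

Plotting `error_train` and `error_val` against `1:m` gives the learning curve. With one or two training examples the line fits the data exactly, so the training error starts at zero.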
polyFeatures.m
function [X_poly] = polyFeatures(X, p)
%POLYFEATURES Maps X (1D vector) into the p-th power
%   [X_poly] = POLYFEATURES(X, p) takes a data matrix X (size m x 1) and
%   maps each example into its polynomial features where
%   X_poly(i, :) = [X(i) X(i).^2 X(i).^3 ...  X(i).^p];
%

% You need to return the following variables correctly.
X_poly = zeros(numel(X), p);

% ====================== YOUR CODE HERE ======================
% Instructions: Given a vector X, return a matrix X_poly where the p-th
%               column of X contains the values of X to the p-th power.
%
%

m = numel(X);
X1 = X(:);
disp(X1); % debug output: prints the input as a column vector
for i = 1:p
    for j = 1:m
        % Column i holds the i-th power of every example
        X_poly(j, i) = X1(j) ^ i;
    end
end

% =========================================================================

end
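The double loop above can also be written without loops. The sketch below compares the loop with two vectorized alternatives on a small made-up input. It assumes automatic broadcasting is available (Octave, or MATLAB R2016b and later); `bsxfun` is the form that also works on older MATLAB releases.

```octave
% Small hypothetical input: three examples, powers 1 to 3.
X = [1; 2; 3];
p = 3;

% Loop version, same logic as the homework solution.
X_loop = zeros(numel(X), p);
for i = 1:p
    for j = 1:numel(X)
        X_loop(j, i) = X(j) ^ i;
    end
end

% Broadcasting: a column vector raised elementwise to a row of exponents.
X_vec = X(:) .^ (1:p);

% Equivalent form using bsxfun, for environments without broadcasting.
X_bsx = bsxfun(@power, X(:), 1:p);

disp(X_vec);   % rows are [1 1 1], [2 4 8], [3 9 27]
```

All three matrices are identical; the vectorized forms are shorter and avoid per-element indexing.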
validationCurve.m
function [lambda_vec, error_train, error_val] = ...
    validationCurve(X, y, Xval, yval)
%VALIDATIONCURVE Generate the train and validation errors needed to
%plot a validation curve that we can use to select lambda
%   [lambda_vec, error_train, error_val] = ...
%       VALIDATIONCURVE(X, y, Xval, yval) returns the train
%       and validation errors (in error_train, error_val)
%       for different values of lambda. You are given the training set (X,
%       y) and validation set (Xval, yval).
%

% Selected values of lambda (you should not change this)
lambda_vec = [0 0.001 0.003 0.01 0.03 0.1 0.3 1 3 10]';

% You need to return these variables correctly.
error_train = zeros(length(lambda_vec), 1);
error_val = zeros(length(lambda_vec), 1);

% ====================== YOUR CODE HERE ======================
% Instructions: Fill in this function to return training errors in
%               error_train and the validation errors in error_val. The
%               vector lambda_vec contains the different lambda parameters
%               to use for each calculation of the errors, i.e.,
%               error_train(i) and error_val(i) should give
%               you the errors obtained after training with
%               lambda = lambda_vec(i)
%

for i = 1:length(lambda_vec)
    lambda = lambda_vec(i);
    % Train with regularization, then report both errors without it
    theta = trainLinearReg(X, y, lambda);
    error_train(i) = linearRegCostFunction(X, y, theta, 0);
    error_val(i) = linearRegCostFunction(Xval, yval, theta, 0);
end

% =========================================================================

end
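Once the validation curve has been computed, lambda is usually chosen as the value with the smallest validation error. The following self-contained sketch shows that selection step. The synthetic data, the degree-8 polynomial features, and the closed-form regularized fit (used here in place of `trainLinearReg`) are all assumptions made for illustration, so the lambda it selects says nothing about the course data set.

```octave
% Deterministic synthetic data (hypothetical): smooth signal plus a fixed wiggle.
x = linspace(-1, 1, 30)';
y = sin(3 * x) + 0.1 * cos(37 * x);

% Degree-8 polynomial features with a leading bias column.
% Relies on broadcasting (Octave, or MATLAB R2016b and later).
p = 8;
feat = @(v) [ones(numel(v), 1), v .^ (1:p)];

% Alternate examples between the training and validation sets.
X    = feat(x(1:2:end));  ytr  = y(1:2:end);
Xval = feat(x(2:2:end));  yval = y(2:2:end);

lambda_vec = [0 0.001 0.003 0.01 0.03 0.1 0.3 1 3 10]';

% Penalty matrix: identity with the bias entry zeroed, so the bias is not regularized.
L = eye(p + 1);
L(1, 1) = 0;

% Unregularized squared-error cost used to report both errors.
cost = @(A, b, t) sum((A * t - b) .^ 2) / (2 * numel(b));

error_train = zeros(size(lambda_vec));
error_val   = zeros(size(lambda_vec));
for i = 1:numel(lambda_vec)
    % Closed-form minimizer of the regularized cost (stand-in for trainLinearReg).
    theta = pinv(X' * X + lambda_vec(i) * L) * (X' * ytr);
    error_train(i) = cost(X, ytr, theta);
    error_val(i)   = cost(Xval, yval, theta);
end

% Pick the lambda with the lowest validation error.
[~, best] = min(error_val);
best_lambda = lambda_vec(best);
fprintf('best lambda = %g (validation error %.4f)\n', best_lambda, error_val(best));
```

This ties back to the remedies listed earlier: if the best value sits at the small end of `lambda_vec`, the model was not overfitting much, and if it sits at the large end, stronger regularization was needed.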