% Exercise 4 -- Logistic regression
clear all; close all; clc

x = load('E:\workstation\data\ex4x.dat');
y = load('E:\workstation\data\ex4y.dat');
[m, n] = size(x);

% Add intercept term to x
x = [ones(m, 1), x];

% Plot the training data
% Use different markers for the positive and negative classes
figure
pos = find(y); neg = find(y == 0);
% find returns the indices at which the expression in parentheses is true
plot(x(pos, 2), x(pos, 3), '+')
hold on
plot(x(neg, 2), x(neg, 3), 'o')
xlabel('Exam 1 score')
ylabel('Exam 2 score')

% Initialize fitting parameters
theta = zeros(n+1, 1);

% Define the sigmoid function
g = inline('1.0 ./ (1.0 + exp(-z))');

% Newton's method
max_itr = 7;
J = zeros(max_itr, 1);

for i = 1:max_itr
    % Calculate the hypothesis function
    z = x * theta;
    h = g(z);

    % Calculate gradient and Hessian.
    % The formulas below are equivalent to the summation formulas
    % given in the lecture videos.
    grad = (1/m) .* x' * (h - y);                % gradient, vectorized
    H = (1/m) .* x' * diag(h) * diag(1-h) * x;   % Hessian, vectorized

    % Calculate J (for testing convergence)
    J(i) = (1/m) * sum(-y .* log(h) - (1-y) .* log(1-h));  % loss, vectorized

    theta = theta - H \ grad;
end

% Display theta
theta

% Calculate the probability that a student with
% score 20 on exam 1 and score 80 on exam 2
% will not be admitted
prob = 1 - g([1, 20, 80] * theta)

% Plot Newton's method result
% Only need 2 points to define a line, so choose two endpoints
plot_x = [min(x(:,2))-2, max(x(:,2))+2];
% Calculate the decision boundary line (see the derivation below)
plot_y = (-1 ./ theta(3)) .* (theta(2) .* plot_x + theta(1));
plot(plot_x, plot_y)
legend('Admitted', 'Not admitted', 'Decision boundary')
hold off

% Plot J
figure
plot(0:max_itr-1, J, 'o--', 'MarkerFaceColor', 'r', 'MarkerSize', 8)
xlabel('Iteration'); ylabel('J')
% Display J
J
Results: (figure output omitted: training data scatter plot with decision boundary, and J versus iteration)
Logistic Regression Exercise
The training samples here are the scores of 80 students on two exams; the label for each sample records whether the corresponding student was admitted to college: '1' if admitted, '0' if not. This is a typical binary classification problem. Of the 80 samples, 40 are positive and 40 are negative. This section solves the problem with logistic regression; the result is a probability value, which can be turned into a binary classification by comparing it with 0.5.
Experimental basis:
In the logistic regression problem, the logistic (sigmoid) function is:

g(z) = 1 / (1 + e^(-z)), with hypothesis h_theta(x) = g(theta' * x)

The advantage of this is that the output is compressed into the range (0, 1), so it can be read as a probability. The loss function in logistic regression differs from the loss function in linear regression; it is defined as:

J(theta) = (1/m) * sum over i of [ -y(i)*log(h_theta(x(i))) - (1 - y(i))*log(1 - h_theta(x(i))) ]
If Newton's method is used to solve for the parameters of the regression equation, the iterative update formula for the parameters is:

theta := theta - H^(-1) * grad

where the first-order derivative (gradient) vector and Hessian matrix are:

grad = (1/m) * X' * (h - y)
H = (1/m) * X' * diag(h) * diag(1 - h) * X
Of course, to avoid using for loops in the program, you should use the vectorized forms of these formulas directly (see the program for details).
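As an aside, the vectorized gradient and Hessian are easy to check in any language. Below is a minimal pure-Python sketch; the toy data and function names are made up for illustration and are not part of the exercise:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def grad_and_hessian(X, y, theta):
    # X: list of rows (each starting with the intercept 1), y: 0/1 labels
    m, n = len(X), len(theta)
    h = [sigmoid(sum(xj * tj for xj, tj in zip(row, theta))) for row in X]
    # grad = (1/m) * X' * (h - y)
    grad = [sum(X[i][j] * (h[i] - y[i]) for i in range(m)) / m for j in range(n)]
    # H = (1/m) * X' * diag(h) * diag(1-h) * X
    H = [[sum(X[i][j] * h[i] * (1 - h[i]) * X[i][k] for i in range(m)) / m
          for k in range(n)] for j in range(n)]
    return grad, H

# Toy data: two points with an intercept column (made-up numbers)
X = [[1.0, 2.0], [1.0, -1.0]]
y = [1, 0]
grad, H = grad_and_hessian(X, y, [0.0, 0.0])
```

At theta = 0 every h(i) is 0.5, so the Hessian reduces to (1/4m) * X' * X, which is a quick sanity check on the vectorized formula.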
Some MATLAB functions:
find
Returns a vector of indices: the positions at which the expression inside the parentheses is true (nonzero).
inline
Constructs an inline function, much like a formula written out on scratch paper. The arguments are given in single quotation marks: the first is the expression of the function, and if there is more than one input variable, the variables are listed one by one in additional single-quoted arguments. For example, g = inline('sin(alpha*x)', 'x', 'alpha') defines the two-argument function g(x, alpha) = sin(alpha*x).
min, max
Return the minimum / maximum element of an array.
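For readers following along in Python rather than MATLAB, rough analogues of these helpers can be sketched as follows (illustrative only; the sample values are made up):

```python
import math

# MATLAB find(y): indices where the value is true/nonzero (0-based in Python)
y = [1, 0, 0, 1, 1]
pos = [i for i, v in enumerate(y) if v]        # like find(y)
neg = [i for i, v in enumerate(y) if v == 0]   # like find(y == 0)

# MATLAB inline('sin(alpha*x)', 'x', 'alpha'): a lambda plays the same role
g = lambda x, alpha: math.sin(alpha * x)

# MATLAB min/max over an array
scores = [55.5, 80.0, 30.0]
lo, hi = min(scores), max(scores)
```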
Specific steps:
1: Load the data and plot it:
x = load('E:\workstation\data\ex4x.dat');
y = load('E:\workstation\data\ex4y.dat');
[m, n] = size(x); % number of training examples and number of features

% Add intercept term to x
x = [ones(m, 1), x]; % prepend a column of ones

% Plot the training data
% Use different markers for positives and negatives
figure
pos = find(y); neg = find(y == 0); % indices of the admitted and not-admitted examples
plot(x(pos, 2), x(pos, 3), '+')
hold on
plot(x(neg, 2), x(neg, 3), 'o')
xlabel('Exam 1 score')
ylabel('Exam 2 score')
Newton's Method
Recall that in logistic regression, the hypothesis function is

h_theta(x) = g(theta' * x) = 1 / (1 + e^(-theta' * x))

In our example, the hypothesis is interpreted as the probability that an applicant will be admitted, given the values of the features in x.
MATLAB/Octave does not have a library function for the sigmoid, so you will have to define it yourself. The easiest way is through an inline expression:

g = inline('1.0 ./ (1.0 + exp(-z))');
% Usage: to find the value of the sigmoid
% evaluated at 2, call g(2)
The cost function is defined as

J(theta) = (1/m) * sum over i of [ -y(i)*log(h_theta(x(i))) - (1 - y(i))*log(1 - h_theta(x(i))) ]

Our goal is to use Newton's method to minimize this function. Recall that the update rule for Newton's method is

theta := theta - H^(-1) * grad

In logistic regression, the gradient and the Hessian are

grad = (1/m) * sum over i of (h_theta(x(i)) - y(i)) * x(i)
H = (1/m) * sum over i of h_theta(x(i)) * (1 - h_theta(x(i))) * x(i) * x(i)'

Note that the formulas presented above are the vectorized versions: each x(i) is a column vector and x(i)*x(i)' is a matrix, while y(i) and h_theta(x(i)) are scalars.
Implementation

Now, implement Newton's method in your program, starting from the initial value theta = 0. To determine how many iterations to use, calculate J for each iteration and plot your results as you did in Exercise 2. As mentioned in the lecture videos, Newton's method often converges in 5-15 iterations. If you find yourself using far more iterations, you should check for errors in your implementation.
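As a cross-check on the loop described above, the whole method can be sketched outside MATLAB. This is a pure-Python illustration on a tiny made-up, non-separable dataset, restricted to two parameters so the 2x2 system can be solved by hand; none of the data or names come from the exercise itself:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def newton_logistic(X, y, max_itr=7):
    # X: rows [1, x1] with intercept; returns fitted theta and per-iteration loss J
    m = len(X)
    theta = [0.0, 0.0]
    J = []
    for _ in range(max_itr):
        h = [sigmoid(r[0] * theta[0] + r[1] * theta[1]) for r in X]
        grad = [sum(X[i][j] * (h[i] - y[i]) for i in range(m)) / m for j in range(2)]
        H = [[sum(X[i][j] * h[i] * (1 - h[i]) * X[i][k] for i in range(m)) / m
              for k in range(2)] for j in range(2)]
        J.append(sum(-y[i] * math.log(h[i]) - (1 - y[i]) * math.log(1 - h[i])
                     for i in range(m)) / m)
        # theta := theta - H \ grad, solving the 2x2 system explicitly
        det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
        step = [(H[1][1] * grad[0] - H[0][1] * grad[1]) / det,
                (H[0][0] * grad[1] - H[1][0] * grad[0]) / det]
        theta = [theta[0] - step[0], theta[1] - step[1]]
    return theta, J

# Toy, non-separable data so the optimum is finite (made-up numbers)
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [0, 1, 0, 1]
theta, J = newton_logistic(X, y)
```

The first loss value is log(2) (all predictions start at 0.5), and J should drop within the first couple of iterations, matching the 5-15 iteration rule of thumb.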
After convergence, use your values of theta to find the decision boundary in the classification problem. The decision boundary is defined as the line where

P(y = 1 | x; theta) = g(theta' * x) = 0.5

which corresponds to

theta' * x = 0
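Since the boundary is the line theta(1) + theta(2)*x1 + theta(3)*x2 = 0, solving for x2 gives the line to plot. A small Python sketch; the theta values here are made-up placeholders, not fitted results:

```python
def boundary_y(theta, x1):
    # Solve theta[0] + theta[1]*x1 + theta[2]*x2 = 0 for x2 on the boundary
    return (-1.0 / theta[2]) * (theta[1] * x1 + theta[0])

theta = [-16.0, 0.15, 0.15]   # made-up example parameters
plot_x = [15.0, 65.0]         # endpoints just past an assumed exam-1 score range
plot_y = [boundary_y(theta, x) for x in plot_x]
```

Every point (x1, x2) on the computed line satisfies theta' * [1, x1, x2] = 0, which is exactly the 0.5-probability contour.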
% Initialize fitting parameters
theta = zeros(n+1, 1); % initialize theta to zero

% Define the sigmoid function
g = inline('1.0 ./ (1.0 + exp(-z))'); % inline function mapping z into (0, 1)

% Newton's method
max_itr = 7;
J = zeros(max_itr, 1);

for i = 1:max_itr
    % Calculate the hypothesis function
    z = x * theta;
    h = g(z);

    grad = (1/m) .* x' * (h - y); % gradient, vectorized
    H = (1/m) .* x' * diag(h) * diag(1-h) * x; % Hessian, vectorized
    J(i) = (1/m) * sum(-y .* log(h) - (1-y) .* log(1-h)); % loss, vectorized
    theta = theta - H \ grad; % Newton update; theta defines the boundary line
end

% Display theta
theta
3: Find the decision boundary that separates the two classes as well as possible, then use it to predict whether a student will be admitted:
% Calculate the probability that a student with
% score 20 on exam 1 and score 80 on exam 2
% will not be admitted
prob = 1 - g([1, 20, 80] * theta) % evaluate for the feature vector [1, 20, 80]
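The same prediction can be sanity-checked in plain Python. The theta values below are made-up placeholders, since the real fitted values come from running the MATLAB code above:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Probability that a student with exam scores (20, 80) is NOT admitted,
# using made-up example parameters for theta
theta = [-16.0, 0.15, 0.15]
x = [1.0, 20.0, 80.0]   # intercept, exam 1 score, exam 2 score
z = sum(xi * ti for xi, ti in zip(x, theta))
prob_not_admitted = 1.0 - sigmoid(z)
```

With these placeholder parameters z = -1, so the admission probability is sigmoid(-1) and the "not admitted" probability is its complement.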
% Plot Newton's method result
% Only need 2 points to define a line, so choose two endpoints
plot_x = [min(x(:,2))-2, max(x(:,2))+2]; % extend 2 points past the range of exam 1 scores

On the decision boundary, the logistic output equals 0.5, so the exponent of e must be 0, i.e.:
theta(1)*1 + theta(2)*plot_x + theta(3)*plot_y = 0, from which plot_y can be solved:

plot_y = (-1 ./ theta(3)) .* (theta(2) .* plot_x + theta(1));
plot(plot_x, plot_y)
legend('Admitted', 'Not admitted', 'Decision boundary')
hold off
% Plot J, the loss as a function of the iteration number
figure
plot(0:max_itr-1, J, 'o--', 'MarkerFaceColor', 'r', 'MarkerSize', 8)
xlabel('Iteration'); ylabel('J')
% Display J
J
Logistic Regression Exercise (III)