Logistic regression Exercise (iii)


% Exercise 4 -- Logistic regression
clear all; close all; clc

x = load('E:\workstation\data\ex4x.dat');
y = load('E:\workstation\data\ex4y.dat');
[m, n] = size(x);

% Add intercept term to x
x = [ones(m, 1), x];

% Plot the training data.
% Use different markers for positives and negatives
% (admitted and not-admitted samples are drawn separately).
figure
pos = find(y); neg = find(y == 0);  % find returns the indices where the bracketed condition is true
plot(x(pos, 2), x(pos, 3), '+')
hold on
plot(x(neg, 2), x(neg, 3), 'o')
hold on
xlabel('Exam 1 score')
ylabel('Exam 2 score')

% Initialize fitting parameters
theta = zeros(n+1, 1);

% Define the sigmoid function
g = inline('1.0 ./ (1.0 + exp(-z))');

% Newton's method
max_itr = 7;
J = zeros(max_itr, 1);

for i = 1:max_itr
    % Calculate the hypothesis function
    z = x * theta;
    h = g(z);

    % Calculate gradient and Hessian.
    % The formulas below are equivalent to the summation formulas
    % given in the lecture videos.
    grad = (1/m) .* x' * (h - y);               % gradient, vectorized
    H = (1/m) .* x' * diag(h) * diag(1-h) * x;  % Hessian, vectorized

    % Calculate J (for testing convergence)
    J(i) = (1/m) * sum(-y .* log(h) - (1-y) .* log(1-h));  % loss, vectorized

    theta = theta - H \ grad;  % Newton update
end

% Display theta
theta

% Calculate the probability that a student with
% score 20 on exam 1 and score 80 on exam 2
% will not be admitted
prob = 1 - g([1, 20, 80] * theta)

% Plot Newton's method result.
% Only 2 points are needed to define a line, so choose two endpoints.
plot_x = [min(x(:,2)) - 2, max(x(:,2)) + 2];
% Calculate the decision boundary line (see the derivation below)
plot_y = (-1 ./ theta(3)) .* (theta(2) .* plot_x + theta(1));
plot(plot_x, plot_y)
legend('Admitted', 'Not admitted', 'Decision boundary')
hold off

% Plot J
figure
plot(0:max_itr-1, J, 'o--', 'MarkerFaceColor', 'r', 'MarkerSize', 8)
xlabel('Iteration'); ylabel('J')
% Display J
J

Results:

Logistic regression Exercises

The training set consists of two exam scores for each of 80 students, together with a label indicating whether the corresponding student was admitted to university: '1' if admitted, '0' if not. This is a typical binary classification problem. Of the 80 samples, 40 are positive and 40 are negative. This section solves it with logistic regression; the model's output is actually a probability, which becomes a binary classification by thresholding at 0.5.

Experimental basis:

In the logistic regression problem, the logistic (sigmoid) function is:

g(z) = 1 / (1 + e^(-z))

Its advantage is that the output is compressed into the range (0, 1). The loss function in the logistic regression problem differs from the loss function in linear regression, and is defined as:

J(theta) = (1/m) * sum over i of [ -y(i) * log(h(x(i))) - (1 - y(i)) * log(1 - h(x(i))) ]

where h(x) = g(theta' * x) is the hypothesis.
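To make the two formulas concrete, here is a small pure-Python sketch of the sigmoid and the per-example loss (the function names are mine, not part of the exercise code):

```python
import math

def sigmoid(z):
    """Logistic function: compresses any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logistic_loss(h, y):
    """Per-example loss: -y*log(h) - (1-y)*log(1-h), for y in {0, 1}."""
    return -y * math.log(h) - (1.0 - y) * math.log(1.0 - h)

print(sigmoid(0.0))                                    # 0.5, the decision threshold
print(logistic_loss(0.99, 1) < logistic_loss(0.6, 1))  # True: confident correct predictions cost less
```

Note that the loss punishes a confident wrong prediction very heavily, since log(h) goes to minus infinity as h goes to 0.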

If Newton's method is used to solve for the parameters of the regression equation, the parameter update rule is:

theta := theta - H^(-1) * grad

where the gradient (first-order term) and the Hessian matrix are:

grad = (1/m) * sum over i of (h(x(i)) - y(i)) * x(i)

H = (1/m) * sum over i of h(x(i)) * (1 - h(x(i))) * x(i) * x(i)'

Of course, to avoid an explicit for loop over the training examples, you should use the vectorized forms of these formulas directly (see the program for details).
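As a sketch of what the vectorization buys, the following NumPy code (a Python stand-in for the MATLAB formulas; the data here is random filler, not the exercise's dataset) computes the gradient and Hessian both ways and confirms they agree:

```python
import numpy as np

def grad_hessian_loop(X, y, theta):
    """Summation form: accumulate over the m training examples one by one."""
    m, n = X.shape
    grad = np.zeros(n)
    H = np.zeros((n, n))
    for i in range(m):
        h = 1.0 / (1.0 + np.exp(-X[i] @ theta))
        grad += (h - y[i]) * X[i]
        H += h * (1.0 - h) * np.outer(X[i], X[i])
    return grad / m, H / m

def grad_hessian_vec(X, y, theta):
    """Vectorized form, as in the MATLAB code: no explicit loop."""
    m = X.shape[0]
    h = 1.0 / (1.0 + np.exp(-X @ theta))
    grad = X.T @ (h - y) / m
    H = X.T @ np.diag(h) @ np.diag(1.0 - h) @ X / m
    return grad, H

# The two forms agree on a small random problem
rng = np.random.default_rng(0)
X = np.hstack([np.ones((5, 1)), rng.normal(size=(5, 2))])  # intercept + 2 features
y = np.array([0., 1., 1., 0., 1.])
theta = np.zeros(3)
g1, H1 = grad_hessian_loop(X, y, theta)
g2, H2 = grad_hessian_vec(X, y, theta)
print(np.allclose(g1, g2), np.allclose(H1, H2))  # True True
```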

  Some MATLAB functions:

  find

find returns the indices of the elements for which the expression inside its parentheses is true (nonzero).

  inline

inline constructs an inline function from a string expression, much like a formula written out on paper. The first single-quoted argument is the expression; any further single-quoted arguments name the input variables one by one. For example, g = inline('sin(alpha*x)', 'x', 'alpha') defines the two-argument function g(x, alpha) = sin(alpha*x).

min / max return the minimum / maximum element of an array.
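For readers following along outside MATLAB, these three helpers have close Python/NumPy counterparts (the arrays below are illustrative only):

```python
import numpy as np

y = np.array([1, 0, 0, 1, 1])

# find: indices where a condition holds (MATLAB indices are 1-based,
# NumPy's are 0-based)
pos = np.flatnonzero(y)        # like find(y)
neg = np.flatnonzero(y == 0)   # like find(y == 0)
print(pos)  # [0 3 4]
print(neg)  # [1 2]

# inline: an anonymous function; a Python lambda plays the same role
g = lambda x, alpha: np.sin(alpha * x)   # like inline('sin(alpha*x)', 'x', 'alpha')
print(np.isclose(g(np.pi / 2, 1.0), 1.0))  # True

# min / max over an array
scores = np.array([30.0, 85.0, 62.0])
print(scores.min(), scores.max())  # 30.0 85.0
```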

Specific steps:

1: Load the data and plot it:

x = load('E:\workstation\data\ex4x.dat');

y = load('E:\workstation\data\ex4y.dat');

[m, n] = size(x); % number of samples and number of features

% Add intercept term to x

x = [ones(m, 1), x]; % prepend a column of ones

% Plot the training data

% use different markers for positives and negatives

figure

pos = find(y); neg = find(y == 0); % indices of the admitted and not-admitted samples

plot(x(pos, 2), x(pos, 3), '+')

hold on

plot(x(neg, 2), x(neg, 3), 'o')

hold on

xlabel('Exam 1 score')

ylabel('Exam 2 score')

2: Newton's method

Recall that in logistic regression, the hypothesis function is

h(x) = g(theta' * x) = 1 / (1 + e^(-theta' * x))

In our example, the hypothesis is interpreted as the probability that a student will be admitted, given the values of the features in x.

MATLAB/Octave does not have a library function for the sigmoid, so you will have to define it yourself. The easiest way to do this is through an inline expression:

g = inline('1.0 ./ (1.0 + exp(-z))');

% Usage: to find the value of the sigmoid

% evaluated at 2, call g(2)

The cost function is defined as

J(theta) = (1/m) * sum over i of [ -y(i) * log(h(x(i))) - (1 - y(i)) * log(1 - h(x(i))) ]
Our goal is to use Newton's method to minimize this function. Recall that the update rule for Newton's method is

theta := theta - H^(-1) * grad

In logistic regression, the gradient and the Hessian are

grad = (1/m) * sum over i of (h(x(i)) - y(i)) * x(i)

H = (1/m) * sum over i of h(x(i)) * (1 - h(x(i))) * x(i) * x(i)'

Note that these are the summation (per-example) formulas: x(i) is a vector, while h(x(i)) and y(i) are scalars. The program uses the equivalent vectorized versions.

Implementation

Now, implement Newton's method in your program, starting from the initial value theta = 0 (the zero vector). To determine how many iterations to use, calculate J(theta) at each iteration and plot the results, as you did in Exercise 2. As mentioned in the lecture videos, Newton's method often converges in 5-15 iterations. If you find yourself using far more iterations, you should check for errors in your implementation.
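The whole procedure can be sketched in Python/NumPy. Since the exercise's ex4x.dat / ex4y.dat files are not reproduced here, a small hand-made dataset stands in; the duplicated, contradictorily-labeled row keeps the problem non-separable so the Newton iterates stay bounded:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def newton_logistic(X, y, max_itr=7):
    """Newton's method for logistic regression, mirroring the MATLAB loop."""
    m, n = X.shape
    theta = np.zeros(n)                       # initial value: the zero vector
    J = np.zeros(max_itr)
    for i in range(max_itr):
        h = sigmoid(X @ theta)                # hypothesis for every sample at once
        grad = X.T @ (h - y) / m              # gradient, vectorized
        H = (X.T * (h * (1.0 - h))) @ X / m   # Hessian, vectorized
        hc = np.clip(h, 1e-12, 1 - 1e-12)     # guard the logs against 0
        J[i] = np.mean(-y * np.log(hc) - (1 - y) * np.log(1 - hc))
        theta -= np.linalg.solve(H, grad)     # theta = theta - H \ grad
    return theta, J

# Stand-in data: two exam scores per student, label 1 = admitted
scores = np.array([[30., 40.], [40., 35.], [50., 50.], [50., 50.],
                   [60., 55.], [70., 80.], [45., 65.], [85., 75.]])
labels = np.array([0., 0., 0., 1., 1., 1., 1., 1.])
X = np.hstack([np.ones((len(scores), 1)), scores])   # add intercept term
theta, J = newton_logistic(X, labels)
print(J)   # J starts at log(2) with theta = 0 and drops within a few iterations
```

Plotting J against the iteration number, as the exercise asks, makes the rapid convergence visible.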

After convergence, use your values of theta to find the decision boundary in the classification problem. The decision boundary is defined as the line where

P(y = 1 | x; theta) = g(theta' * x) = 0.5

which corresponds to

theta' * x = 0
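In code, solving theta' * x = 0 for the second exam score gives the boundary line; here is a small NumPy check (these theta values are made-up placeholders, not the ones fitted from the exercise's data):

```python
import numpy as np

# Hypothetical fitted parameters, for illustration only
theta = np.array([-16.0, 0.15, 0.12])

def boundary_x2(x1, theta):
    """Solve theta0 + theta1*x1 + theta2*x2 = 0 for x2."""
    return -(theta[0] + theta[1] * x1) / theta[2]

# Any point on the boundary has predicted probability exactly 0.5
x1 = 55.0
x2 = boundary_x2(x1, theta)
z = theta @ np.array([1.0, x1, x2])
print(np.isclose(z, 0.0), np.isclose(1.0 / (1.0 + np.exp(-z)), 0.5))  # True True
```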

% Initialize fitting parameters

theta = zeros(n+1, 1); % initialize theta to the zero vector

% Define the sigmoid function

g = inline('1.0 ./ (1.0 + exp(-z))'); % maps any input into the range (0, 1)

% Newton's method

max_itr = 7;

J = zeros(max_itr, 1); % loss value recorded at each iteration

for i = 1:max_itr

% Calculate the hypothesis function

z = x * theta;

h = g(z);

grad = (1/m) .* x' * (h - y); % gradient, vectorized

H = (1/m) .* x' * diag(h) * diag(1-h) * x; % Hessian, vectorized

J(i) = (1/m) * sum(-y .* log(h) - (1-y) .* log(1-h)); % loss function, vectorized

theta = theta - H \ grad; % Newton update: theta gives the line theta0 + theta1*x1 + theta2*x2 separating admitted from not admitted

end

% Display theta

theta

3: Find the decision boundary that best separates the two classes, then use it to predict whether a given student will be admitted:

% Calculate the probability that a student with

% score 20 on exam 1 and score 80 on exam 2

% will not be admitted

prob = 1 - g([1, 20, 80] * theta) % feature vector [1, 20, 80]: intercept plus the two exam scores

% Plot Newton's method result

% Only 2 points are needed to define a line, so choose two endpoints

plot_x = [min(x(:,2)) - 2, max(x(:,2)) + 2]; % extend 2 points past the min/max exam-1 score so the line reaches beyond the data

On the decision boundary the logistic output is 0.5, so the exponent of e must be 0, i.e. theta(1)*1 + theta(2)*plot_x + theta(3)*plot_y = 0, from which plot_y can be solved.

plot_y = (-1 ./ theta(3)) .* (theta(2) .* plot_x + theta(1));

plot(plot_x, plot_y)

legend('Admitted', 'Not admitted', 'Decision boundary')

hold off

% Plot J, the curve of the loss value against the iteration number:

figure

plot(0:max_itr-1, J, 'o--', 'MarkerFaceColor', 'r', 'MarkerSize', 8)

xlabel('Iteration'); ylabel('J')

% Display J

