ML & DL: Understanding Logistic Regression


I previously studied linear classification, linear regression, and logistic regression. This post summarizes them, focusing on the derivation of the cross-entropy loss function and the gradient descent method.

I. Overview

First, let me pull out a figure from Professor Hsuan-Tien Lin's lecture notes.

It shows the differences among PLA, linear regression, and logistic regression.

The error function changes from the 0/1 error to the squared error to the cross-entropy error.

1.1 PLA / Pocket

PLA is for linearly separable data and binary classification, and uses the 0/1 error. The weights are initialized and then updated iteratively: whenever a misclassified point is found, the weights are corrected with

w_{t+1} = w_t + y_{n(t)} \, x_{n(t)}

until no misclassified points remain.
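As an illustration (not from the lecture notes), here is a minimal PLA sketch, assuming a NumPy feature matrix X whose first column is the constant bias feature and labels y in {-1, +1}:

```python
import numpy as np

def pla(X, y, max_iters=1000):
    """Perceptron Learning Algorithm for linearly separable data.

    X: (N, d) array, first column assumed to be the constant 1 (bias).
    y: (N,) array of labels in {-1, +1}.
    """
    w = np.zeros(X.shape[1])          # initialize the weights
    for _ in range(max_iters):
        preds = np.sign(X @ w)
        mistakes = np.where(preds != y)[0]
        if len(mistakes) == 0:        # no misclassified points: done
            return w
        n = mistakes[0]               # pick one misclassified point
        w = w + y[n] * X[n]           # w_{t+1} = w_t + y_n * x_n
    return w
```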

Later, to handle data that is not linearly separable, the pocket algorithm was introduced. Instead of searching for a weight vector with zero classification errors, it keeps track of how many points each weight vector misclassifies during the iterations and, after enough updates, returns the best weight vector seen so far as the final result.
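A hedged sketch of the pocket variant under the same assumptions; the update rule is unchanged, we just remember the best weights seen so far:

```python
import numpy as np

def pocket(X, y, max_iters=1000):
    """Pocket algorithm: PLA updates, but keep the best weights seen so far."""
    w = np.zeros(X.shape[1])
    best_w = w.copy()
    best_errors = np.sum(np.sign(X @ w) != y)
    for _ in range(max_iters):
        mistakes = np.where(np.sign(X @ w) != y)[0]
        if len(mistakes) == 0:
            return w                      # data perfectly separated
        n = np.random.choice(mistakes)    # a random misclassified point
        w = w + y[n] * X[n]               # same PLA correction step
        errors = np.sum(np.sign(X @ w) != y)
        if errors < best_errors:          # keep the better weights "in the pocket"
            best_errors = errors
            best_w = w.copy()
    return best_w
```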

1.2 Linear Regression

Linear regression can be used for problems such as predicting a customer's credit card limit or predicting house prices. It uses the squared error.
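As a quick illustration (a minimal sketch, not part of the original walkthrough), the squared-error fit has a closed-form solution via the pseudo-inverse; X and y follow the same conventions as above, except that y is now real-valued:

```python
import numpy as np

def linear_regression(X, y):
    """Least-squares fit: w = pinv(X) @ y minimizes the mean squared error."""
    return np.linalg.pinv(X) @ y

# usage: w = linear_regression(X, y); predictions = X @ w
```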

I won't go into more detail on it for now.

Let's go straight to logistic regression.

II. Logistic Regression

2.1 Basic Introduction

When predicting whether heart disease will recur, we cannot give a hard yes-or-no answer; we can only say with what probability it will recur. However, our training data only records "recurred" or "did not recur", while what we want the model to output is a probability.

This is where the logistic function comes in: it maps any real number to a value between 0 and 1, which can be interpreted as a probability.

Logistic function: f(x) = \frac{1}{1+e^{-x}}

Thus, the hypothesis function is obtained:

h(x) = \frac{1}{1+e^{-w^T x}}
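A small sketch of this hypothesis in code, assuming a weight vector w and a feature matrix X with a bias column as before:

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(w, X):
    """h(x) = sigmoid(w^T x): predicted probability of the positive class."""
    return sigmoid(X @ w)
```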

So how do we optimize this hypothesis function, and what error function should we use? This is where the cross-entropy loss function comes in.

2.2 Derivation of the Cross-Entropy Loss Function

Suppose we have a data set like this: D = {(x_1, ○), (x_2, ×), \ldots, (x_N, ×)}.

The probability of our target function generating this data set is:

P(D) = P(x_1, ○) \, P(x_2, ×) \cdots P(x_N, ×)

In the formula, ○ denotes the positive class and × denotes the negative class.

Since we know that, for a given point x_1, the probability of it being labeled ○ is given by our target function f(x_1), we have:

P(○ \mid x_1) = f(x_1), \quad P(× \mid x_1) = 1 - f(x_1)

By the formula for conditional probability:

P(B \mid A) = \frac{P(AB)}{P(A)}

or equivalently, P(AB) = P(A) \, P(B \mid A).

Therefore, each term factors as P(x_n, ○) = P(x_n) f(x_n) and P(x_n, ×) = P(x_n)(1 - f(x_n)), so the probability of generating the data set D can be expressed as:

P(D) = P(x_1) f(x_1) \times P(x_2) (1 - f(x_2)) \times \cdots \times P(x_N) (1 - f(x_N))
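To connect this likelihood to the cross-entropy loss, the usual next steps are sketched below; substituting the hypothesis h for the unknown target f, and the final logarithm step, follow the standard textbook treatment rather than anything shown above.

Since the marginals P(x_n) are the same for every candidate hypothesis h, maximizing

\mathrm{likelihood}(h) = P(x_1)\,h(x_1) \times P(x_2)\,(1 - h(x_2)) \times \cdots \times P(x_N)\,(1 - h(x_N))

is equivalent to maximizing \prod_{n=1}^{N} h(y_n x_n), using the symmetry 1 - h(x) = h(-x) of the logistic function and labels y_n \in \{+1, -1\}. Taking the negative logarithm and averaging turns this product into the cross-entropy error:

E(w) = \frac{1}{N} \sum_{n=1}^{N} \ln\left(1 + e^{-y_n w^T x_n}\right)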
