Translated from the original Chinese article: http://blog.csdn.net/acdreamers/article/details/27365941

**Logistic regression** is a probabilistic nonlinear regression model and a multivariate analysis method for studying the relationship between a binary outcome and a set of influencing factors. The typical question is whether a certain outcome occurs given certain factors; for example, in medicine, predicting whether a patient has a particular disease based on the patient's symptoms.

Before explaining the theory of **logistic regression**, we start with the LR classifier, short for **Logistic Regression Classifier**. In binary classification, the learned LR classifier is a set of weights $w_0, w_1, \ldots, w_n$. When a test sample is presented, the weights and the test data are combined linearly to obtain

$$z = w_0 + w_1 x_1 + w_2 x_2 + \cdots + w_n x_n,$$

where $x_1, x_2, \ldots, x_n$ are the features of the sample.

After that, $z$ is passed through the **sigmoid function**, whose form is

$$\sigma(z) = \frac{1}{1 + e^{-z}}.$$

Since the range of the **sigmoid function** is $(0, 1)$, its output can be read as a probability, and the most basic LR classifier is suited to separating two classes of targets.
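As a concrete sketch, the linear combination followed by the sigmoid step might look like this in Python (the weights and feature values below are made-up illustrative numbers, not from the original article):

```python
import math

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical learned weights and one test sample (illustrative values only)
bias = 0.1                      # w0
weights = [0.5, -1.2, 0.8]      # w1 .. wn
features = [2.0, 1.0, 3.0]      # x1 .. xn of the test sample

# Linear combination: z = w0 + sum_i w_i * x_i
z = bias + sum(w * x for w, x in zip(weights, features))

prob = sigmoid(z)               # probability of the positive class
label = 1 if prob >= 0.5 else 0 # threshold at 0.5 for a binary decision
```

Here `prob` is the estimated probability that the sample belongs to the positive class, and the 0.5 threshold turns it into a hard label.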

So the key problem of **logistic regression** is how to obtain this set of weights. This is done with **maximum likelihood estimation**.

Formally, the logistic regression model is stated as follows. Consider a vector of independent variables $x = (x_1, x_2, \ldots, x_n)$, and let $p = P(y = 1 \mid x)$ be the conditional probability that the event occurs given the observation $x$. Then the **logistic regression** model can be expressed as

$$p = P(y = 1 \mid x) = \frac{1}{1 + e^{-g(x)}}.$$

This is called the **logistic function**, where

$$g(x) = w_0 + w_1 x_1 + w_2 x_2 + \cdots + w_n x_n.$$

Then the probability that the event does not occur, under the same condition, is

$$1 - p = P(y = 0 \mid x) = \frac{1}{1 + e^{g(x)}}.$$

So the ratio of the probability that the event occurs to the probability that it does not is

$$\frac{p}{1 - p} = e^{g(x)}.$$

This ratio is called the **odds** of the event (the odds of experiencing an event), abbreviated as odds.

Taking the logarithm of the odds gives

$$\ln\frac{p}{1 - p} = g(x) = w_0 + w_1 x_1 + \cdots + w_n x_n.$$
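The odds and log-odds relationship can be checked numerically (the probability value below is an arbitrary illustrative choice):

```python
import math

p = 0.8                          # probability that the event occurs (illustrative)
odds = p / (1 - p)               # odds of the event: 0.8 / 0.2 = 4.0
log_odds = math.log(odds)        # the logit; equals g(x) under the model

# Inverting the logit through the logistic function recovers p
p_back = 1.0 / (1.0 + math.exp(-log_odds))
```

This round trip shows that the logistic function and the log-odds (logit) transform are inverses of each other, which is exactly why the linear function $g(x)$ models the log-odds.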

As you can see, logistic regression revolves around the logistic function. Next, we discuss how to use **maximum likelihood estimation** to find the parameters of the classifier.

Assume there are $m$ observation samples with observed values $y_1, y_2, \ldots, y_m$, each $y_i \in \{0, 1\}$. Let $\pi(x_i) = P(y_i = 1 \mid x_i)$ be the probability of obtaining $y_i = 1$ given the condition $x_i$; in the same way, the probability of $y_i = 0$ is $1 - \pi(x_i)$, so the probability of a single observed value is

$$P(y_i) = \pi(x_i)^{y_i}\,[1 - \pi(x_i)]^{1 - y_i}.$$

Because the observation samples are independent of each other, their joint distribution is the product of the marginal distributions, which gives the likelihood function

$$L(w) = \prod_{i=1}^{m} \pi(x_i)^{y_i}\,[1 - \pi(x_i)]^{1 - y_i}.$$

Our goal is to find the parameter estimate that maximizes this likelihood function; that is, the maximum likelihood estimate is the parameter $w^*$ at which $L(w)$ attains its maximum. Taking the logarithm of the likelihood function gives

$$\ln L(w) = \sum_{i=1}^{m} \Big( y_i \ln \pi(x_i) + (1 - y_i) \ln[1 - \pi(x_i)] \Big).$$
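The log-likelihood above translates directly into code. A minimal sketch for a one-feature model (the toy dataset and weight values are illustrative assumptions, not from the original article):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(w0, w1, xs, ys):
    """ln L(w) = sum_i [ y_i ln pi(x_i) + (1 - y_i) ln(1 - pi(x_i)) ]."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w0 + w1 * x)          # pi(x_i) under the current weights
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

# Toy data: small x values labelled 0, large x values labelled 1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0, 0, 1, 1]
```

Weights that point the "right way" (positive slope, negative intercept) should yield a higher log-likelihood on this data than the all-zero weight vector, which assigns probability 0.5 to everything.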

Continuing, we take the partial derivative with respect to each parameter and set it to zero, obtaining one equation per parameter. For example, taking the partial derivative with respect to $w_k$, and using the fact that

$$\frac{\partial \pi(x_i)}{\partial w_k} = \pi(x_i)\,[1 - \pi(x_i)]\,x_{ik},$$

we get

$$\frac{\partial \ln L(w)}{\partial w_k} = \sum_{i=1}^{m} \big( y_i - \pi(x_i) \big)\, x_{ik} = 0,$$

where $x_{i0} = 1$ by convention for the intercept term. There are $n + 1$ such equations in total, one for each of $w_0, w_1, \ldots, w_n$, so the problem is now transformed into solving this system of equations.

The system above is nonlinear and has no closed-form solution, so we use the **Newton-Raphson iterative method** to solve it.
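A minimal Newton-Raphson sketch for a one-feature logistic model, in pure Python (the toy dataset, iteration count, and function names are illustrative assumptions, not the original post's implementation). Each iteration computes the gradient $\sum_i (y_i - \pi(x_i))\,x_{ik}$ and the $2\times 2$ negative Hessian, then takes a Newton step:

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_newton_raphson(xs, ys, iters=25):
    """Fit p = sigmoid(w0 + w1*x) by maximizing ln L(w) with Newton-Raphson."""
    w0, w1 = 0.0, 0.0
    for _ in range(iters):
        # Gradient of ln L: g_k = sum_i (y_i - p_i) * x_{ik}, with x_{i0} = 1
        g0 = g1 = 0.0
        # Entries of the negative Hessian: sum_i p_i (1 - p_i) x_{ij} x_{ik}
        h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(w0 + w1 * x)
            g0 += y - p
            g1 += (y - p) * x
            s = p * (1.0 - p)
            h00 += s
            h01 += s * x
            h11 += s * x * x
        # Newton step: solve (negative Hessian) * step = gradient (2x2 inverse)
        det = h00 * h11 - h01 * h01
        w0 += (h11 * g0 - h01 * g1) / det
        w1 += (h00 * g1 - h01 * g0) / det
    return w0, w1

# Toy, non-separable data (illustrative only); separable data would make
# the maximum-likelihood weights diverge to infinity.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0, 0, 1, 0, 1, 1]
w0, w1 = fit_newton_raphson(xs, ys)
```

Because the log-likelihood is concave, Newton-Raphson converges quickly here; the fitted model should assign probability below 0.5 at small $x$ and above 0.5 at large $x$.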