This series of articles may be reprinted; please keep the full text intact!
Full table of contents: http://www.cnblogs.com/tbcaaa8/p/4415055.html
1. Poisson Regression
In practice we often encounter problems that require modeling the number of times a low-probability event occurs over a period of time, such as cancer cases or fires.
Suppose the vector x represents the factors that cause the event and the vector θ represents the weights of those factors. We then use hθ(x) = exp(θᵀx) to denote the expected number of occurrences of the event. Placing θᵀx in the exponent means that each unit increase in θᵀx multiplies the expected number of occurrences by a constant factor (namely e).
The dependent variable, given the independent variables, then approximately follows a Poisson distribution, namely y⁽ⁱ⁾ ~ Poisson(hθ(x⁽ⁱ⁾)).
We now derive the maximum likelihood estimate of the parameter θ. The likelihood function is:
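The equation image from the original post is missing here. A standard reconstruction, assuming m training examples and the Poisson probability mass function with mean hθ(x⁽ⁱ⁾) = exp(θᵀx⁽ⁱ⁾), is:

```latex
L(\theta) = \prod_{i=1}^{m} p\left(y^{(i)} \mid x^{(i)}; \theta\right)
          = \prod_{i=1}^{m} \frac{h_\theta(x^{(i)})^{\,y^{(i)}} \, e^{-h_\theta(x^{(i)})}}{y^{(i)}!}
```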
Log-likelihood function:
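The corresponding equation image is also missing; taking the logarithm of the Poisson likelihood and substituting hθ(x) = exp(θᵀx) gives (a reconstruction):

```latex
\ell(\theta) = \log L(\theta)
             = \sum_{i=1}^{m} \left[ y^{(i)} \theta^{T} x^{(i)} - e^{\theta^{T} x^{(i)}} - \log\left(y^{(i)}!\right) \right]
```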
Define the loss function:
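The original definition is an image that did not survive. The usual choice, which this reconstruction assumes (the 1/m averaging factor is a convention, not confirmed by the source), is the negative average log-likelihood:

```latex
J(\theta) = -\frac{1}{m}\,\ell(\theta)
          = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \theta^{T} x^{(i)} - e^{\theta^{T} x^{(i)}} - \log\left(y^{(i)}!\right) \right]
```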
To maximize the likelihood function, it suffices to minimize the loss function. In practice we settle for a local minimum of the loss function rather than the global minimum:
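The missing equation here is presumably the partial derivative of the loss. Differentiating the loss reconstructed above with respect to θj (the log(y⁽ⁱ⁾!) term is constant in θ and drops out) gives:

```latex
\frac{\partial J(\theta)}{\partial \theta_j}
  = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} x_j^{(i)} - e^{\theta^{T} x^{(i)}} x_j^{(i)} \right]
```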
Simplifying, we have:
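A reconstruction of the simplified gradient, obtained by factoring and recognizing exp(θᵀx⁽ⁱ⁾) = hθ(x⁽ⁱ⁾):

```latex
\frac{\partial J(\theta)}{\partial \theta_j}
  = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
```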
Finally, we solve iteratively using gradient descent:
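The update-rule image is missing; plugging the gradient above into the standard gradient descent step yields (a reconstruction):

```latex
\theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
```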
where α is the learning rate.
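The derivation above can be sketched in a few lines of NumPy. This is a minimal illustration, not the post's own code: the synthetic data, the 1/m averaging, and the hyperparameter values are all assumptions.

```python
import numpy as np

def fit_poisson(X, y, alpha=0.1, iters=3000):
    """Poisson regression via batch gradient descent.

    X: (m, n) design matrix (include a column of ones for the intercept).
    y: (m,) non-negative integer counts.
    """
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        mu = np.exp(X @ theta)                 # h_theta(x) = exp(theta^T x)
        theta -= alpha * (X.T @ (mu - y)) / m  # gradient step on J(theta)
    return theta

# Hypothetical demo: recover a known theta from synthetic Poisson counts.
rng = np.random.default_rng(0)
m = 200
X = np.column_stack([np.ones(m), rng.uniform(-1.0, 1.0, m)])
true_theta = np.array([0.5, 1.0])
y = rng.poisson(np.exp(X @ true_theta))
theta_hat = fit_poisson(X, y)
```

With enough data, theta_hat should land close to true_theta, since the Poisson log-likelihood is concave in θ.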
2. Softmax Regression
With the logistic regression model described earlier, we can already solve binary classification problems. Below, we generalize binary classification to the K-class classification problem.
In logistic regression the dependent variable y ∈ {0, 1}, corresponding to the two classes; in the Softmax regression model the dependent variable y ∈ {1, 2, …, k}, corresponding to the k classes. Softmax regression assumes the dependent variable follows a multinomial distribution with parameters φ1, …, φk, i.e. y⁽ⁱ⁾ ~ Mult(φ1, …, φk), where:
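The definition that followed here is missing from the extraction; in the standard setup it states that φj is the probability of class j (a reconstruction):

```latex
p\left(y^{(i)} = j\right) = \phi_j, \qquad j = 1, \dots, k, \qquad \sum_{j=1}^{k} \phi_j = 1
```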
The parameter φk is redundant: using the fact that the probabilities sum to 1, we obtain:
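The missing equation is, from the sentence above:

```latex
\phi_k = 1 - \sum_{j=1}^{k-1} \phi_j
```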
We also define:
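The missing definition is, in the usual derivation, the softmax parametrization of φ; this reconstruction assumes the convention θk ≜ 0, which removes the redundancy noted above:

```latex
\phi_i = \frac{e^{\theta_i^{T} x}}{1 + \sum_{j=1}^{k-1} e^{\theta_j^{T} x}} \quad (i = 1, \dots, k-1), \qquad \theta_k \triangleq \vec{0}
```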
It is easy to prove that φ has the following property:
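The property referred to is, in this reconstruction, the unified softmax form (with θk = 0, the case i = k follows from φk = 1 − ∑φj rather than directly from the definition):

```latex
\phi_i = \frac{e^{\theta_i^{T} x}}{\sum_{j=1}^{k} e^{\theta_j^{T} x}}, \qquad i = 1, \dots, k
```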
Note in particular that the property above still holds for i = k, although its derivation is different. This property will be used directly in the proofs that follow.
We now derive the maximum likelihood estimate of the parameter θ. The likelihood function is:
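The likelihood image is missing; a standard reconstruction using the indicator notation defined just below, with φj(x⁽ⁱ⁾) denoting the class-j probability for example i, is:

```latex
L(\theta) = \prod_{i=1}^{m} p\left(y^{(i)} \mid x^{(i)}; \theta\right)
          = \prod_{i=1}^{m} \prod_{j=1}^{k} \phi_j\left(x^{(i)}\right)^{1\left\{y^{(i)} = j\right\}}
```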
Here the indicator function 1{expression} is defined as follows: when the expression is true, the function's value is 1; otherwise it is 0. Using 1{·} together with the property of φ allows further simplification.
Log-likelihood function:
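Taking the logarithm of the likelihood and substituting the softmax form of φ gives (a reconstruction):

```latex
\ell(\theta) = \sum_{i=1}^{m} \sum_{j=1}^{k} 1\left\{y^{(i)} = j\right\}
               \log \frac{e^{\theta_j^{T} x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^{T} x^{(i)}}}
```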
Define the loss function:
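As in the Poisson case, the original definition is a lost image; this reconstruction assumes the negative average log-likelihood (the 1/m factor is a convention):

```latex
J(\theta) = -\frac{1}{m}\,\ell(\theta)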
To maximize the likelihood function, it suffices to minimize the loss function. In practice we settle for a local minimum of the loss function rather than the global minimum:
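The missing equation here is presumably the component-wise gradient of the loss; differentiating the reconstructed J(θ) with respect to the l-th component of θj gives:

```latex
\frac{\partial J(\theta)}{\partial \theta_{jl}}
  = -\frac{1}{m} \sum_{i=1}^{m} x_l^{(i)} \left( 1\left\{y^{(i)} = j\right\} - \phi_j\left(x^{(i)}\right) \right)
```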
The above can be further rewritten in vector form:
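A reconstruction of the vector form, collecting the components into a gradient with respect to the whole weight vector θj:

```latex
\nabla_{\theta_j} J(\theta)
  = -\frac{1}{m} \sum_{i=1}^{m} x^{(i)} \left( 1\left\{y^{(i)} = j\right\} - \phi_j\left(x^{(i)}\right) \right)
```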
Finally, we solve iteratively using gradient descent:
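The update-rule image is missing; the standard step is θj := θj − α·∇θjJ(θ) for each j. The whole procedure can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the post's code: labels are 0-based (the text uses 1-based), all k weight vectors are updated (rather than fixing θk = 0), and the demo data and hyperparameters are hypothetical.

```python
import numpy as np

def fit_softmax(X, y, k, alpha=0.1, iters=3000):
    """Softmax regression via batch gradient descent.

    X: (m, n) design matrix (include a column of ones for the intercept).
    y: (m,) integer labels in {0, ..., k-1}.
    Returns Theta of shape (k, n), one weight vector per class.
    """
    m, n = X.shape
    Theta = np.zeros((k, n))
    Y = np.eye(k)[y]                                  # one-hot: Y[i, j] = 1{y(i) = j}
    for _ in range(iters):
        logits = X @ Theta.T
        logits -= logits.max(axis=1, keepdims=True)   # for numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)             # phi_j for every example
        Theta -= alpha * (P - Y).T @ X / m            # gradient step
    return Theta

# Hypothetical demo: three well-separated Gaussian clusters.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
pts = np.vstack([rng.normal(c, 0.5, size=(50, 2)) for c in centers])
y = np.repeat(np.arange(3), 50)
X = np.column_stack([np.ones(len(pts)), pts])
Theta = fit_softmax(X, y, k=3)
pred = np.argmax(X @ Theta.T, axis=1)
```

Shifting the logits by their row maximum before exponentiating leaves the probabilities unchanged but avoids overflow, a standard softmax trick.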
At this point, the series has covered four commonly used regression models, of which Poisson regression and Softmax regression are the least intuitive. The origin of hθ(x), and the reason J(θ) takes such a similar form across the different models, will be explained in subsequent articles.
Machine Learning Study Notes (3)--the regression problem in depth: Poisson regression and Softmax regression