Statistical Learning Methods, Hangyuan Li --- Chapter 6: Logistic Regression and Maximum Entropy Model

Source: Internet
Author: User

Chapter 6: Logistic Regression and Maximum Entropy Model

Logistic regression is a classical classification method in statistical learning. The maximum entropy principle is a criterion of probabilistic model learning; generalizing it to classification problems yields the maximum entropy model. The logistic regression model and the maximum entropy model both belong to the class of log-linear models.

6.1 Logistic regression model

Definition 6.1 (logistic distribution): let X be a continuous random variable. X follows the logistic distribution if X has the following distribution function and density function:

$$F(x) = P(X \le x) = \frac{1}{1 + e^{-(x-\mu)/\gamma}}, \qquad f(x) = \frac{e^{-(x-\mu)/\gamma}}{\gamma\left(1 + e^{-(x-\mu)/\gamma}\right)^2},$$

where $\mu$ is the location parameter and $\gamma > 0$ is the shape parameter.

[Figure: graphs of the density function f(x) and the distribution function F(x) of the logistic distribution.]

The distribution function F(x) is a logistic function, whose graph is an S-shaped curve (sigmoid curve). The curve is centrally symmetric about the point $(\mu, \tfrac{1}{2})$, i.e. it satisfies

$$F(-x + \mu) - \frac{1}{2} = -F(x + \mu) + \frac{1}{2}.$$

The curve grows fastest near its center and slowly at the two ends; the smaller the shape parameter $\gamma$, the faster the curve grows near the center.
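As a quick illustration (not from the book), here is a minimal Python sketch of the logistic distribution function and density; the names logistic_cdf and logistic_pdf and the sample parameter values are mine:

```python
import numpy as np

def logistic_cdf(x, mu=0.0, gamma=1.0):
    # F(x) = 1 / (1 + exp(-(x - mu) / gamma))
    return 1.0 / (1.0 + np.exp(-(x - mu) / gamma))

def logistic_pdf(x, mu=0.0, gamma=1.0):
    # f(x) = exp(-(x - mu)/gamma) / (gamma * (1 + exp(-(x - mu)/gamma))^2)
    t = np.exp(-(x - mu) / gamma)
    return t / (gamma * (1.0 + t) ** 2)

# Central symmetry about (mu, 1/2): F(mu + d) + F(mu - d) = 1.
print(logistic_cdf(1.0) + logistic_cdf(-1.0))  # -> 1.0
```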
The binomial logistic regression model is a classification model for two-class problems, represented by the conditional probability distribution P(Y|X) in the form of a parameterized logistic distribution. Here the random variable X takes real values and the random variable Y takes the value 1 or 0.

Definition 6.2 (logistic regression model): the binomial logistic regression model is the following conditional probability distribution:

$$P(Y=1 \mid x) = \frac{\exp(w \cdot x + b)}{1 + \exp(w \cdot x + b)}, \qquad P(Y=0 \mid x) = \frac{1}{1 + \exp(w \cdot x + b)}.$$
Here w is called the weight vector, b the bias, and w·x the inner product of w and x. If the weight vector and the input vector are augmented as $w = (w^{(1)}, \dots, w^{(n)}, b)^T$ and $x = (x^{(1)}, \dots, x^{(n)}, 1)^T$, the logistic regression model becomes

$$P(Y=1 \mid x) = \frac{\exp(w \cdot x)}{1 + \exp(w \cdot x)}, \qquad P(Y=0 \mid x) = \frac{1}{1 + \exp(w \cdot x)}.$$
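A minimal Python sketch of evaluating this model under the augmented convention above; the function name predict_proba and the toy weights are illustrative choices, not from the text:

```python
import numpy as np

def sigmoid(z):
    # Logistic function 1 / (1 + exp(-z)); a production version would
    # guard against overflow for large negative z.
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(w, x):
    # P(Y=1 | x) with augmented w = (w1..wn, b) and x = (x1..xn, 1).
    return sigmoid(np.dot(w, x))

w = np.array([0.5, -0.25, 0.1])  # (w1, w2, b)
x = np.array([2.0, 1.0, 1.0])    # (x1, x2, 1)
p1 = predict_proba(w, x)
print(p1, 1.0 - p1)              # P(Y=1|x) and P(Y=0|x)
```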

The odds of an event are the ratio of the probability that the event occurs to the probability that it does not occur. If the probability of the event is p, its odds are $\frac{p}{1-p}$, and its log odds, or logit function, is

$$\operatorname{logit}(p) = \log \frac{p}{1-p}.$$

For logistic regression,

$$\log \frac{P(Y=1 \mid x)}{1 - P(Y=1 \mid x)} = w \cdot x.$$

That is, in the logistic regression model the log odds of the output Y=1 is a linear function of the input x.

Model parameter estimation

Maximum likelihood estimation can be applied to estimate the model parameters; the log-likelihood function is

$$L(w) = \sum_{i=1}^{N} \left[ y_i (w \cdot x_i) - \log\left(1 + \exp(w \cdot x_i)\right) \right].$$
In this way, the problem becomes an optimization problem with the log-likelihood function as the objective. Gradient descent and quasi-Newton methods are commonly used in logistic regression learning.
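As a sketch of the first route the text names, the following Python code maximizes the log-likelihood above by plain gradient ascent (equivalently, descent on its negation), using the gradient $\nabla L(w) = \sum_i x_i (y_i - \sigma(w \cdot x_i))$; the function fit_logistic, the step size, and the toy data are assumptions for illustration:

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, n_iter=1000):
    # Maximize L(w) = sum_i [y_i (w.x_i) - log(1 + exp(w.x_i))]
    # by gradient ascent; grad L(w) = sum_i x_i (y_i - sigmoid(w.x_i)).
    X = np.hstack([X, np.ones((X.shape[0], 1))])  # augment with 1 for the bias
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))  # P(Y=1 | x_i) under current w
        w += lr * (X.T @ (y - p)) / len(y)  # averaged gradient step
    return w

# Toy data: one feature, classes roughly separated around x = 0.
X = np.array([[-2.0], [-1.0], [0.5], [1.5], [2.0]])
y = np.array([0, 0, 1, 1, 1])
print(fit_logistic(X, y))  # learned (w1, b)
```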
The multinomial logistic regression model is used for multi-class classification. With the output Y taking values in {1, 2, ..., K}, the model is

$$P(Y=k \mid x) = \frac{\exp(w_k \cdot x)}{1 + \sum_{j=1}^{K-1} \exp(w_j \cdot x)}, \quad k = 1, \dots, K-1, \qquad P(Y=K \mid x) = \frac{1}{1 + \sum_{j=1}^{K-1} \exp(w_j \cdot x)}.$$
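A small sketch of evaluating this model for one input, using the parameterization above with K−1 weight vectors; multinomial_proba and the example weights are illustrative:

```python
import numpy as np

def multinomial_proba(W, x):
    # P(Y=k|x) = exp(w_k.x) / (1 + sum_j exp(w_j.x)) for k = 1..K-1,
    # and P(Y=K|x) = 1 / (1 + sum_j exp(w_j.x)); W has shape (K-1, d).
    scores = np.exp(W @ x)                 # exp(w_k . x)
    z = 1.0 + scores.sum()
    return np.append(scores / z, 1.0 / z)  # last entry is class K

W = np.array([[1.0, -0.5, 0.2],   # w_1 (two features + bias)
              [-0.3, 0.8, 0.0]])  # w_2, so K = 3 classes
x = np.array([0.5, 1.0, 1.0])     # augmented input (x1, x2, 1)
print(multinomial_proba(W, x))    # the three probabilities sum to 1
```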
The parameter estimation method for binomial logistic regression generalizes directly to multinomial logistic regression.

6.2 Maximum entropy model

The maximum entropy model is derived from the maximum entropy principle, which is a criterion of probabilistic model learning. The principle states that, when learning a probabilistic model, among all possible probability models (distributions) the model with the largest entropy is the best one. The set of candidate models is usually determined by constraints, so the principle can also be stated as: select the model with the largest entropy from the set of models satisfying the constraints. Entropy is largest when the distribution is uniform. The maximum entropy principle requires the chosen probability model first to satisfy the known constraints; in the absence of further information, the remaining uncertain parts are all treated as "equally likely". "Equally likely" is not easy to operate on directly, whereas entropy is a numerical quantity that can be optimized, so the principle expresses this maximal equal likelihood through entropy.

Definition of the maximum entropy model

Given a training data set, the empirical distribution of the joint distribution P(X, Y) and the empirical distribution of the marginal distribution P(X) can be determined:

$$\tilde P(X=x, Y=y) = \frac{\nu(X=x, Y=y)}{N}, \qquad \tilde P(X=x) = \frac{\nu(X=x)}{N},$$

where $\nu(X=x, Y=y)$ is the frequency with which the sample (x, y) appears in the training data, $\nu(X=x)$ is the frequency with which x appears in the training data, and N is the training sample size. A feature function f(x, y) describes a fact between the input x and the output y. It is defined as

$$f(x, y) = \begin{cases} 1, & x \text{ and } y \text{ satisfy the fact}, \\ 0, & \text{otherwise}. \end{cases}$$
The expected value of the feature function f(x, y) with respect to the empirical distribution $\tilde P(X, Y)$ is denoted $E_{\tilde P}(f)$:

$$E_{\tilde P}(f) = \sum_{x, y} \tilde P(x, y) f(x, y).$$

The expected value of f(x, y) with respect to the model P(y|x) and the empirical distribution $\tilde P(x)$ is denoted $E_P(f)$:

$$E_P(f) = \sum_{x, y} \tilde P(x) P(y \mid x) f(x, y).$$

The constraint is

$$E_P(f) = E_{\tilde P}(f).$$
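To make the two expectations concrete, here is a toy Python sketch; the data set, the feature f, and the stand-in model model_p are all invented for illustration:

```python
from collections import Counter

# Toy training set of (x, y) pairs; counts give the empirical distributions.
data = [("sunny", 1), ("sunny", 1), ("sunny", 0), ("rain", 0), ("rain", 0)]
N = len(data)
p_xy = {k: v / N for k, v in Counter(data).items()}               # P~(x, y)
p_x = {k: v / N for k, v in Counter(x for x, _ in data).items()}  # P~(x)

def f(x, y):
    # Binary feature: fires when x is "sunny" and y is 1.
    return 1.0 if (x == "sunny" and y == 1) else 0.0

def model_p(y, x):
    # A stand-in conditional model P(y|x); any valid P(y|x) works here.
    return 0.5  # uniform over y in {0, 1}

e_emp = sum(p * f(x, y) for (x, y), p in p_xy.items())      # E_{P~}(f)
e_model = sum(p_x[x] * model_p(y, x) * f(x, y)              # E_P(f)
              for x in p_x for y in (0, 1))
print(e_emp, e_model)  # the constraint requires these two to be equal
```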

Definition 6.3 (maximum entropy model): assume that the set of models satisfying all the constraints is

$$\mathcal{C} = \{P \mid E_P(f_i) = E_{\tilde P}(f_i), \; i = 1, \dots, n\}.$$

The conditional entropy defined on the conditional probability distribution P(Y|X) is

$$H(P) = -\sum_{x, y} \tilde P(x) P(y \mid x) \log P(y \mid x).$$

The model in the set $\mathcal{C}$ with the largest conditional entropy H(P) is called the maximum entropy model.
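Continuing the toy sketch above (reusing its p_x and the stand-in model_p), the conditional entropy can be computed directly from this definition:

```python
import numpy as np

def conditional_entropy(p_x, model_p, ys):
    # H(P) = -sum_{x,y} P~(x) P(y|x) log P(y|x)
    h = 0.0
    for x, px in p_x.items():
        for y in ys:
            p = model_p(y, x)
            if p > 0.0:
                h -= px * p * np.log(p)
    return h

print(conditional_entropy(p_x, model_p, (0, 1)))  # log 2 for the uniform model
```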

Learning of the maximum entropy model

The learning process of the maximum entropy model is the process of solving for the maximum entropy model; it can be formalized as a constrained optimization problem:

$$\max_{P \in \mathcal{C}} \; H(P) = -\sum_{x, y} \tilde P(x) P(y \mid x) \log P(y \mid x)$$
$$\text{s.t.} \quad E_P(f_i) = E_{\tilde P}(f_i), \; i = 1, \dots, n, \qquad \sum_{y} P(y \mid x) = 1.$$

Following the convention of optimization, this is rewritten as the equivalent problem of minimizing $-H(P)$ subject to the same constraints.
By introducing Lagrange multipliers, the primal constrained optimization problem is converted into an unconstrained dual problem; the solution of the primal problem is obtained by solving the dual problem.

In maximum entropy model learning, maximizing the dual function is equivalent to maximum likelihood estimation of the maximum entropy model, so the learning problem reduces to maximizing the log-likelihood function or, equivalently, the dual function. The log-likelihood function is

$$L_{\tilde P}(P_w) = \sum_{x, y} \tilde P(x, y) \log P_w(y \mid x).$$

The general form of the maximum entropy model is

$$P_w(y \mid x) = \frac{1}{Z_w(x)} \exp\left(\sum_{i=1}^{n} w_i f_i(x, y)\right), \qquad Z_w(x) = \sum_{y} \exp\left(\sum_{i=1}^{n} w_i f_i(x, y)\right).$$
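A minimal Python sketch of this general form, normalizing over y with Z_w(x); maxent_conditional is an illustrative name, and the feature list reuses the toy feature f from the earlier sketch:

```python
import numpy as np

def maxent_conditional(w, feats, x, ys):
    # P_w(y|x) = exp(sum_i w_i f_i(x,y)) / Z_w(x), where
    # Z_w(x) = sum_y exp(sum_i w_i f_i(x,y)).
    scores = np.array([sum(wi * fi(x, y) for wi, fi in zip(w, feats))
                       for y in ys])
    expd = np.exp(scores)
    return dict(zip(ys, expd / expd.sum()))

feats = [f]  # the single toy feature defined above
print(maxent_conditional(np.array([1.5]), feats, "sunny", (0, 1)))
```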

6.3 Optimization algorithms for model learning

Improved iterative scaling (IIS) is a learning algorithm for the maximum entropy model. The idea of IIS: suppose the current parameter vector of the maximum entropy model is $w = (w_1, \dots, w_n)^T$; we seek a new parameter vector $w + \delta = (w_1 + \delta_1, \dots, w_n + \delta_n)^T$ that increases the log-likelihood of the model. If such a parameter-update method $w \to w + \delta$ can be found, it can be applied repeatedly until the maximum of the log-likelihood function is reached.
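The general IIS step solves, for each $\delta_i$, a one-variable equation involving $f^{\#}(x, y) = \sum_i f_i(x, y)$. Purely as an illustration: when $f^{\#}(x, y)$ equals a constant M for all (x, y), $\delta_i$ has the closed form $\delta_i = \frac{1}{M} \log \frac{E_{\tilde P}(f_i)}{E_P(f_i)}$. The sketch below implements only this special case, reusing maxent_conditional and the toy distributions from the sketches above (whose features do not strictly satisfy the constant-M assumption, so treat it as a demonstration of the mechanics only):

```python
import numpy as np

def iis_step(w, feats, p_xy, p_x, ys, M):
    # One IIS update under the simplifying assumption f#(x, y) == M,
    # in which case delta_i = (1/M) * log(E_{P~}(f_i) / E_P(f_i)).
    w_new = np.array(w, dtype=float)
    for i, fi in enumerate(feats):
        e_emp = sum(p * fi(x, y) for (x, y), p in p_xy.items())
        e_model = sum(p_x[x] * maxent_conditional(w, feats, x, ys)[y] * fi(x, y)
                      for x in p_x for y in ys)
        w_new[i] += np.log(e_emp / e_model) / M
    return w_new

w = np.zeros(len(feats))
for _ in range(50):  # repeat the update until the likelihood stops improving
    w = iis_step(w, feats, p_xy, p_x, ys=(0, 1), M=1.0)
print(w)  # converges to about log 2 for the toy data
```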
Maximum entropy model learning algorithm based on quasi-Newton method (BFGS)
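As a sketch of this second route (assuming SciPy is available and reusing feats, p_x, and p_xy from the toy sketches above), the objective to minimize, $f(w) = \sum_x \tilde P(x) \log Z_w(x) - \sum_{x, y} \tilde P(x, y) \sum_i w_i f_i(x, y)$, can be handed to a generic BFGS optimizer:

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(w):
    # f(w) = sum_x P~(x) log Z_w(x) - sum_{x,y} P~(x,y) sum_i w_i f_i(x,y)
    val = 0.0
    for x, px in p_x.items():
        z = sum(np.exp(sum(wi * fi(x, y) for wi, fi in zip(w, feats)))
                for y in (0, 1))
        val += px * np.log(z)
    for (x, y), p in p_xy.items():
        val -= p * sum(wi * fi(x, y) for wi, fi in zip(w, feats))
    return val

res = minimize(neg_log_likelihood, x0=np.zeros(len(feats)), method="BFGS")
print(res.x)  # should agree with the IIS fixed point above (about log 2)
```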



