Stanford "Machine Learning" Lesson 4 notes, part 2: generalized linear models

Source: Internet
Author: User


The Bernoulli and Gaussian distributions used in the classification and regression problems of the previous lessons are both special cases of the generalized linear model (Generalized Linear Models, GLMs). The generalized linear model is described in detail below.

1. Exponential Family

Many common distributions can be grouped into the exponential family. A distribution belongs to the exponential family if it can be written as:

$$p(y; \eta) = b(y) \exp\left(\eta^T T(y) - a(\eta)\right)$$

η is the natural parameter (also called the canonical parameter), T(y) is the sufficient statistic, and a(η) is the log partition function. A choice of T, a and b fixes a family of distributions; varying η then yields the different distributions within that family.

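Both the Bernoulli distribution and the Gaussian distribution are examples of exponential family distributions. First, the Bernoulli distribution with mean φ can be written as follows:

$$p(y; \phi) = \phi^y (1 - \phi)^{1 - y} = \exp\left(y \log \phi + (1 - y) \log(1 - \phi)\right) = \exp\left(\log\left(\frac{\phi}{1 - \phi}\right) y + \log(1 - \phi)\right)$$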

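Matching terms with the exponential family form gives:

$$\eta = \log\frac{\phi}{1 - \phi} \;\left(\text{so } \phi = \frac{1}{1 + e^{-\eta}}\right), \quad T(y) = y, \quad a(\eta) = -\log(1 - \phi) = \log(1 + e^{\eta}), \quad b(y) = 1$$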

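As a quick numerical check of the Bernoulli case, the sketch below compares the usual form φ^y (1 − φ)^(1 − y) with the exponential family form b(y) exp(η T(y) − a(η)), assuming η = log(φ / (1 − φ)), T(y) = y, a(η) = log(1 + e^η) and b(y) = 1. The function names are illustrative and not part of the original notes.

```python
import math

def bernoulli_pmf(y, phi):
    """Usual form: phi^y * (1 - phi)^(1 - y) for y in {0, 1}."""
    return phi ** y * (1.0 - phi) ** (1 - y)

def bernoulli_exp_family(y, phi):
    """Exponential family form b(y) * exp(eta * T(y) - a(eta)) with T(y) = y."""
    eta = math.log(phi / (1.0 - phi))   # natural parameter (the log-odds)
    a = math.log(1.0 + math.exp(eta))   # log partition function, equals -log(1 - phi)
    b = 1.0
    return b * math.exp(eta * y - a)

# The two forms should agree for every phi in (0, 1) and y in {0, 1}.
for phi in (0.1, 0.5, 0.9):
    for y in (0, 1):
        assert math.isclose(bernoulli_pmf(y, phi), bernoulli_exp_family(y, phi))
```
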
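This shows that, with a suitable choice of T, a and b, the Bernoulli distribution can be written in exponential family form. Second, the Gaussian distribution (taking σ² = 1, since σ² does not affect the choice of θ) can be written as:

$$p(y; \mu) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}(y - \mu)^2\right) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2} y^2\right) \exp\left(\mu y - \frac{1}{2}\mu^2\right)$$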

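Similarly, matching terms gives:

$$\eta = \mu, \quad T(y) = y, \quad a(\eta) = \frac{\mu^2}{2} = \frac{\eta^2}{2}, \quad b(y) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{y^2}{2}\right)$$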

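The Gaussian case can be checked the same way. This sketch assumes unit variance (σ² = 1) and the choices η = μ, T(y) = y, a(η) = η²/2 and b(y) = exp(−y²/2) / √(2π); the function names are illustrative.

```python
import math

def gaussian_pdf(y, mu):
    """Unit-variance Gaussian density N(mu, 1) evaluated at y."""
    return math.exp(-0.5 * (y - mu) ** 2) / math.sqrt(2.0 * math.pi)

def gaussian_exp_family(y, mu):
    """Exponential family form b(y) * exp(eta * T(y) - a(eta)) with T(y) = y."""
    eta = mu                                               # natural parameter
    a = 0.5 * eta ** 2                                     # log partition function
    b = math.exp(-0.5 * y ** 2) / math.sqrt(2.0 * math.pi) # base measure
    return b * math.exp(eta * y - a)

# The two forms should agree for any real y and mu.
for mu in (-1.0, 0.0, 2.5):
    for y in (-2.0, 0.3, 4.0):
        assert math.isclose(gaussian_pdf(y, mu), gaussian_exp_family(y, mu))
```
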
2. Constructing a Generalized Linear Model

To apply a generalized linear model to a problem, we make the following three assumptions.

(1) y | x; θ ∼ ExponentialFamily(η). Given x and θ, y is assumed to follow some exponential family distribution with natural parameter η.

(2) The hypothesis function satisfies h(x) = E[y|x]. With it we can predict the y value corresponding to a given x, or classify x.

(3) η and x are linearly related: η = θ^T x. If η is a vector, then η_i = θ_i^T x.

The steps for building a generalized linear model are illustrated below with two familiar cases, least squares and logistic regression.

2.1 Least Squares

The least squares method applies when y is continuous. Here y | x; θ follows a Gaussian distribution N(μ, σ²), and from hypothesis (1) together with the exponential family analysis in section 1 we have μ = η. Hypotheses (2) and (3) then give:

$$h_\theta(x) = E[y \mid x; \theta] = \mu = \eta = \theta^T x$$

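To make the least squares case concrete, here is a minimal sketch that fits h(x) = θ^T x by batch gradient ascent on the Gaussian log-likelihood, which is the same as minimizing the squared error. The toy data (generated from y = 1 + 2x), the learning rate and the iteration count are assumptions chosen for illustration only.

```python
def h_linear(theta, x):
    """Hypothesis for the Gaussian case: h(x) = eta = theta^T x."""
    return sum(t * xi for t, xi in zip(theta, x))

def fit_least_squares(xs, ys, alpha=0.05, iters=2000):
    """Batch gradient ascent on the Gaussian log-likelihood (the LMS rule)."""
    theta = [0.0] * len(xs[0])
    for _ in range(iters):
        grad = [0.0] * len(theta)
        for x, y in zip(xs, ys):
            err = y - h_linear(theta, x)      # residual for this example
            for j, xj in enumerate(x):
                grad[j] += err * xj           # d(log-likelihood)/d(theta_j), up to 1/sigma^2
        theta = [t + alpha * g for t, g in zip(theta, grad)]
    return theta

# Toy data: first feature is the intercept term, targets follow y = 1 + 2x.
xs = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
ys = [1.0, 3.0, 5.0, 7.0]
theta = fit_least_squares(xs, ys)   # should approach [1.0, 2.0]
```
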
2.2 Logistic Regression

For binary classification, the natural exponential family distribution is the Bernoulli distribution: y | x; θ ∼ Bernoulli(φ). From the exponential family analysis in section 1 we know that φ = 1/(1 + e^{−η}), and hypotheses (2) and (3) then give:

$$h_\theta(x) = E[y \mid x; \theta] = \phi = \frac{1}{1 + e^{-\eta}} = \frac{1}{1 + e^{-\theta^T x}}$$

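The logistic hypothesis can be sketched in a few lines, assuming η = θ^T x and φ = 1/(1 + e^(−η)). The parameter and feature values below are made up for illustration.

```python
import math

def sigmoid(z):
    """Map the natural parameter eta to the Bernoulli mean phi."""
    return 1.0 / (1.0 + math.exp(-z))

def h_logistic(theta, x):
    """Hypothesis for the Bernoulli case: h(x) = E[y | x] = sigmoid(theta^T x)."""
    eta = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(eta)

theta = [-1.0, 2.0]        # illustrative parameters, not fitted values
x = [1.0, 0.5]             # intercept term plus one feature, so eta = 0
p = h_logistic(theta, x)   # probability that y = 1
```

Because η is the log-odds, log(p / (1 − p)) recovers θ^T x for any input.
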
3. Softmax Regression

Now suppose the classification problem is no longer binary but k-ary, that is, y ∈ {1, 2, ..., k}. This problem can also be solved by constructing a generalized linear model, as described in the following steps.

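Suppose y follows a multinomial distribution over k outcomes, which belongs to the exponential family. Let φ_i = p(y = i; φ); since the φ_i sum to 1, we have φ_k = 1 − (φ_1 + ... + φ_{k−1}). We also define T(y) ∈ R^{k−1} as follows:

$$T(1) = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad T(2) = \begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix}, \quad \ldots, \quad T(k-1) = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix}, \quad T(k) = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$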

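In addition, the indicator function 1{·} takes the value 1 if the condition in braces is true and 0 otherwise. So (T(y))_i = 1{y = i}. From probability theory, E[(T(y))_i] = P(y = i) = φ_i. So we can write:

$$p(y; \phi) = \phi_1^{1\{y=1\}} \phi_2^{1\{y=2\}} \cdots \phi_k^{1\{y=k\}} = \phi_1^{(T(y))_1} \cdots \phi_{k-1}^{(T(y))_{k-1}} \phi_k^{1 - \sum_{i=1}^{k-1} (T(y))_i} = \exp\left(\sum_{i=1}^{k-1} (T(y))_i \log\frac{\phi_i}{\phi_k} + \log \phi_k\right) = b(y) \exp\left(\eta^T T(y) - a(\eta)\right)$$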

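The definition of T(y) and the exponential form of the multinomial can be checked numerically. The sketch below assumes labels 1..k, T(y) ∈ R^(k−1) with (T(y))_i = 1{y = i}, and the rewritten form exp(Σ (T(y))_i log(φ_i/φ_k) + log φ_k). The probabilities in `phi` are an arbitrary example.

```python
import math

def T(y, k):
    """Sufficient statistic in R^(k-1): one-hot for y in 1..k-1, all zeros for y = k."""
    return [1.0 if y == i else 0.0 for i in range(1, k)]

def multinomial_pmf(y, phi):
    """Direct form: p(y = i) = phi_i (labels are 1-based)."""
    return phi[y - 1]

def multinomial_exp_form(y, phi):
    """Rewritten form: exp(sum_i T(y)_i * log(phi_i / phi_k) + log(phi_k))."""
    k = len(phi)
    t = T(y, k)
    exponent = sum(t[i] * math.log(phi[i] / phi[k - 1]) for i in range(k - 1))
    return math.exp(exponent + math.log(phi[k - 1]))

phi = [0.2, 0.5, 0.3]   # example class probabilities, must sum to 1
for y in (1, 2, 3):
    assert math.isclose(multinomial_pmf(y, phi), multinomial_exp_form(y, phi))
```
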
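Matching terms with the exponential family form gives:

$$\eta = \begin{bmatrix} \log(\phi_1/\phi_k) \\ \log(\phi_2/\phi_k) \\ \vdots \\ \log(\phi_{k-1}/\phi_k) \end{bmatrix}, \quad a(\eta) = -\log \phi_k, \quad b(y) = 1$$

That is, for i = 1, ..., k − 1:

$$\eta_i = \log\frac{\phi_i}{\phi_k}$$

For convenience, also define η_k = log(φ_k/φ_k) = 0, so that for every i = 1, ..., k:

$$e^{\eta_i} = \frac{\phi_i}{\phi_k}, \qquad \phi_k e^{\eta_i} = \phi_i, \qquad \phi_k \sum_{i=1}^{k} e^{\eta_i} = \sum_{i=1}^{k} \phi_i = 1$$

So

$$\phi_k = \frac{1}{\sum_{i=1}^{k} e^{\eta_i}}$$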

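Substituting back into φ_i = φ_k e^{η_i}, the softmax function, which maps η to φ, can be written as follows:

$$\phi_i = \frac{e^{\eta_i}}{\sum_{j=1}^{k} e^{\eta_j}}$$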

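A minimal sketch of the softmax function follows, together with the inverse map η_i = log(φ_i/φ_k) so that the round trip φ → η → φ can be checked. Subtracting the maximum before exponentiating is a common numerical-stability step that is not part of the derivation; it does not change the result. The values in `phi` are an arbitrary example.

```python
import math

def softmax(eta):
    """phi_i = exp(eta_i) / sum_j exp(eta_j), computed stably."""
    m = max(eta)                              # shift for stability; cancels in the ratio
    exps = [math.exp(e - m) for e in eta]
    total = sum(exps)
    return [e / total for e in exps]

def natural_params(phi):
    """eta_i = log(phi_i / phi_k) for i = 1..k, so the last entry is 0."""
    return [math.log(p / phi[-1]) for p in phi]

phi = [0.2, 0.5, 0.3]
eta = natural_params(phi)    # eta[-1] is log(phi_k / phi_k) = 0
recovered = softmax(eta)     # should reproduce phi
```
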
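According to hypothesis (3), η_i = θ_i^T x for i = 1, ..., k − 1. Defining θ_k = 0, so that η_k = θ_k^T x = 0, gives the softmax regression model:

$$p(y = i \mid x; \theta) = \phi_i = \frac{e^{\eta_i}}{\sum_{j=1}^{k} e^{\eta_j}} = \frac{e^{\theta_i^T x}}{\sum_{j=1}^{k} e^{\theta_j^T x}}$$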

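According to hypothesis (2), with T(y) in place of y, the hypothesis function is:

$$h_\theta(x) = E[T(y) \mid x; \theta] = \begin{bmatrix} \phi_1 \\ \phi_2 \\ \vdots \\ \phi_{k-1} \end{bmatrix} = \begin{bmatrix} \dfrac{\exp(\theta_1^T x)}{\sum_{j=1}^{k} \exp(\theta_j^T x)} \\ \vdots \\ \dfrac{\exp(\theta_{k-1}^T x)}{\sum_{j=1}^{k} \exp(\theta_j^T x)} \end{bmatrix}$$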

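Given a training set of m examples (x^{(i)}, y^{(i)}), the log-likelihood is therefore:

$$\ell(\theta) = \sum_{i=1}^{m} \log p(y^{(i)} \mid x^{(i)}; \theta) = \sum_{i=1}^{m} \log \prod_{l=1}^{k} \left(\frac{e^{\theta_l^T x^{(i)}}}{\sum_{j=1}^{k} e^{\theta_j^T x^{(i)}}}\right)^{1\{y^{(i)} = l\}}$$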

The next step is to maximize the log-likelihood with respect to θ, which fixes the hypothesis function and therefore the classification results. The maximum can be found with gradient ascent or Newton's method.
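As an illustration of this last step, the sketch below trains softmax regression with batch gradient ascent on the log-likelihood, using the gradient Σ_i (1{y^(i) = l} − p(y = l | x^(i); θ)) x^(i) for each θ_l and keeping θ_k fixed at 0. The toy data set, learning rate and iteration count are assumptions made for this example, and the function names are illustrative.

```python
import math

def softmax(scores):
    m = max(scores)                               # shift for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(theta, x):
    """p(y = l | x; theta) for l = 1..k, with eta_l = theta_l^T x."""
    scores = [sum(t * xi for t, xi in zip(theta_l, x)) for theta_l in theta]
    return softmax(scores)

def log_likelihood(theta, xs, ys):
    return sum(math.log(predict_proba(theta, x)[y - 1]) for x, y in zip(xs, ys))

def fit_softmax(xs, ys, k, alpha=0.02, iters=500):
    """Batch gradient ascent; theta[k-1] is never updated, so theta_k = 0."""
    n = len(xs[0])
    theta = [[0.0] * n for _ in range(k)]
    for _ in range(iters):
        grad = [[0.0] * n for _ in range(k - 1)]
        for x, y in zip(xs, ys):
            p = predict_proba(theta, x)
            for l in range(k - 1):
                coeff = (1.0 if y == l + 1 else 0.0) - p[l]   # 1{y = l} - p_l
                for j, xj in enumerate(x):
                    grad[l][j] += coeff * xj
        for l in range(k - 1):
            theta[l] = [t + alpha * g for t, g in zip(theta[l], grad[l])]
    return theta

# Toy data: intercept term plus two features, three well-separated classes (labels 1..3).
xs = [[1.0, 2.0, 0.0], [1.0, 3.0, 0.5], [1.0, 0.0, 2.0],
      [1.0, 0.5, 3.0], [1.0, -2.0, -2.0], [1.0, -3.0, -2.5]]
ys = [1, 1, 2, 2, 3, 3]
theta = fit_softmax(xs, ys, k=3)
# Predict the most probable class for each training point.
predictions = [max(range(1, 4), key=lambda l: predict_proba(theta, x)[l - 1]) for x in xs]
```

Newton's method could replace the update step; gradient ascent is used here only because it is the shorter of the two to write down.
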

This is the basic process for solving a problem with a generalized linear model. First determine the distribution that y follows, then identify T, a, b and η, and from them derive the form of the hypothesis function. Finally, use maximum likelihood or another method to estimate the parameters, which gives the hypothesis function that best fits the data.

Copyright notice: this is an original article by the blog author and may not be reproduced without the author's permission.
