Generative Learning algorithms

Last updated: 2016-01-02 Source: Internet

Uploaded by: User



"A generative algorithm models how the data was generated in order to categorize a signal. It asks the question: based on my generation assumptions, which category is most likely to generate this signal? A discriminative algorithm does not care about how the data was generated, it simply categorizes a given signal."

Discriminative:

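A discriminative model tries to find the differences between the classes and, from them, a decision boundary that separates the data as well as possible. It does this by learning $p(y|x)$ directly (e.g., logistic regression) or by learning a direct mapping $X \rightarrow Y \in \{0,1,\dots,k\}$ (e.g., the perceptron algorithm).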
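To make "learning $p(y|x)$ directly" concrete, here is a minimal logistic-regression sketch in plain Python. It assumes binary labels and uses batch gradient ascent on the conditional log-likelihood. The function names, learning rate, and epoch count are illustrative choices, not anything prescribed by the text. Note that the model never represents $p(x)$ or $p(x|y)$.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Hypothetical helper: batch gradient ascent on sum_i log p(y_i | x_i; w, b)."""
    d = len(xs[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
            err = y - p  # derivative of log p(y|x) with respect to the logit
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj + lr * gj / len(xs) for wj, gj in zip(w, grad_w)]
        b += lr * grad_b / len(xs)
    return w, b

def predict_proba(w, b, x):
    """p(y=1|x), read off directly from the learned weights."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```

For example, `w, b = fit_logistic([(0, 0), (1, 0), (0, 1), (3, 3), (4, 3), (3, 4)], [0, 0, 0, 1, 1, 1])` learns a boundary between the two clusters without modeling how either cluster was generated.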

Generative:

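A generative model takes a different approach. It first obtains $p(x|y)$ and $p(y)$ from prior knowledge, and then uses Bayes' rule, $p(y|x) = \frac{p(x|y)p(y)}{p(x)}$, to compute $p(y|x)$, where $p(x) = p(x|y=1)p(y=1) + p(x|y=0)p(y=0)$. This process can be viewed as deriving the posterior distribution from the prior. When we only need to know which class is most likely, the denominator can be ignored, because $p(x)$ does not depend on $y$:
$$\arg\max_y p(y|x) = \arg\max_y \frac{p(x|y)p(y)}{p(x)} = \arg\max_y p(x|y)p(y)$$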
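The short sketch below checks the argmax identity numerically. It computes the full posterior with the normalizer $p(x)$, then picks the class by maximizing $p(x|y)p(y)$ alone. The likelihood and prior values are made-up numbers for illustration only.

```python
def posterior(likelihoods, priors):
    """Bayes' rule: p(y|x) = p(x|y) p(y) / p(x), with p(x) = sum over y of p(x|y) p(y)."""
    joint = {y: likelihoods[y] * priors[y] for y in priors}
    evidence = sum(joint.values())  # p(x)
    return {y: joint[y] / evidence for y in joint}

def map_class(likelihoods, priors):
    """argmax_y p(x|y) p(y); p(x) is the same for every y, so it is skipped."""
    return max(priors, key=lambda y: likelihoods[y] * priors[y])

# Illustrative values of p(x|y) for one fixed x, and priors p(y).
likelihoods = {0: 0.2, 1: 0.5}
priors = {0: 0.7, 1: 0.3}
post = posterior(likelihoods, priors)
```

Here the joint terms are 0.14 and 0.15, so class 1 wins whether or not we divide by $p(x) = 0.29$. This is why the denominator can be dropped when only the decision matters.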

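The prior knowledge, $p(x|y)$ and $p(y)$, is obtained by estimating model parameters from the available training samples:
1. First, assume a model for how the samples are distributed (e.g., Bernoulli or Gaussian).
2. Then, estimate the parameters by maximizing the likelihood function.
3. Finally, derive $p(y|x)$ with Bayes' rule.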
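The three steps can be sketched end to end for the simplest case. The sketch assumes one-dimensional $x$, a Bernoulli prior on $y$, and Gaussian class-conditionals with a shared variance. This is the 1-D version of the model used in the example below. The function names are hypothetical.

```python
import math

def gaussian_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_1d(xs, ys):
    # Step 1 (assumed model): y ~ Bernoulli(phi), x | y ~ N(mu_y, var), var shared by both classes.
    # Step 2: closed-form maximum-likelihood estimates for that model.
    m = len(xs)
    n1 = sum(ys)
    phi = n1 / m
    mu0 = sum(x for x, y in zip(xs, ys) if y == 0) / (m - n1)
    mu1 = sum(x for x, y in zip(xs, ys) if y == 1) / n1
    var = sum((x - (mu1 if y == 1 else mu0)) ** 2 for x, y in zip(xs, ys)) / m
    return phi, mu0, mu1, var

def p_y1_given_x(x, params):
    # Step 3: Bayes' rule turns p(x|y) and p(y) into p(y=1|x).
    phi, mu0, mu1, var = params
    a = gaussian_pdf(x, mu1, var) * phi
    b = gaussian_pdf(x, mu0, var) * (1 - phi)
    return a / (a + b)
```

With `xs = [1, 2, 3, 7, 8, 9]` and `ys = [0, 0, 0, 1, 1, 1]`, the estimates are $\phi = 0.5$, $\mu_0 = 2$, $\mu_1 = 8$, and the posterior is exactly 0.5 at the midpoint $x = 5$.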

Example

Dataset: $X = (x_1, x_2)$, $Y \in \{0, 1\}$

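First, assume that the class-conditional distribution $p(x|y)$ is a multivariate normal (Gaussian) distribution, with a covariance matrix $\Sigma$ shared by both classes. The model is then:
$$\begin{aligned} y &\sim \textrm{Bernoulli}(\phi) \\ x \mid y=0 &\sim \mathcal{N}(\mu_0, \Sigma) \\ x \mid y=1 &\sim \mathcal{N}(\mu_1, \Sigma) \end{aligned}$$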
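Because the model is generative, it can also be run forward to produce data. The minimal sketch below samples from it for 2-D $x$. It draws $y$ first, then $x$ from the Gaussian for that class. The sketch uses a hand-written $2 \times 2$ Cholesky factor to correlate the noise. The function names and parameter values are illustrative assumptions.

```python
import math
import random

def cholesky_2x2(S):
    """Lower-triangular L with L L^T = S, for a symmetric positive-definite 2x2 matrix S."""
    (a, b), (_, c) = S
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(c - l21 ** 2)
    return l11, l21, l22

def sample_gda(m, phi, mu0, mu1, Sigma, seed=0):
    """Draw m pairs (x, y) from y ~ Bernoulli(phi), x | y ~ N(mu_y, Sigma), with 2-D x."""
    rng = random.Random(seed)
    l11, l21, l22 = cholesky_2x2(Sigma)
    data = []
    for _ in range(m):
        y = 1 if rng.random() < phi else 0
        mu = mu1 if y == 1 else mu0
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x = (mu[0] + l11 * z1, mu[1] + l21 * z1 + l22 * z2)
        data.append((x, y))
    return data
```

For instance, `sample_gda(20000, 0.3, (0, 0), (3, 3), ((1.0, 0.5), (0.5, 2.0)))` returns roughly 30% positive labels, with each class clustered around its own mean.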
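Next, estimate the parameters by maximum likelihood estimation (MLE). First, write down the log-likelihood:
$$\begin{aligned} \ell(\phi,\mu_0,\mu_1,\Sigma) &= \log\prod_{i=1}^{m} p(x^{(i)}, y^{(i)}; \phi, \mu_0, \mu_1, \Sigma) \\ &= \log\prod_{i=1}^{m} p(x^{(i)} \mid y^{(i)}; \mu_0, \mu_1, \Sigma)\, p(y^{(i)}; \phi). \end{aligned}$$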
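As a sanity check on the formula, the sketch below evaluates $\ell$ for 2-D data. It expands the log of the product into a sum of $\log p(x^{(i)}|y^{(i)})$ and $\log p(y^{(i)})$ terms. The Gaussian log-density uses the closed-form inverse and determinant of a $2 \times 2$ covariance. The restriction to two features and the function names are assumptions of this sketch.

```python
import math

def log_gauss_2d(x, mu, S):
    """log N(x; mu, S) for 2-D x, using the closed-form 2x2 inverse and determinant."""
    (a, b), (_, c) = S
    det = a * c - b * b
    dx0, dx1 = x[0] - mu[0], x[1] - mu[1]
    # (x - mu)^T S^{-1} (x - mu), with S^{-1} = [[c, -b], [-b, a]] / det
    quad = (c * dx0 * dx0 - 2 * b * dx0 * dx1 + a * dx1 * dx1) / det
    return -0.5 * (2 * math.log(2 * math.pi) + math.log(det) + quad)

def log_likelihood(data, phi, mu0, mu1, S):
    """l(phi, mu0, mu1, S) = sum_i [log p(x_i | y_i) + log p(y_i)]."""
    total = 0.0
    for x, y in data:
        mu = mu1 if y == 1 else mu0
        total += log_gauss_2d(x, mu, S)                          # log p(x|y)
        total += math.log(phi) if y == 1 else math.log(1 - phi)  # log p(y)
    return total
```

Evaluating this at the maximum-likelihood estimates gives a larger value than at any other parameter setting, which is exactly what the next step exploits.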
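Then maximize $\ell$ by setting its derivatives with respect to the parameters to zero, which gives (with $1\{\cdot\}$ denoting the indicator function):
$$\begin{aligned} \phi &= \frac{1}{m}\sum_{i=1}^{m} 1\{y^{(i)}=1\} \\ \mu_0 &= \frac{\sum_{i=1}^{m} 1\{y^{(i)}=0\}\, x^{(i)}}{\sum_{i=1}^{m} 1\{y^{(i)}=0\}} \\ \mu_1 &= \frac{\sum_{i=1}^{m} 1\{y^{(i)}=1\}\, x^{(i)}}{\sum_{i=1}^{m} 1\{y^{(i)}=1\}} \\ \Sigma &= \frac{1}{m}\sum_{i=1}^{m} (x^{(i)}-\mu_{y^{(i)}})(x^{(i)}-\mu_{y^{(i)}})^T \end{aligned}$$
This yields the parameter estimates $(\phi, \mu_0, \mu_1, \Sigma)$, and with them the distribution $p(x|y)$. In the figure, $\mu_0$ and $\mu_1$ are two-dimensional vectors located at the centers of the two normal distributions, and the shared $\Sigma$ determines the shape of the multivariate normal distributions.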
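*Figure: the two class-conditional Gaussians $\mathcal{N}(\mu_0, \Sigma)$ and $\mathcal{N}(\mu_1, \Sigma)$, centered at $\mu_0$ and $\mu_1$.*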
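The closed-form estimates translate almost line for line into code. The sketch below assumes the data is a list of `(x, y)` pairs, where `x` is a tuple of features of any length and `y` is 0 or 1, and that both classes are present. The helper name is hypothetical.

```python
def fit_gda(data):
    """Closed-form MLE (phi, mu0, mu1, Sigma) for GDA with a covariance shared by both classes."""
    m = len(data)
    d = len(data[0][0])
    n1 = sum(y for _, y in data)  # sum_i 1{y_i = 1}
    n0 = m - n1                   # sum_i 1{y_i = 0}
    phi = n1 / m
    mu0 = [sum(x[j] for x, y in data if y == 0) / n0 for j in range(d)]
    mu1 = [sum(x[j] for x, y in data if y == 1) / n1 for j in range(d)]
    # Sigma = (1/m) sum_i (x_i - mu_{y_i})(x_i - mu_{y_i})^T
    Sigma = [[0.0] * d for _ in range(d)]
    for x, y in data:
        mu = mu1 if y == 1 else mu0
        diff = [x[j] - mu[j] for j in range(d)]
        for r in range(d):
            for c in range(d):
                Sigma[r][c] += diff[r] * diff[c] / m
    return phi, mu0, mu1, Sigma
```

On the six points `(0,0), (1,0), (0,1)` (class 0) and `(3,3), (4,3), (3,4)` (class 1), this gives $\phi = 0.5$, $\mu_0 = (1/3, 1/3)$, $\mu_1 = (10/3, 10/3)$ and $\Sigma = \begin{pmatrix} 2/9 & -1/9 \\ -1/9 & 2/9 \end{pmatrix}$. Note that $\Sigma$ pools the deviations of both classes, each measured from its own class mean.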
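This step shows that the parameters are "learned": the model is estimated from a large number of samples, that is, from prior knowledge. This is a natural way to think about it, but the rigorous justification is the law of large numbers (LLN). As the number of samples $m$ grows, the sample averages in the estimates above converge to the corresponding true expectations. The proof of the LLN is elegant and worth looking up.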
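A quick simulation illustrates the LLN argument for the simplest estimate, $\phi$. The sketch assumes a true $\phi$ of 0.3, which is an arbitrary illustrative value. It draws $m$ labels and reports how far the MLE, the fraction of $y = 1$, lands from the truth.

```python
import random

def estimate_phi(m, true_phi, seed=0):
    """MLE of phi (fraction of y = 1) from m simulated Bernoulli(true_phi) labels."""
    rng = random.Random(seed)
    return sum(rng.random() < true_phi for _ in range(m)) / m

for m in (10, 1_000, 100_000):
    print(m, abs(estimate_phi(m, 0.3) - 0.3))
```

The printed errors typically shrink as $m$ grows, roughly in proportion to $1/\sqrt{m}$. The same argument applies to $\mu_0$, $\mu_1$ and $\Sigma$, which are also built from sample averages.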
Finally, classify a new input $x$ by using Bayes' rule to compare $p(y=1|x)$ with $p(y=0|x)$ and choosing the larger.
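The final decision can be sketched as follows for 2-D $x$, given already-estimated parameters. The function compares $\log p(x|y) + \log p(y)$ for the two classes. It also normalizes the scores to report $p(y=1|x)$ itself. The function names and the log-space formulation are choices made for this sketch.

```python
import math

def log_gauss_2d(x, mu, S):
    """log N(x; mu, S) for 2-D x (closed-form 2x2 inverse and determinant)."""
    (a, b), (_, c) = S
    det = a * c - b * b
    dx0, dx1 = x[0] - mu[0], x[1] - mu[1]
    quad = (c * dx0 * dx0 - 2 * b * dx0 * dx1 + a * dx1 * dx1) / det
    return -0.5 * (2 * math.log(2 * math.pi) + math.log(det) + quad)

def predict(x, phi, mu0, mu1, S):
    """Return (label, p(y=1|x)) by comparing log p(x|y) + log p(y) for y = 0 and y = 1."""
    s1 = log_gauss_2d(x, mu1, S) + math.log(phi)
    s0 = log_gauss_2d(x, mu0, S) + math.log(1 - phi)
    diff = s1 - s0
    # p(y=1|x) = 1 / (1 + exp(s0 - s1)), written to avoid overflow for large |diff|
    if diff >= 0:
        p1 = 1.0 / (1.0 + math.exp(-diff))
    else:
        e = math.exp(diff)
        p1 = e / (1.0 + e)
    return (1 if diff > 0 else 0), p1
```

With $\phi = 0.5$, $\mu_0 = (0, 0)$, $\mu_1 = (2, 2)$ and $\Sigma = I$, the point $(1, 1)$ sits exactly on the decision boundary, where $p(y=1|x) = 0.5$. Working in log space keeps the comparison stable when the densities are very small.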


This article was originally written in Chinese and published on aliyun.com.
