Machine learning--Probabilistic graphical models (learning: incomplete data)

Source: Internet
Author: User

1. Overview

The learning problem of a PGM is really parameter inference: given data, we must estimate the parameters that fill in the system's CPDs. In some cases, however, the data set is incomplete. Incomplete data sets arise in two situations: (1) the data-acquisition process loses values, or (2) the model contains hidden variables. Data acquisition can fail in two ways. In the first, the loss is unrelated to the value being collected (for example, a tossed coin rolls away and cannot be found). In the second, the loss is related to the value (for example, when sending and receiving data, the high levels are always blank). Before training a model on incomplete data, it is therefore necessary to judge why the data are incomplete. Modeling both of these missingness mechanisms gives the following model:

Here theta denotes the system parameters, Y the data set, and O the observation variables (O = 1 means the value is observed; O = 0 means it is missing).

If no prior is available for the system parameters, they should be inferred through the likelihood function. For a simple CPD, for example, the likelihood under incomplete data compares with the complete-data likelihood as in the following example:

Clearly, when the data are complete there is no coupling between the parameters, so theta_x and theta_y can be solved for separately. This is easy to understand: once the data are observed, the trail between the parameters is blocked. If the data are incomplete, however, theta_x and theta_y become coupled and cannot be optimized separately: because X is not fully observed, an active trail remains between theta_y and theta_x. Analyzing the likelihood function shows that it has multiple peaks, which makes identifying the system parameters more difficult.
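As an illustration (a minimal sketch with made-up numbers, not from the original text), the toy network X -> Y below shows how a missing X turns a factorized likelihood term into a sum that couples theta_x and theta_y:

```python
# Hypothetical two-node network X -> Y, both binary.
# theta_x = P(X=1); theta_y = (P(Y=1|X=0), P(Y=1|X=1))

def record_likelihood(x, y, theta_x, theta_y):
    """Likelihood of one record (x, y); pass x=None when X is missing."""
    def p(xv):
        px = theta_x if xv == 1 else 1 - theta_x
        py = theta_y[xv]
        return px * (py if y == 1 else 1 - py)
    if x is None:
        # X unobserved: marginalizing sums over X, so this term mixes
        # theta_x with *both* entries of theta_y -- the parameters couple.
        return p(0) + p(1)
    # X observed: the term is a clean product, one factor per parameter set.
    return p(x)

# Complete record: factors separate, P(X=1) * P(Y=1|X=1)
assert abs(record_likelihood(1, 1, 0.6, (0.2, 0.9)) - 0.6 * 0.9) < 1e-12
# Incomplete record: a sum that no longer factorizes
assert abs(record_likelihood(None, 1, 0.6, (0.2, 0.9)) - (0.4 * 0.2 + 0.6 * 0.9)) < 1e-12
```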

2. Parameter identification based on likelihood function optimization

The likelihood function has multiple peaks, and therefore local optima. What can be done if we still want a meaningful set of parameters? The best we can do is seek a maximum of the likelihood function; the bottom line is that the chosen parameters make the observed data as likely as possible. The likelihood can therefore be maximized with a numerical optimization method, the simplest being gradient descent (equivalently, gradient ascent on the log-likelihood).

For a table CPD, the gradient has the analytic form

d log P(D | theta) / d theta_{x_i|u_i} = sum_m P(x_i, u_i | d[m], theta) / theta_{x_i|u_i},

where the sum runs over the data instances d[m]. The core computation is the joint posterior P(x_i, u_i | d[m], theta) of a node and its parents given one instance, which can be read off a calibrated clique tree.

The advantage of optimization-based algorithms is flexibility: they apply to all CPDs, including non-tabular ones. The disadvantages are that the optimization is constrained (the parameters must remain consistent, e.g. each conditional distribution must stay normalized), and that the clique tree must be re-calibrated after every parameter update.
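The optimization loop can be sketched for the same kind of toy network X -> Y with X occasionally missing. To keep the sketch short it uses a numerical gradient instead of the analytic formula and simply clips parameters into (0, 1); the data and values are hypothetical, so this illustrates the idea rather than a practical implementation.

```python
import math

# Hypothetical records (x, y) for a network X -> Y; None marks a missing X.
data = [(1, 1), (0, 0), (None, 1), (None, 0), (1, 1)]

def log_lik(params):
    """Log-likelihood; a missing X is marginalized out, coupling the parameters."""
    tx, ty0, ty1 = params  # P(X=1), P(Y=1|X=0), P(Y=1|X=1)
    def joint(xv, y):
        px = tx if xv == 1 else 1 - tx
        py = ty1 if xv == 1 else ty0
        return px * (py if y == 1 else 1 - py)
    ll = 0.0
    for x, y in data:
        ll += math.log(joint(x, y) if x is not None else joint(0, y) + joint(1, y))
    return ll

def num_grad(params, eps=1e-6):
    """Central-difference gradient of the log-likelihood."""
    g = []
    for i in range(len(params)):
        hi = list(params); hi[i] += eps
        lo = list(params); lo[i] -= eps
        g.append((log_lik(hi) - log_lik(lo)) / (2 * eps))
    return g

theta = [0.5, 0.4, 0.6]
ll0 = log_lik(theta)
for _ in range(500):
    g = num_grad(theta)
    # gradient *ascent* on the log-likelihood, clipped to keep valid probabilities
    theta = [min(0.999, max(0.001, t + 0.02 * gi)) for t, gi in zip(theta, g)]
assert log_lik(theta) > ll0  # the likelihood of the data has improved
```

A real implementation would instead use the analytic gradient above, with P(x_i, u_i | d[m], theta) computed by clique-tree calibration at each step.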

3. Expectation-maximization (EM) algorithm

The idea behind the expectation-maximization algorithm is actually very simple: our goal in collecting data is to infer the system parameters, but the data are incomplete; conversely, if we had the system parameters, we could complete the data. This suggests a cyclic process, and it turns out that iterating this loop drives the parameter estimate toward a (locally) maximum-likelihood estimate of the true system parameters.

In the E-step, the current parameters theta are treated as known, and given theta the probability of every possible completion (x, u) is computed for each record: a complete record contributes a hard count, while an incomplete record contributes the posterior probabilities of the values of its missing variables. In the M-step, the parameters theta are re-estimated from the resulting expected sufficient statistics.
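The two steps can be sketched for a toy network X -> Y in which X is sometimes missing; the data and starting values here are hypothetical:

```python
# Hypothetical records (x, y) for X -> Y; None marks a missing X.
data = [(1, 1), (0, 0), (None, 1), (None, 0), (1, 1)]

def em_step(tx, ty0, ty1):
    """One E-step + M-step for P(X=1), P(Y=1|X=0), P(Y=1|X=1)."""
    c_x = [0.0, 0.0]    # expected count of X = 0 / X = 1
    c_y1 = [0.0, 0.0]   # expected count of (Y=1, X = 0) / (Y=1, X = 1)
    for x, y in data:
        if x is None:
            # E-step: posterior P(X=1 | y, theta) completes the record softly
            p1 = tx * (ty1 if y == 1 else 1 - ty1)
            p0 = (1 - tx) * (ty0 if y == 1 else 1 - ty0)
            q1 = p1 / (p0 + p1)
        else:
            q1 = float(x)   # fully observed record: a hard count
        for xv, q in ((0, 1.0 - q1), (1, q1)):
            c_x[xv] += q
            if y == 1:
                c_y1[xv] += q
    # M-step: re-estimate the parameters from the expected sufficient statistics
    return c_x[1] / len(data), c_y1[0] / c_x[0], c_y1[1] / c_x[1]

theta = (0.5, 0.4, 0.6)   # asymmetric start so EM can break the symmetry
for _ in range(50):
    theta = em_step(*theta)
```

Note that the M-step is just the complete-data estimator applied to soft counts, which is why each iteration only needs the expected sufficient statistics.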

For example, in a Bayesian classifier we may have the feature data but no class labels for it. (Labels really can be lost...) If the EM algorithm is applied in this case, the Bayesian classifier turns from a supervised classifier into an unsupervised clustering method.
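A minimal sketch of that special case, with made-up data: a naive Bayes model over two binary features whose class C is never observed, so EM behaves as a Bernoulli-mixture clustering algorithm.

```python
# Hypothetical data: points near (0,0) vs (1,1) form two natural clusters.
data = [(0, 0)] * 20 + [(1, 1)] * 20 + [(0, 1)] * 3 + [(1, 0)] * 3

pc = 0.5                        # P(C=1), the hidden class prior
pf = [[0.3, 0.2], [0.7, 0.8]]   # pf[c][j] = P(F_j = 1 | C = c)

for _ in range(30):
    # E-step: responsibility r = P(C=1 | features) for every record
    resp = []
    for f in data:
        like = []
        for c in (0, 1):
            l = pc if c == 1 else 1 - pc
            for j in (0, 1):
                l *= pf[c][j] if f[j] == 1 else 1 - pf[c][j]
            like.append(l)
        resp.append(like[1] / (like[0] + like[1]))
    # M-step: re-estimate prior and feature probabilities from soft counts
    n1 = sum(resp)
    n0 = len(data) - n1
    pc = n1 / len(data)
    for j in (0, 1):
        pf[1][j] = sum(r * f[j] for r, f in zip(resp, data)) / n1
        pf[0][j] = sum((1 - r) * f[j] for r, f in zip(resp, data)) / n0
```

After a few iterations the two "classes" separate into the (0,0)-like and (1,1)-like clusters, even though no label was ever observed.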

The advantages of the EM algorithm are that it converges quickly in its early iterations and that each step needs only the expected sufficient statistics. The disadvantage is that the convergence rate slows as it approaches the optimum.

  

  
