Algorithm Grocery Store: Bayesian Networks for Classification


2.1 Summary

In the previous article, we discussed Naive Bayes classification. Naive Bayes classification carries a strong restriction: the feature attributes must be conditionally independent, or nearly so, given the class (in practice, complete independence is almost never achieved). When this condition holds, the Naive Bayes classifier reaches its highest accuracy; unfortunately, in reality the feature attributes are often not conditionally independent but strongly correlated, and this limits the capabilities of Naive Bayes classification. In this article, we discuss a more advanced algorithm in the Bayesian classification family with a wider scope of application: the Bayesian network, also known as the Bayesian belief network or belief network.

2.2 Reconsidering the previous example

In the previous article, we used Naive Bayes classification to detect fake accounts in an SNS community. In that solution, I made the following assumptions:

I. Real accounts have, on average, a higher log density, a higher friend density, and a higher proportion of real portraits than fake accounts.

II. Log density, friend density, and whether a real portrait is used are conditionally independent given account authenticity.

However, the second assumption may not hold. In general, friend density is related not only to whether the account is real but also to whether it has a real avatar, because a real avatar attracts more people to add the account as a friend. Therefore, to obtain a more accurate classification, we can revise the assumptions as follows:

I. Real accounts have, on average, a higher log density, a higher friend density, and a higher proportion of real portraits than fake accounts.

II. Log density and friend density, as well as log density and whether a real portrait is used, are conditionally independent given account authenticity.

III. Users with real portraits have, on average, a higher friend density than users with fake portraits.

These assumptions are closer to the actual situation, but a problem arises: because of the dependencies between feature attributes, Naive Bayes classification no longer applies, and another solution is needed.

A directed acyclic graph expresses the associations between the feature attributes (in the original figure, omitted here, arcs run from account authenticity to log density, friend density, and avatar authenticity, and from avatar authenticity to friend density).

In such a graph, each node represents a random variable, and each arc represents a dependency between two random variables, directed from the influencing node to the influenced node. The graph alone, however, only defines the qualitative relationships between the random variables. Quantitative analysis requires additional data: the conditional probability of each node given its direct predecessor (parent) nodes, while a node without predecessors is described by a prior probability.

For example, the following tables are obtained from statistics on the training dataset (R indicates account authenticity, H indicates avatar authenticity; 0 stands for false and 1 for true):

    R        0       1
    P(R)     0.11    0.89

    P(H | R)     H = 0    H = 1
    R = 0        0.9      0.1
    R = 1        0.2      0.8

In the second table, the row header gives the conditioning variable and the column header the random variable. The first table gives the probabilities of real and fake accounts, while the second gives the conditional probability of avatar authenticity given account authenticity; they are the conditional probability tables of "account authenticity" and "avatar authenticity" respectively. With these data we can not only infer in the forward direction, but also reason in reverse using Bayes' theorem. For example, if an account is selected at random and its portrait is known to be false, the probability that the account itself is false is

    P(R = 0 | H = 0) = P(H = 0 | R = 0) P(R = 0) / [P(H = 0 | R = 0) P(R = 0) + P(H = 0 | R = 1) P(R = 1)]
                     = (0.9 × 0.11) / (0.9 × 0.11 + 0.2 × 0.89)
                     ≈ 0.357

That is, knowing only that the portrait is false, there is about a 35.7% probability that the account is false. If the derivation above is hard to follow, review conditional probability, Bayes' theorem, and the total probability formula from probability theory. Once a conditional probability table is given for every node, statistical inference can be performed over any of the random variables even when the observations are incomplete. The method used above is exactly a Bayesian network.
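As a quick check of this reverse inference, here is a minimal Python sketch that recomputes P(R = 0 | H = 0) from the two tables above via Bayes' theorem and the total probability formula:

```python
# Tables from section 2.2: prior P(R) and CPT P(H | R).
# Encoding: 0 = false, 1 = true.
p_r = {0: 0.11, 1: 0.89}
p_h_given_r = {
    0: {0: 0.9, 1: 0.1},   # row R = 0: P(H=0|R=0), P(H=1|R=0)
    1: {0: 0.2, 1: 0.8},   # row R = 1: P(H=0|R=1), P(H=1|R=1)
}

# Bayes' theorem: P(R=0 | H=0) = P(H=0|R=0) P(R=0) / P(H=0),
# with P(H=0) expanded by the total probability formula.
evidence = sum(p_h_given_r[r][0] * p_r[r] for r in (0, 1))
posterior = p_h_given_r[0][0] * p_r[0] / evidence
print(f"P(R=0 | H=0) = {posterior:.3f}")  # -> 0.357
```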

2.3 Definition and properties of Bayesian networks

With the above preparations, we can formally define Bayesian Networks.

A Bayesian network consists of a directed acyclic graph (DAG) and a set of conditional probability tables. Each node in the DAG represents a random variable, which may be directly observable or hidden, and each directed edge represents a conditional dependency between random variables. Each conditional probability table corresponds to a unique node in the DAG and stores the conditional probability distribution of that node given all of its direct predecessor (parent) nodes.

Bayesian networks have an extremely important property: once the values of a node's direct predecessor nodes are fixed, the node is conditionally independent of all of its other ancestor nodes.

This is similar to the Markov property; in fact, a Bayesian network can be seen as a nonlinear extension of a Markov chain. This property makes it clear that a Bayesian network can compute the joint probability distribution very conveniently. In general, the joint distribution of mutually dependent variables is expanded with the chain rule:

    P(x1, x2, ..., xn) = P(x1) P(x2 | x1) P(x3 | x1, x2) ... P(xn | x1, x2, ..., xn-1)

In a Bayesian network, thanks to the property above, the joint probability distribution of any combination of the random variables reduces to

    P(x1, x2, ..., xn) = ∏i P(xi | Parents(xi))

where Parents(xi) denotes the set of direct predecessor nodes of xi; each probability value can be looked up in the corresponding conditional probability table.
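To make this factorization concrete, here is a minimal Python sketch for the four-variable network of this article (arcs R → H, R → L, R → F, and H → F). The tables for P(R) and P(H | R) are the ones from section 2.2; the values for P(L | R) and P(F | R, H) are made-up placeholders, since the article does not list them:

```python
# Joint probability P(r, h, l, f) = P(r) P(h|r) P(l|r) P(f|r,h)
# for the network R -> {H, L, F}, H -> F. All variables are binary.
p_r = {0: 0.11, 1: 0.89}
p_h = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}            # P(H | R)
p_l = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.3, 1: 0.7}}            # P(L | R), placeholder
p_f = {(0, 0): {0: 0.8, 1: 0.2}, (0, 1): {0: 0.5, 1: 0.5},  # P(F | R, H), placeholder
       (1, 0): {0: 0.4, 1: 0.6}, (1, 1): {0: 0.1, 1: 0.9}}

def joint(r, h, l, f):
    """One entry of the joint distribution, as a product of CPT lookups."""
    return p_r[r] * p_h[r][h] * p_l[r][l] * p_f[(r, h)][f]

# Sanity check: the joint over all 16 assignments sums to 1.
print(sum(joint(r, h, l, f) for r in (0, 1) for h in (0, 1)
          for l in (0, 1) for f in (0, 1)))  # -> 1.0
```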

Bayesian networks are more complex than Naive Bayes, and constructing and training a good Bayesian network is more difficult still. However, a Bayesian network mirrors the way humans reason under uncertainty: it models uncertain causal relationships with a set of conditional probability tables and a directed acyclic graph. Bayesian networks therefore have high practical value.

2.4 Bayesian network construction and learning

The construction and training of Bayesian Networks are divided into the following two steps:

1. Determine the topological relationships among the random variables to form a DAG. This step usually has to be done by domain experts, and building a good topology usually takes continuous iteration and refinement.

2. Train the Bayesian network, that is, fill in the conditional probability tables. If the value of every random variable can be observed directly, as in the preceding example, this training step is straightforward and the method is similar to Naive Bayes classification. If the network contains hidden variable nodes, however, training becomes much more complicated and requires methods such as gradient descent. Since that material is rather dense and involves deeper mathematics, it is not covered here; interested readers can consult the relevant literature.
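For the fully observed case described in step 2, filling in a conditional probability table is just a matter of counting relative frequencies, much like Naive Bayes training. A minimal sketch (the training records below are invented for illustration):

```python
from collections import Counter

# Fully observed training records as (r, h) pairs: r is account
# authenticity, h is avatar authenticity. The data is made up.
records = [(1, 1), (1, 1), (1, 0), (0, 0), (0, 0), (1, 1), (0, 1), (1, 0)]

# Estimate P(H=h | R=r) as a relative frequency within each value of R.
pair_counts = Counter(records)
r_counts = Counter(r for r, _ in records)
cpt = {(r, h): pair_counts[(r, h)] / r_counts[r]
       for r in (0, 1) for h in (0, 1)}

for (r, h), p in sorted(cpt.items()):
    print(f"P(H={h} | R={r}) = {p:.2f}")
```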

2.5 Applications and an example of Bayesian networks

As a model of uncertain causal reasoning, Bayesian networks are applied very widely and play an important role in medical diagnosis, information retrieval, electronics, and industrial engineering; some related problems are current research hot spots. For example, Google uses Bayesian networks in many of its services.

In terms of usage, Bayesian networks are mainly used for probabilistic inference and decision-making. Specifically, when information is incomplete, the observed random variables can be used to infer the unobserved ones, and the number of unobserved variables may be greater than one. In general, the unobserved variables are first set to random values and probabilistic inference is then carried out. An example follows.

This is again the fake account detection example from the SNS community. Our model has four random variables: account authenticity R, avatar authenticity H, log density L, and friend density F. H, L, and F can be observed, while R, the variable we actually care about, cannot be observed directly. The problem becomes inferring the probability distribution of R from the observed values of H, L, and F. The reasoning process can be expressed as follows:

1. Instantiate H, L, and F with their observed values, and assign a random value to R.

2. Compute P(R | H, L, F) = α P(R) P(H | R) P(L | R) P(F | R, H), where α is a normalization factor; each probability value on the right can be looked up in the corresponding conditional probability table.
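Because R is the only unknown here, step 2 amounts to scoring both values of R with H, L, and F clamped to their observed values and then normalizing. A minimal sketch, reusing the placeholder values for P(L | R) and P(F | R, H) from the factorization example in section 2.3:

```python
# P(R=r | h, l, f) = alpha * P(r) P(h|r) P(l|r) P(f|r,h).
p_r = {0: 0.11, 1: 0.89}
p_h = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}            # P(H | R)
p_l = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.3, 1: 0.7}}            # P(L | R), placeholder
p_f = {(0, 0): {0: 0.8, 1: 0.2}, (0, 1): {0: 0.5, 1: 0.5},  # P(F | R, H), placeholder
       (1, 0): {0: 0.4, 1: 0.6}, (1, 1): {0: 0.1, 1: 0.9}}

def posterior_r(h, l, f):
    """Score both values of R, then normalize (the factor alpha)."""
    score = {r: p_r[r] * p_h[r][h] * p_l[r][l] * p_f[(r, h)][f] for r in (0, 1)}
    alpha = 1.0 / (score[0] + score[1])
    return {r: alpha * s for r, s in score.items()}

# Example: false avatar, low log density, low friend density.
print(posterior_r(h=0, l=0, f=0))  # P(R=0 | ...) dominates here
```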

Since the preceding example contains only one unknown random variable, no iteration is required. In general, the process of inference with a Bayesian network is as follows:

1. Instantiate all nodes whose random variables are observed with their observed values, and instantiate the remaining nodes with random values.

2. Traverse the DAG and, for every unobserved node y, compute P(y | Wi) = α P(y | Parents(y)) ∏j P(sj | Parents(sj)), where Wi denotes the current values of all nodes other than y, α is a normalization factor, and sj is the j-th child node of y.

3. Instantiate each unknown node with a new value drawn according to the P(y | Wi) computed in step 2, and repeat step 2 until the results converge.

4. Use the convergence result as the inferred value.
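With several unobserved nodes, the procedure above is essentially Gibbs sampling: each unknown node is repeatedly resampled from its conditional distribution given the current values of all the others, and that conditional involves only the node's own CPT row and the CPT rows of its children. A minimal sketch for binary nodes (the dictionary encoding of the network and the helper name resample are illustrative, not from the original article):

```python
import random

def resample(node, values, parents, children, cpt):
    """Draw a new value for `node` from P(node | all other nodes), which is
    proportional to P(node | Parents(node)) * prod_j P(s_j | Parents(s_j))
    over the children s_j of `node`."""
    weights = []
    for y in (0, 1):
        values[node] = y
        w = cpt[node][tuple(values[p] for p in parents[node])][y]
        for s in children[node]:   # children's CPT rows also depend on y
            w *= cpt[s][tuple(values[p] for p in parents[s])][values[s]]
        weights.append(w)
    values[node] = 1 if random.random() * sum(weights) >= weights[0] else 0

# Tiny two-node network R -> H, with H observed as false and R unknown.
parents = {"R": (), "H": ("R",)}
children = {"R": ("H",), "H": ()}
cpt = {"R": {(): {0: 0.11, 1: 0.89}},
       "H": {(0,): {0: 0.9, 1: 0.1}, (1,): {0: 0.2, 1: 0.8}}}

values = {"R": random.randint(0, 1), "H": 0}   # step 1: random init for R
counts = [0, 0]
for _ in range(10_000):                        # steps 2-3: iterate
    resample("R", values, parents, children, cpt)
    counts[values["R"]] += 1
print(counts[0] / 10_000)  # -> about 0.357, matching section 2.2
```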

The above is only one of the inference algorithms for Bayesian networks; other algorithms exist but are not described here.

This article is published under the Attribution-NonCommercial 3.0 license. You are welcome to reprint or adapt it, but you must retain the attribution to the author Zhang Yang (including the link) and must not use it for commercial purposes. If you have any questions, or wish to negotiate permissions, please contact me.
