Bayes' Theorem, Naive Bayes, and Running the Official Spark MLlib NaiveBayes Example

Last Update: 2018-10-24 Source: Internet

Author: User

Bayes' Rule

A machine learning task can be framed as finding the best hypothesis in a hypothesis space H given training data D. One way to define the "best" hypothesis is as the most probable one, given the data D and prior knowledge about the probabilities of the different hypotheses in H. Bayesian theory computes that probability from the prior probability of each hypothesis, the probability of observing the data under a given hypothesis, and the observed data itself.

Prior Probability and Posterior Probability

P(A) denotes the initial probability of hypothesis A before any training data is seen; it is called the prior probability of A. The prior reflects background knowledge about the chance that A is a correct hypothesis. If no such knowledge is available, each candidate hypothesis can simply be assigned the same prior probability. P(B) denotes the prior probability of the training data B, and P(B | A) denotes the probability of observing B when A holds. In machine learning, we are interested in P(A | B), the probability that A holds given the observed data B, which is called the posterior probability of A.

Bayes' Formula

Bayes' formula provides a way to calculate the posterior probability P(A | B) from the prior probability P(A) together with P(B) and P(B | A).

Bayes' theorem is expressed by the following formula:

P(A | B) = P(B | A) P(A) / P(B)

P(A | B) increases as P(A) and P(B | A) increase, and decreases as P(B) increases. In other words, the more likely B is to be observed independently of A, the less support B provides for A.
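
As a worked illustration of the formula, the following plain-Scala sketch computes a posterior. The numbers (a 1% prior, a 90% likelihood, a 5% chance of observing B when A does not hold) and the names `BayesExample`, `posterior`, `pA`, `pBGivenA`, and `pBGivenNotA` are hypothetical and chosen only for this example. P(B) is obtained from the law of total probability.

```scala
object BayesExample {
  /** Posterior P(A | B) = P(B | A) * P(A) / P(B). */
  def posterior(pA: Double, pBGivenA: Double, pB: Double): Double = {
    require(pB > 0.0, "P(B) must be positive")
    pBGivenA * pA / pB
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical inputs, for illustration only.
    val pA = 0.01          // prior P(A)
    val pBGivenA = 0.9     // likelihood P(B | A)
    val pBGivenNotA = 0.05 // P(B | not A)

    // Law of total probability: P(B) = P(B | A) P(A) + P(B | not A) P(not A)
    val pB = pBGivenA * pA + pBGivenNotA * (1.0 - pA)

    println(s"P(B) = $pB")
    println(s"P(A | B) = ${posterior(pA, pBGivenA, pB)}")
  }
}
```

With these inputs the posterior is about 0.154: even a fairly reliable observation B gives only modest support to A when the prior P(A) is small and B is also reasonably likely without A.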

Naive Bayes

The naive Bayes algorithm applies Bayes' formula to classification under the assumption that the features are independent of one another given the class. See 70173402
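
To make the independence assumption concrete, here is a minimal from-scratch sketch in plain Scala. It is not how Spark implements the algorithm. It scores each class c by log P(c) plus the sum of log P(x_i | c) over the features, using add-one (Laplace) smoothing so that unseen values do not produce a zero probability. The object name `ToyNaiveBayes`, the helper names, and the tiny data set are all made up for illustration.

```scala
object ToyNaiveBayes {
  // A sample is (class label, categorical feature values).
  type Sample = (String, Seq[String])

  // Hypothetical training data, invented for this sketch.
  val toyData: Seq[Sample] = Seq(
    ("play", Seq("sunny", "weak")),
    ("play", Seq("overcast", "weak")),
    ("play", Seq("sunny", "strong")),
    ("stay", Seq("rainy", "strong")),
    ("stay", Seq("rainy", "weak")),
    ("stay", Seq("sunny", "strong"))
  )

  /** log P(c) + sum_i log P(x_i | c), with add-one smoothing per feature. */
  def logScore(data: Seq[Sample], c: String, x: Seq[String]): Double = {
    val inClass = data.filter(_._1 == c)
    val logPrior = math.log(inClass.size.toDouble / data.size)
    val logLikelihood = x.zipWithIndex.map { case (value, i) =>
      // Number of distinct values feature i takes anywhere in the data.
      val distinctValues = data.map(_._2(i)).distinct.size
      val matches = inClass.count(_._2(i) == value)
      math.log((matches + 1.0) / (inClass.size + distinctValues))
    }.sum
    logPrior + logLikelihood
  }

  /** Pick the class with the highest log score. */
  def predict(data: Seq[Sample], x: Seq[String]): String =
    data.map(_._1).distinct.maxBy(c => logScore(data, c, x))

  def main(args: Array[String]): Unit = {
    println(predict(toyData, Seq("overcast", "weak")))
    println(predict(toyData, Seq("rainy", "strong")))
  }
}
```

Because each feature contributes its own independent factor, the per-class score is just a sum of logarithms. P(x) is the same for every class, so it can be dropped when only the most probable class is needed.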

The official Spark NaiveBayes example code is as follows:

import org.apache.spark.ml.classification.NaiveBayes
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.sql.SparkSession

object NaiveBayesDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder
      .appName("NaiveBayesDemo").master("local")
      .config("spark.sql.warehouse.dir", "C:\\study\\sparktest")
      .getOrCreate()
    // Load the data stored in LIBSVM format as a DataFrame.
    val dataset = spark.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt")
    // Split the data into training and test sets (30% held out for testing).
    val Array(trainingData, testData) = dataset.randomSplit(Array(0.7, 0.3), seed = 1234L)

    // Train a NaiveBayes model.
    val model = new NaiveBayes().fit(trainingData)
    // Select example rows to display.
    val predictions = model.transform(testData)
    predictions.show()

    // Select (prediction, true label) and compute the test accuracy.
    val evaluator = new MulticlassClassificationEvaluator()
      .setLabelCol("label")
      .setPredictionCol("prediction")
      .setMetricName("accuracy")
    val accuracy = evaluator.evaluate(predictions)
    println(s"Test set accuracy = $accuracy")

    spark.stop()
  }
}
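
The accuracy metric requested from the evaluator in the code above is the fraction of test rows whose predicted label equals the true label. The following plain-Scala sketch shows that calculation on its own, without Spark. The object name `AccuracySketch` and the sample pairs are hypothetical and not taken from the Spark run.

```scala
object AccuracySketch {
  /** Fraction of (prediction, label) pairs where the two values agree. */
  def accuracy(pairs: Seq[(Double, Double)]): Double = {
    require(pairs.nonEmpty, "need at least one (prediction, label) pair")
    pairs.count { case (prediction, label) => prediction == label }.toDouble / pairs.size
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical (prediction, label) pairs, for illustration only.
    val pairs = Seq((0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 0.0))
    println(s"accuracy = ${accuracy(pairs)}") // 3 of 4 correct: 0.75
  }
}
```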

Running the NaiveBayesDemo program produces the following output:

18/10/24 11:50:06 INFO SparkContext: Starting job: collectAsMap at MulticlassMetrics.scala:48
+-----+--------------------+--------------------+-----------+----------+
|label|            features|       rawPrediction|probability|prediction|
+-----+--------------------+--------------------+-----------+----------+
|  0.0|     (692,[,97,12...|[-173678.60946628...|  [1.0,0.0]|       0.0|
|  0.0|   (692,[100,99,1...|[-178107.24302988...|  [1.0,0.0]|       0.0|
|  0.0|(692,[100,101,102...|[-100020.80519087...|  [1.0,0.0]|       0.0|
|  0.0|(692,[124,125,126...|[-183521.85526462...|  [1.0,0.0]|       0.0|
|  0.0|(692,[127,128,129...|[-183004.12461660...|  [1.0,0.0]|       0.0|
|  0.0|(692,[128,129,130...|[-246722.96394714...|  [1.0,0.0]|       0.0|
|  0.0|(692,[152,153,154...|[-208696.01108598...|  [1.0,0.0]|       0.0|
|  0.0|(692,[153,154,155...|[-261509.59951302...|  [1.0,0.0]|       0.0|
|  0.0|(692,[154,155,156...|[-217654.71748256...|  [1.0,0.0]|       0.0|
|  0.0|(692,[181,182,183...|[-155287.07585335...|  [1.0,0.0]|       0.0|
|  1.0|(692,[99,100,101,...|[-145981.83877498...|  [0.0,1.0]|       1.0|
|  1.0|(692,[100,101,102...|[-147685.13694275...|  [0.0,1.0]|       1.0|
|  1.0|(692,[123,124,125...|[-139521.98499849...|  [0.0,1.0]|       1.0|
|  1.0|(692,[124,125,126...|[-129375.46702012...|  [0.0,1.0]|       1.0|
|  1.0|(692,[126,127,128...|[-145809.08230799...|  [0.0,1.0]|       1.0|
|  1.0|(692,[127,128,129...|[-132670.15737290...|  [0.0,1.0]|       1.0|
|  1.0|(692,[128,129,130...|[-100206.72054749...|  [0.0,1.0]|       1.0|
|  1.0|(692,[129,130,131...|[-129639.09694930...|  [0.0,1.0]|       1.0|
|  1.0|(692,[129,130,131...|[-143628.65574273...|  [0.0,1.0]|       1.0|
|  1.0|(692,[129,130,131...|[-129238.74023248...|  [0.0,1.0]|       1.0|
+-----+--------------------+--------------------+-----------+----------+
only showing top 20 rows
18/10/24 11:50:06 INFO DAGScheduler: Job 6 finished: countByValue at MulticlassMetrics.scala:42, took 0.157446 s
Test set accuracy = 1.0
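
Every row in the probability column is [1.0, 0.0] or [0.0, 1.0]. One plausible explanation, assuming rawPrediction holds unnormalized log posteriors that are exponentiated and normalized to produce the probabilities, is that the log scores of the two classes differ by far more than double precision can represent after exponentiation. The sketch below illustrates this with a numerically stable normalization in plain Scala. The object name `LogScoreNormalization` and the input values are hypothetical, because the output above shows only the truncated first element of each rawPrediction vector.

```scala
object LogScoreNormalization {
  /** Convert unnormalized log scores into probabilities that sum to 1. */
  def normalize(logScores: Seq[Double]): Seq[Double] = {
    // Subtract the maximum before exponentiating to avoid underflow of every term.
    val maxLog = logScores.max
    val shifted = logScores.map(s => math.exp(s - maxLog))
    val total = shifted.sum
    shifted.map(_ / total)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical log scores with a gap of several hundred log units.
    println(normalize(Seq(-173678.6, -174500.2)))
    // Hypothetical log scores that are close together.
    println(normalize(Seq(-2.0, -3.0)))
  }
}
```

In the first call the smaller term underflows to exactly zero, so the result is a hard [1.0, 0.0]. In the second call the gap is only one log unit, and the probabilities stay strictly between 0 and 1.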

This article is an English version of an article originally published in Chinese on aliyun.com and is provided for information purposes only.
