Bayes' Theorem, Naive Bayes, and Running the Official Spark MLlib NaiveBayes Example

Source: Internet
Author: User
Bayes' Rule

Machine learning task: given training data D, determine the best hypothesis in the hypothesis space H. Best hypothesis: one approach is to define it as the most probable hypothesis, given the data D and prior knowledge about the probabilities of the different hypotheses in H. Bayesian theory provides a way to calculate this probability from the prior probability of a hypothesis, the probability of observing different data under a given hypothesis, and the observed data itself.

Prior Probability and Posterior Probability

P(A) denotes the initial probability that hypothesis A holds before any training data has been seen. P(A) is called the prior probability of A. The prior probability reflects background knowledge about the chance that A is a correct hypothesis. Without such prior knowledge, you can simply assign the same prior probability to every candidate hypothesis. P(B) denotes the prior probability of the training data B. P(B | A) denotes the probability of observing B given that A holds. In machine learning, we are concerned with P(A | B), that is, the probability that A holds given B, which is called the posterior probability of A.

Bayes' Formula

Bayes' formula provides a method to calculate the posterior probability P(A | B) from the prior probability P(A), together with P(B) and P(B | A).

Bayes' theorem is stated by the following formula:

P(A | B) = P(B | A) P(A) / P(B)
P(A | B) increases as P(A) and P(B | A) grow, and decreases as P(B) grows. That is, if B is more likely to be observed independently of A, then B provides less support for A.
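As a minimal sketch of the formula, the Scala snippet below computes the posterior of every candidate hypothesis from its prior and likelihood, then picks the most probable one. The object name `BayesRule`, the helper names, and the numbers are all made up for illustration and do not come from the original example. The sketch also assumes the hypotheses are mutually exclusive and exhaustive, so that P(B) can be obtained by summing P(B | A_i) P(A_i) over all hypotheses.

```scala
object BayesRule {
  // Posterior of every hypothesis A_i given one observation B:
  //   P(A_i | B) = P(B | A_i) * P(A_i) / P(B)
  // with P(B) = sum over i of P(B | A_i) * P(A_i)
  // (assumes the hypotheses are mutually exclusive and exhaustive).
  def posteriors(priors: Seq[Double], likelihoods: Seq[Double]): Seq[Double] = {
    require(priors.length == likelihoods.length, "one likelihood per hypothesis")
    val joint = priors.zip(likelihoods).map { case (p, l) => p * l }
    val evidence = joint.sum // P(B)
    require(evidence > 0.0, "the observation must be possible under some hypothesis")
    joint.map(_ / evidence)
  }

  // Index of the most probable hypothesis given the observation.
  def mostProbable(priors: Seq[Double], likelihoods: Seq[Double]): Int = {
    val post = posteriors(priors, likelihoods)
    post.indexOf(post.max)
  }

  def main(args: Array[String]): Unit = {
    val priors = Seq(0.3, 0.7)      // hypothetical P(A_0), P(A_1)
    val likelihoods = Seq(0.8, 0.1) // hypothetical P(B | A_0), P(B | A_1)
    println(posteriors(priors, likelihoods).mkString(", "))
    println(s"most probable hypothesis: ${mostProbable(priors, likelihoods)}")
  }
}
```

With these numbers, P(B) = 0.3 × 0.8 + 0.7 × 0.1 = 0.31, so the posteriors are roughly 0.774 and 0.226. Although A_0 has the smaller prior, the observation B is much more likely under A_0, so A_0 ends up as the most probable hypothesis.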

Naive Bayes

The naive Bayes algorithm applies Bayes' formula to classification under the assumption that the features are independent of each other given the class. See 70173402
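To make the independence assumption concrete, here is a small from-scratch sketch in Scala. It assumes a multinomial event model over count features with additive (Laplace) smoothing. The object name `TinyNaiveBayes`, its methods, and the toy data are hypothetical, and the code is not meant to reproduce Spark's implementation. Because the features are treated as independent given the class, the class-conditional likelihood is a product of per-feature terms, which becomes a plain sum in log space.

```scala
object TinyNaiveBayes {
  // Parameters in log space: logPrior(c) = log P(c),
  // logTheta(c)(j) = log P(feature j | class c).
  final case class Model(logPrior: Array[Double], logTheta: Array[Array[Double]])

  // Train on count features. labels(i) is a class index in [0, numClasses);
  // smoothing is the additive (Laplace) smoothing constant.
  def train(features: Array[Array[Double]], labels: Array[Int],
            numClasses: Int, smoothing: Double = 1.0): Model = {
    require(features.nonEmpty && features.length == labels.length)
    val numFeatures = features.head.length
    val docCount = Array.fill(numClasses)(0.0)
    val featureSum = Array.fill(numClasses, numFeatures)(0.0)
    for ((x, c) <- features.zip(labels)) {
      docCount(c) += 1.0
      for (j <- 0 until numFeatures) featureSum(c)(j) += x(j)
    }
    val total = features.length.toDouble
    val logPrior = docCount.map { n =>
      math.log((n + smoothing) / (total + smoothing * numClasses))
    }
    val logTheta = featureSum.map { row =>
      val denom = row.sum + smoothing * numFeatures
      row.map(v => math.log((v + smoothing) / denom))
    }
    Model(logPrior, logTheta)
  }

  // Unnormalized log posterior per class:
  //   log P(c) + sum over j of x_j * log P(feature j | c)
  def logScores(model: Model, x: Array[Double]): Array[Double] =
    model.logPrior.indices.map { c =>
      model.logPrior(c) + x.indices.map(j => x(j) * model.logTheta(c)(j)).sum
    }.toArray

  // Predicted class = the class with the highest log posterior.
  def predict(model: Model, x: Array[Double]): Int = {
    val scores = logScores(model, x)
    scores.indexOf(scores.max)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical toy data: two count features, two classes.
    val xs = Array(Array(3.0, 0.0), Array(2.0, 1.0), Array(0.0, 3.0), Array(1.0, 2.0))
    val ys = Array(0, 0, 1, 1)
    val model = train(xs, ys, numClasses = 2)
    println(predict(model, Array(4.0, 0.0)))
    println(predict(model, Array(0.0, 4.0)))
  }
}
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, which matters once the feature vectors have hundreds of dimensions.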

 

The official Spark NaiveBayes example code is as follows:

import org.apache.spark.ml.classification.NaiveBayes
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.sql.SparkSession

object NaiveBayesDemo {
  def main(args: Array[String]): Unit = {
    // Run Spark locally; the warehouse directory is a local Windows path.
    val spark = SparkSession
      .builder
      .appName("NaiveBayesDemo").master("local")
      .config("spark.sql.warehouse.dir", "C:\\study\\sparktest")
      .getOrCreate()
    // Load the data stored in LIBSVM format as a DataFrame.
    val dataset = spark.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt")
    // Split the data into training and test sets (30% held out for testing).
    val Array(trainingData, testData) = dataset.randomSplit(Array(0.7, 0.3), seed = 1234L)

    // Train a NaiveBayes model.
    val model = new NaiveBayes().fit(trainingData)
    // Predict on the test set and display example rows.
    val predictions = model.transform(testData)
    predictions.show()

    // Select (prediction, true label) and compute the test accuracy.
    val evaluator = new MulticlassClassificationEvaluator()
      .setLabelCol("label")
      .setPredictionCol("prediction")
      .setMetricName("accuracy")
    val accuracy = evaluator.evaluate(predictions)
    println(s"Test set accuracy = $accuracy")

    spark.stop()
  }
}

The running result is as follows:

18/10/24 11:50:06 INFO SparkContext: Starting job: collectAsMap at MulticlassMetrics.scala:48
+-----+--------------------+--------------------+-----------+----------+
|label|            features|       rawPrediction|probability|prediction|
+-----+--------------------+--------------------+-----------+----------+
|  0.0|     (692,[,97,12...|[-173678.60946628...|  [1.0,0.0]|       0.0|
|  0.0|   (692,[100,99,1...|[-178107.24302988...|  [1.0,0.0]|       0.0|
|  0.0|(692,[100,101,102...|[-100020.80519087...|  [1.0,0.0]|       0.0|
|  0.0|(692,[124,125,126...|[-183521.85526462...|  [1.0,0.0]|       0.0|
|  0.0|(692,[127,128,129...|[-183004.12461660...|  [1.0,0.0]|       0.0|
|  0.0|(692,[128,129,130...|[-246722.96394714...|  [1.0,0.0]|       0.0|
|  0.0|(692,[152,153,154...|[-208696.01108598...|  [1.0,0.0]|       0.0|
|  0.0|(692,[153,154,155...|[-261509.59951302...|  [1.0,0.0]|       0.0|
|  0.0|(692,[154,155,156...|[-217654.71748256...|  [1.0,0.0]|       0.0|
|  0.0|(692,[181,182,183...|[-155287.07585335...|  [1.0,0.0]|       0.0|
|  1.0|(692,[99,100,101,...|[-145981.83877498...|  [0.0,1.0]|       1.0|
|  1.0|(692,[100,101,102...|[-147685.13694275...|  [0.0,1.0]|       1.0|
|  1.0|(692,[123,124,125...|[-139521.98499849...|  [0.0,1.0]|       1.0|
|  1.0|(692,[124,125,126...|[-129375.46702012...|  [0.0,1.0]|       1.0|
|  1.0|(692,[126,127,128...|[-145809.08230799...|  [0.0,1.0]|       1.0|
|  1.0|(692,[127,128,129...|[-132670.15737290...|  [0.0,1.0]|       1.0|
|  1.0|(692,[128,129,130...|[-100206.72054749...|  [0.0,1.0]|       1.0|
|  1.0|(692,[129,130,131...|[-129639.09694930...|  [0.0,1.0]|       1.0|
|  1.0|(692,[129,130,131...|[-143628.65574273...|  [0.0,1.0]|       1.0|
|  1.0|(692,[129,130,131...|[-129238.74023248...|  [0.0,1.0]|       1.0|
+-----+--------------------+--------------------+-----------+----------+
only showing top 20 rows

18/10/24 11:50:06 INFO DAGScheduler: Job 6 finished: countByValue at MulticlassMetrics.scala:42, took 0.157446 s
Test set accuracy = 1.0

 
