Learn to Call Apache Spark MLlib KMeans in 3 Minutes


Apache Spark MLlib, the machine learning library, is one of the most important parts of the Apache Spark ecosystem, yet there are still not many articles about it on the web. For KMeans, the articles available online mostly provide demo-style programs that are essentially the same as the one on the official Apache Spark website. After training a model, almost none of them show how to use it, how the program runs, what the results look like, or what test data was used.

Starting from the program fragment on the official Apache Spark website, the author wrote a complete test program that calls the MLlib KMeans library and ran it successfully in a Spark 1.0 + YARN 2.2 environment. Because the goal was only a quick hands-on trial, many details of the program have not been polished, but it should still offer some entry-level help to anyone interested in Spark MLlib.


[A. Main part of the program]
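The following is a minimal sketch of a complete program in the same spirit. It is not the author's original code. It assumes the Spark 1.x RDD-based MLlib API (`KMeans.train`, `KMeansModel.clusterCenters`, `computeCost`, `predict`). The object name `KMeansExample`, the command-line arguments and the two new points being classified are hypothetical choices for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.{KMeans, KMeansModel}
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD

// Hypothetical example object; a sketch, not the author's program.
object KMeansExample {

  // Parse one line of whitespace-separated numbers into a dense MLlib vector.
  def parse(line: String): Vector =
    Vectors.dense(line.trim.split("\\s+").map(_.toDouble))

  // Train a k-means model on the given points.
  def train(points: RDD[Vector], numClusters: Int, numIterations: Int): KMeansModel =
    KMeans.train(points, numClusters, numIterations)

  def main(args: Array[String]): Unit = {
    // Hypothetical command-line arguments: <inputPath> <numClusters> <numIterations>
    val inputPath = args(0)
    val numClusters = args(1).toInt
    val numIterations = args(2).toInt

    val sc = new SparkContext(new SparkConf().setAppName("KMeansExample"))

    // Load and parse the training data; cache it because k-means makes several passes.
    val points = sc.textFile(inputPath).filter(_.trim.nonEmpty).map(parse).cache()
    val model = train(points, numClusters, numIterations)

    // Show the trained cluster centers.
    model.clusterCenters.zipWithIndex.foreach { case (center, i) =>
      println(s"Cluster $i center: $center")
    }

    // Within Set Sum of Squared Errors: lower means tighter clusters.
    println(s"WSSSE = ${model.computeCost(points)}")

    // Use the trained model to classify hypothetical new points.
    Seq("0.2 0.2 0.2", "9.0 9.0 9.0").map(parse).foreach { p =>
      println(s"$p belongs to cluster ${model.predict(p)}")
    }

    sc.stop()
  }
}
```

Because `println` runs on the driver, in YARN cluster mode this output ends up in the application's YARN log rather than in the submitting console.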



[B. Test data]
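The author's data set itself is not given in the text. If you need something to experiment with, the hypothetical helper `TestDataGen` below generates synthetic points around a few chosen centers. It writes one point per line with space-separated coordinates, a format that can be read with `sc.textFile` and split on whitespace. This is similar to the layout of the sample file `data/mllib/kmeans_data.txt` that ships with Spark. The centers, spread and seed are arbitrary assumptions.

```scala
import java.util.Locale
import scala.util.Random

// Hypothetical generator for synthetic k-means test data.
object TestDataGen {
  // Produce `perCluster` points scattered (Gaussian noise, std dev `spread`) around each center.
  // Each point is one line of space-separated coordinates with two decimals.
  def generate(centers: Seq[Array[Double]], perCluster: Int, spread: Double, seed: Long): Seq[String] = {
    val rnd = new Random(seed)
    for {
      c <- centers
      _ <- 1 to perCluster
    } yield c.map { x =>
      // Locale.ROOT keeps '.' as the decimal separator so the values parse back with toDouble.
      "%.2f".formatLocal(Locale.ROOT, x + rnd.nextGaussian() * spread)
    }.mkString(" ")
  }

  def main(args: Array[String]): Unit =
    generate(Seq(Array(0.0, 0.0, 0.0), Array(9.0, 9.0, 9.0)), perCluster = 3, spread = 0.1, seed = 42L)
      .foreach(println)
}
```

Redirect the output to a file and upload it to HDFS (or any path Spark can read) to use it as training input.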


[C. Run]

Use ${SPARK_HOME}/bin/spark-submit to submit the program to YARN.

[D. Results]

-Results printed to the console (last few lines):


-Run results shown in the YARN web console:


-Output of the Scala program, as shown in the YARN log:


[E. Summary]

-Calling the Spark MLlib library is not complicated.

-The model (KMeansModel) trained by MLlib KMeans can easily be used to classify new data.
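Classifying a new point with a trained `KMeansModel` comes down to finding the nearest cluster center: `model.predict(vector)` returns the index of the closest center. The plain-Scala sketch below needs no Spark and uses the hypothetical name `NearestCenter`. It illustrates that idea with squared Euclidean distance. It shows the concept only, not MLlib's actual implementation.

```scala
// Hypothetical illustration of nearest-center assignment (what k-means "prediction" means).
object NearestCenter {
  // Squared Euclidean distance between two points of equal dimension.
  def squaredDistance(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "dimension mismatch")
    a.indices.map { i => val d = a(i) - b(i); d * d }.sum
  }

  // Index of the center closest to `point`, i.e. the cluster the point is assigned to.
  def predict(centers: Array[Array[Double]], point: Array[Double]): Int = {
    require(centers.nonEmpty, "need at least one center")
    centers.indices.minBy(i => squaredDistance(centers(i), point))
  }
}
```

In a real Spark job you would simply call `model.predict(Vectors.dense(0.2, 0.2, 0.2))` on the trained model, or `model.predict(rdd)` to classify a whole RDD of vectors.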


