Apache Spark MLlib is one of the most important pieces of the Apache Spark system: its machine learning module. Yet there are still not many articles about it on the web. For KMeans in particular, most of the available articles provide demo-style programs that closely follow the examples on the official Apache Spark website; after obtaining the trained model, almost none of them show how to actually use the model, walk through the program's execution, display the results, or provide sample test data.
Starting from the code fragments on the official Apache Spark site, the author wrote a complete test program that calls the MLlib KMeans library and ran it successfully on a Spark 1.0 + YARN 2.2 environment. Because the goal was only a quick hands-on experience, many details of the program have not been polished, but it should still offer some entry-level help to readers interested in Spark MLlib.
[A. Main part of the program]
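The original listing is not reproduced here, so the following is a minimal Scala sketch in the spirit of the official Spark 1.0 MLlib example. The object name KMeansTest, the number of clusters, the number of iterations, and the sample points passed to predict are illustrative assumptions rather than the author's exact code.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

object KMeansTest {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("KMeansTest")
    val sc = new SparkContext(conf)

    // Load the training data: one space-separated numeric vector per line,
    // with the input path given as the first program argument
    val data = sc.textFile(args(0))
    val parsedData = data.map(s => Vectors.dense(s.split(' ').map(_.toDouble))).cache()

    // Train the KMeans model
    val numClusters = 2
    val numIterations = 20
    val model = KMeans.train(parsedData, numClusters, numIterations)

    // Evaluate the clustering by the within-set sum of squared errors
    println("Within Set Sum of Squared Errors = " + model.computeCost(parsedData))

    // Show the cluster centers and classify two new points with the trained KMeansModel
    model.clusterCenters.foreach(center => println("Cluster center: " + center))
    println("Prediction for (0.2, 0.2, 0.2): " + model.predict(Vectors.dense(0.2, 0.2, 0.2)))
    println("Prediction for (9.1, 9.1, 9.1): " + model.predict(Vectors.dense(9.1, 9.1, 9.1)))

    sc.stop()
  }
}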
[B. Test data]
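The original sample file is not shown. A typical input for the program above is a plain text file with one space-separated feature vector per line, in the style of the kmeans_data.txt file bundled with Spark; the values below are illustrative:

0.0 0.0 0.0
0.1 0.1 0.1
0.2 0.2 0.2
9.0 9.0 9.0
9.1 9.1 9.1
9.2 9.2 9.2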
[C. Run]
Use ${spark_home}/bin/spark-submit to submit the program to YARN for execution.
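The exact command line is not reproduced in the original; a typical Spark 1.0 invocation on YARN looks roughly like the following, where the class name, jar name, resource settings, and HDFS input path are assumptions:

# Submit the test program to YARN in cluster mode (names and paths below are placeholders)
${spark_home}/bin/spark-submit \
  --class KMeansTest \
  --master yarn-cluster \
  --num-executors 2 \
  --executor-memory 1g \
  kmeans-test.jar \
  hdfs:///user/test/kmeans_data.txt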
[D. Results]
-Results returned by the console (last few lines):
-Results of the run shown in the YARN web console:
-The output of the Scala program as shown in the YARN log:
[E. Summary]
-The process of invoking the Spark MLlib library is not complicated
-A model (KMeansModel) trained by MLlib KMeans makes it easy to classify new data