1. Spark Core: Spark RDD core summary; Spark operator selection strategy; Spark Core job scheduling and task scheduling; Spark parameter tuning; Spark runtime architecture core summary; Spark shuffle principles, solving shuffle runtime problems, and parameter tuning
2. Spark SQL or SQL: I have not yet had the chance to go deep here and am still at the basic-understanding stage
3. Spark ML or MLlib: My study here has been fairly loose and my understanding is basic; with limited time, I have only used a few common classification models while building user profiles
Planned deep dives: implementing item-based CF on Spark; XGBoost support for Spark; SPARK-KNN
4. Python: part of the code from "Machine Learning in Action" and some LeetCode problems; got acquainted with NumPy, matplotlib, pandas, and scikit-learn
5. Machine learning: "Statistical Learning Methods" and "Machine Learning in Action"; machine learning summary
Planned deep dives: "The Elements of Statistical Learning: Data Mining, Inference, and Prediction" and "Data Mining: Concepts and Techniques"
2016 year-end summary