Spark is a cluster computing platform that originated at the AMPLab of the University of California, Berkeley. It is based on in-memory computing and offers better performance than Hadoop. It is also a rare all-rounder that covers several computing paradigms: iterative batch processing, data warehousing, stream processing, and graph computing. Spark uses a single unified technology stack to address the core problems of big data in cloud computing, including stream processing, graph computation, machine learning, and NoSQL queries, and it is backed by a mature ecosystem. Together these have established its leading position in the cloud computing and big data field.
As Spark technology becomes more widespread, demand for skilled professionals keeps growing. Spark professionals will remain highly sought after, with salaries in the millions within reach. To become a Spark master, you need to train move by move, starting with the fundamentals. Generally speaking, this means going through the following stages:
First stage: Proficiency in the Scala language
The Spark framework is written in Scala, and its code is concise and elegant. To become a Spark master you have to read the Spark source code, so you must master Scala;
Although Spark programs can be developed in several languages, including Java and Python, the fastest and best-supported API remains, and is likely to remain, the Scala API, so you need to master Scala to write complex, high-performance distributed Spark programs;
In particular, be proficient in Scala's traits, the apply method, functional programming, generics, and contravariance and covariance;
Second stage: Proficiency in the APIs that the Spark platform provides to developers
Master the RDD-oriented development model in Spark and the use of the various transformation and action functions;
Master wide dependencies, narrow dependencies, and the lineage mechanism in Spark;
Master the RDD computation process, such as stage division, the basic process of submitting a Spark application to the cluster, and the basic working principles of worker nodes;
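To make the ideas in this stage concrete, here is a minimal sketch in plain Python. It is a toy, single-process model and not Spark's API: the class ToyRDD and all of its method names are hypothetical, invented only for illustration. It shows that transformations merely record a step in the lineage, that an action triggers the actual computation, and that a key-based aggregation is a wide dependency while map and filter are narrow.

```python
class ToyRDD:
    """Toy, single-process stand-in for an RDD. Hypothetical; not Spark's API."""

    def __init__(self, compute, parent=None, op="source", dep=None):
        self._compute = compute  # zero-argument function returning a list of partitions
        self.parent = parent     # the dataset this one was derived from
        self.op = op             # name of the operation that produced it
        self.dep = dep           # "narrow", "wide", or None for a source

    @staticmethod
    def parallelize(data, num_partitions):
        data = list(data)
        chunks = [data[i::num_partitions] for i in range(num_partitions)]
        return ToyRDD(lambda: chunks)

    # Transformations: build a new ToyRDD and compute nothing yet.
    def map(self, f):
        # Narrow: each output partition depends on exactly one parent partition.
        return ToyRDD(lambda: [[f(x) for x in p] for p in self._compute()],
                      parent=self, op="map", dep="narrow")

    def filter(self, pred):
        return ToyRDD(lambda: [[x for x in p if pred(x)] for p in self._compute()],
                      parent=self, op="filter", dep="narrow")

    def reduce_by_key(self, f):
        # Wide: an output partition may need records from every parent partition.
        def compute():
            parts = self._compute()
            buckets = [dict() for _ in parts]
            for part in parts:
                for key, value in part:
                    bucket = buckets[hash(key) % len(buckets)]
                    bucket[key] = f(bucket[key], value) if key in bucket else value
            return [list(b.items()) for b in buckets]
        return ToyRDD(compute, parent=self, op="reduce_by_key", dep="wide")

    # Actions: trigger the computation of the whole lineage.
    def collect(self):
        return [x for p in self._compute() for x in p]

    def count(self):
        return len(self.collect())

    def lineage(self):
        """Operations from the source to this dataset, oldest first."""
        ops, node = [], self
        while node is not None:
            ops.append((node.op, node.dep))
            node = node.parent
        return ops[::-1]


seen = []  # records every element the map function actually touches


def to_pair(word):
    seen.append(word)
    return (word, 1)


words = ToyRDD.parallelize(["a", "b", "a", "c", "b", "a"], num_partitions=2)
pairs = words.map(to_pair)                        # transformation: nothing runs
counts = pairs.reduce_by_key(lambda x, y: x + y)  # transformation: nothing runs

touched_before_action = len(seen)   # still 0, because no action has been called
result = sorted(counts.collect())   # the action runs the whole chain

print(touched_before_action)
print(counts.lineage())
print(result)
```

Because each dataset keeps only a reference to its parent and the function that derives it, a lost partition could in principle be rebuilt by replaying the lineage. That is the intuition behind the lineage mechanism, although this toy does not model failures or caching.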
Third stage: Going deep into the Spark kernel
This stage is mainly about studying the Spark framework's source code to dig into the Spark kernel:
Master the Spark task submission process through the source code;
Master task scheduling in a Spark cluster through the source code;
In particular, be proficient in the details of every step of the work inside the DAGScheduler, the TaskScheduler, and the worker nodes;
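As a rough mental model before reading the scheduler source, the sketch below mimics in plain Python what the two schedulers are responsible for: cutting a job into stages at wide (shuffle) dependencies, then producing one task per partition for each stage. The functions split_into_stages and plan_tasks, the operation names, and the assumption that every stage has the same number of partitions are all hypothetical simplifications and do not reflect Spark's internal classes.

```python
def split_into_stages(lineage):
    """Cut a linear lineage into stages at every wide dependency.

    lineage is a list of (operation_name, dependency) pairs, oldest first,
    where dependency is None for the source, "narrow", or "wide".
    """
    stages, current = [], []
    for op, dep in lineage:
        if dep == "wide" and current:
            # A shuffle boundary: close the current stage and start a new one.
            stages.append(current)
            current = []
        current.append(op)
    if current:
        stages.append(current)
    return stages


def plan_tasks(stages, num_partitions):
    """One task set per stage, one (stage_id, partition_id) task per partition.

    Assumes, for simplicity, the same partition count in every stage.
    """
    return [[(stage_id, partition) for partition in range(num_partitions)]
            for stage_id in range(len(stages))]


# A hypothetical word-count style job.
lineage = [
    ("text_file", None),
    ("flat_map", "narrow"),
    ("map", "narrow"),
    ("reduce_by_key", "wide"),
    ("filter", "narrow"),
]

stages = split_into_stages(lineage)
task_sets = plan_tasks(stages, num_partitions=3)

for stage_id, (ops, tasks) in enumerate(zip(stages, task_sets)):
    print(stage_id, ops, tasks)
```

Narrow operations are pipelined inside one stage, so the three operations before the shuffle cost a single pass over each partition. Only the wide operation forces a new stage, which must wait for the previous stage's tasks to finish.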
Fourth stage: Mastering the core frameworks built on Spark
Spark, as the all-in-one platform of the cloud computing era, has significant advantages in real-time stream processing, graph computation, machine learning, NoSQL queries, and so on. Most of the time when we use Spark, we are actually using the frameworks built on top of it, such as Shark and Spark Streaming:
Spark Streaming is an excellent real-time stream processing framework; master its DStream, transformations, checkpointing, and so on;
For offline statistical analysis, Spark 1.0.0 introduced Spark SQL on the basis of Shark, which significantly improved the efficiency of offline statistical analysis; this deserves focused study;
Master the principles and usage of Spark's machine learning library and GraphX;
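The DStream and checkpoint ideas mentioned above can be illustrated with a small sketch in plain Python. It is a toy model under stated assumptions, not the Spark Streaming API: the functions discretize and running_counts and the sample events are hypothetical. It chops a stream of timestamped events into fixed-interval batches, applies the same counting step to each batch, and keeps a snapshot of the running state after every batch so that processing can resume from a saved snapshot.

```python
from collections import Counter


def discretize(events, batch_interval):
    """Group (timestamp, value) events into consecutive fixed-interval batches.

    Intervals with no events still produce an empty batch.
    """
    by_batch = {}
    for timestamp, value in events:
        by_batch.setdefault(int(timestamp // batch_interval), []).append(value)
    if not by_batch:
        return []
    first, last = min(by_batch), max(by_batch)
    return [by_batch.get(i, []) for i in range(first, last + 1)]


def running_counts(batches, checkpoint=None):
    """Apply the same counting step to every batch, carrying state across batches.

    Returns a snapshot of the state after each batch. Passing an earlier
    snapshot as `checkpoint` resumes from that state instead of from scratch.
    """
    state = Counter(checkpoint or {})
    snapshots = []
    for batch in batches:
        state.update(batch)            # per-batch transformation plus state update
        snapshots.append(dict(state))  # snapshot that could be saved as a checkpoint
    return snapshots


# Hypothetical events: (seconds since start, word)
events = [(0.2, "a"), (0.9, "b"), (1.1, "a"), (3.5, "a")]

batches = discretize(events, batch_interval=1.0)
snapshots = running_counts(batches)

# Simulate a restart after the second batch, resuming from its snapshot.
resumed = running_counts(batches[2:], checkpoint=snapshots[1])

print(batches)
print(snapshots[-1])
print(resumed[-1])
```

Resuming from the snapshot gives the same final state as processing the whole stream without interruption, which is the property a checkpoint is meant to provide.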
Fifth stage: A business-grade Spark project
Work through one complete, representative Spark project that touches every aspect of Spark, including the project's architecture design, analysis of the technologies used, development and implementation, and operations and maintenance. Fully grasping each of these stages and its details will let you face the vast majority of Spark projects with confidence.
Sixth stage: Providing Spark solutions
Thoroughly understand every detail of the Spark framework's source code;
Provide Spark solutions tailored to the needs of different business scenarios;
Based on actual needs, carry out secondary development on top of the Spark framework to build your own Spark-based framework.
Of the six stages of becoming a Spark master described above, the first and second can be completed gradually through self-study. The next three are best worked through step by step under the guidance of a master or expert. The last stage is essentially the period of "having no set moves beats having set moves", where much has to be grasped through your own insight.