Spark is a cluster computing platform that originated at AMPLab at the University of California, Berkeley. It is built on in-memory computing and, for some iterative workloads, has been reported to run up to a hundred times faster than Hadoop MapReduce. Starting from multi-iteration batch processing, it is a rare all-rounder that brings together multiple computing paradigms, such as data warehousing, stream processing, and graph computing. Spark uses a unified technology stack to address the core problems of cloud computing and big data, such as stream processing, graph computing, machine learning, and NoSQL queries. It also has a complete ecosystem, which has directly established its leading position in unified cloud computing and big data.
As Spark becomes more widely adopted, the demand for professionals keeps growing. Spark professionals will remain sought after and can command salaries in the millions of yuan. To become a Spark master, you need to start by building solid fundamentals. Generally, you need to go through the following stages:
Stage 1: Become proficient in Scala
1. The Spark framework is written in Scala, and elegantly so. To become a Spark master, you must read the Spark source code, and to do that you must master Scala;
2. Although Spark applications can be developed in several languages, such as Java and Python, the Scala API is always the first to gain new features and the best supported. You therefore need to master Scala to write complex, high-performance distributed Spark programs;
3. Be especially familiar with Scala traits, the apply method, functional programming, generics, and contravariance and covariance;
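The Scala features listed in item 3 can be seen together in a small, Spark-free sketch. Every name here (Animal, Dog, Box, Printer, renderAll) is hypothetical and chosen only for illustration; none of it comes from the Spark code base.

```scala
// Plain Scala with no Spark dependency. All names are illustrative assumptions.
object ScalaFeaturesDemo {
  // A trait with one abstract member and one concrete method.
  trait Animal {
    def name: String
    def describe: String = s"animal:$name"
  }
  final case class Dog(name: String) extends Animal

  // Covariant container: Box[Dog] can be used where Box[Animal] is expected.
  final class Box[+A](val value: A)
  object Box {
    // `apply` on the companion object lets callers write Box(x) instead of new Box(x).
    def apply[A](value: A): Box[A] = new Box(value)
  }

  // Contravariant consumer: Printer[Animal] can be used where Printer[Dog] is expected.
  trait Printer[-A] {
    def render(a: A): String
  }
  val animalPrinter: Printer[Animal] = new Printer[Animal] {
    def render(a: Animal): String = a.describe
  }

  // Generic, functional-style helper: maps each item through the printer.
  def renderAll[A](items: List[A], printer: Printer[A]): List[String] =
    items.map(printer.render)

  val animalBox: Box[Animal] = Box(Dog("rex"))   // relies on covariance of Box
  val dogPrinter: Printer[Dog] = animalPrinter   // relies on contravariance of Printer
  val rendered: List[String] =
    renderAll(List(Dog("rex"), Dog("fido")), dogPrinter)

  def main(args: Array[String]): Unit = {
    println(animalBox.value.describe)
    rendered.foreach(println)
  }
}
```

The same variance annotations show up when reading Spark's own source, so recognizing `+A` and `-A` at a glance makes the later stages easier.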
Stage 2: Become proficient in the APIs that the Spark platform provides to developers
1. Master Spark's RDD-oriented development model and the use of the various transformation and action functions;
2. Master wide dependencies, narrow dependencies, and the lineage mechanism in Spark;
3. Master the RDD computation process, such as how a job is divided into stages, the basic process by which a Spark application is submitted to the cluster, and the basic working principles of worker nodes.
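As a rough illustration of the points above, the following sketch chains a few transformations, ends with an action, and prints the lineage. It assumes the spark-core library is on the classpath and runs in local mode; the object name RddLineageDemo, the helper wordCounts, and the sample input are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// A minimal sketch, assuming spark-core is available. Names are illustrative.
object RddLineageDemo {
  def wordCounts(sc: SparkContext, lines: Seq[String]): Map[String, Int] = {
    val rdd = sc.parallelize(lines, numSlices = 2)

    // Narrow dependencies: each output partition depends on one input partition.
    val pairs = rdd
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .map(word => (word, 1))

    // Wide dependency: reduceByKey shuffles data, which starts a new stage.
    val counts = pairs.reduceByKey(_ + _)

    // The lineage (chain of parent RDDs) that Spark would use for recomputation.
    println(counts.toDebugString)

    // Transformations are lazy; this action is what actually triggers the job.
    counts.collect().toMap
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("rdd-lineage-demo").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      println(wordCounts(sc, Seq("spark is fast", "spark is general")))
    } finally {
      sc.stop()
    }
  }
}
```

The indentation in the `toDebugString` output typically marks the shuffle boundary, which is where the job is split into stages.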
Stage 3: Go deep into the Spark kernel
In this stage, you study the Spark kernel thoroughly by reading the source code of the Spark framework:
1. Use the source code to master the Spark job submission process;
2. Use the source code to master task scheduling in a Spark cluster;
3. Be particularly proficient in the details of every step of the work done inside the DAGScheduler, the TaskScheduler, and the worker nodes;
Stage 4: Master the core frameworks built on Spark
As an all-rounder of the cloud computing and big data era, Spark has significant advantages in real-time stream processing, graph computing, machine learning, NoSQL queries, and other areas. Most of the time when we use Spark, we are actually using frameworks built on top of it, such as Shark and Spark Streaming:
1. Spark Streaming is an excellent real-time stream processing framework. You must master its DStream abstraction, transformations, and checkpointing;
2. Offline statistical analysis: Spark 1.0.0 introduced Spark SQL, building on Shark. It significantly improved the efficiency of offline statistical analysis and deserves particular attention;
3. Master the principles and usage of Spark's machine learning library (MLlib) and GraphX;
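For the offline statistical analysis mentioned in item 2, a small Spark SQL aggregation might look like the sketch below. It assumes spark-sql is on the classpath and uses the SparkSession entry point from Spark 2.0 and later, which is newer than the Spark 1.0.0 release mentioned above. The Order case class, the totalsByCustomer helper, and the sample data are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// A minimal sketch, assuming Spark 2.0+ with spark-sql available. Names are illustrative.
object SparkSqlDemo {
  final case class Order(customer: String, amount: Double)

  def totalsByCustomer(spark: SparkSession, orders: Seq[Order]): Map[String, Double] = {
    import spark.implicits._ // brings in the encoders needed by toDS()

    // Register the in-memory data as a temporary view so it can be queried with SQL.
    orders.toDS().createOrReplaceTempView("orders")

    spark
      .sql("SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer")
      .collect()
      .map(row => row.getString(0) -> row.getDouble(1))
      .toMap
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("spark-sql-demo")
      .master("local[2]")
      .getOrCreate()
    try {
      val orders = Seq(Order("ann", 10.0), Order("bob", 5.0), Order("ann", 2.5))
      println(totalsByCustomer(spark, orders))
    } finally {
      spark.stop()
    }
  }
}
```

In a real project the data would usually come from files or tables rather than an in-memory sequence; the query itself stays the same.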
Stage 5: Work on commercial-grade Spark projects
Work through one complete, representative Spark project that touches every aspect of Spark, including architecture design, analysis of the technologies used, development and implementation, and operations and maintenance. Once you fully understand each of its stages and details, you will be able to handle the vast majority of Spark projects with ease.
Stage 6: Provide Spark solutions
1. Thoroughly understand every detail of the Spark framework source code;
2. Provide Spark solutions tailored to the needs of different business scenarios;
3. Based on actual needs, extend the Spark framework through secondary development to build your own customized Spark framework;
Of the six stages described above, the first and second can be completed gradually through self-study. The next three are best completed step by step under the guidance of experts. The last stage is essentially the point where "no technique beats technique": much of it can only be achieved through careful understanding of your own.