How to become a Spark master in cloud computing and big data?


Spark is a cluster computing platform that originated in the AMPLab at the University of California, Berkeley. Built around in-memory computation, it delivers better performance than Hadoop MapReduce and is a rare all-rounder: it began with iterative computation and now also embraces the data warehousing, stream processing, and graph computing paradigms. Spark uses a single, unified technology stack to solve the core problems of big data in the cloud, including stream processing, graph computation, machine learning, and NoSQL-style queries, and its mature ecosystem has cemented its dominant position in the cloud computing and big data field.

As Spark technology spreads, demand for Spark professionals keeps growing, and skilled Spark engineers command very high salaries. Becoming a Spark master, however, is like mastering a martial art: you have to practice move by move and build up your inner strength. Generally speaking, it takes the following stages:

Stage 1: Master the Scala language

The Spark framework is written in Scala, and the code is concise and elegant. To become a Spark master you will have to read the Spark source code, so you must master Scala.

Although Spark applications can also be developed in Java, Python, and other languages, the fastest and best-supported development API remains, and will remain, the Scala API, so you need to master Scala to write complex, high-performance distributed Spark programs.

In particular, be proficient with Scala traits, the apply method, functional programming, generics, and covariance and contravariance.
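As a quick, self-contained illustration of those language features (unrelated to Spark's own code, with all names invented for the example), a minimal sketch might look like this:

```scala
// Minimal sketch of the Scala features mentioned above; all names are illustrative.

// A trait defines reusable behaviour that classes can mix in.
trait Describable {
  def describe: String
}

// +A makes Box covariant: Box[Dog] is a subtype of Box[Animal] when Dog <: Animal.
class Box[+A](val value: A) extends Describable {
  def describe: String = s"Box($value)"
}

object Box {
  // The companion object's apply method lets callers write Box(42) instead of new Box(42).
  def apply[A](value: A): Box[A] = new Box(value)
}

// -A makes Printer contravariant: a Printer[Animal] can be used where a Printer[Dog] is expected.
class Printer[-A] {
  def print(a: A): Unit = println(a)
}

object ScalaFeaturesDemo {
  def main(args: Array[String]): Unit = {
    // Functional programming: function literals and immutable collections, no mutation.
    val squaresOfEvens = (1 to 10).filter(_ % 2 == 0).map(n => n * n)
    println(squaresOfEvens)       // Vector(4, 16, 36, 64, 100)

    val box: Box[Int] = Box(42)   // uses Box.apply
    println(box.describe)         // Box(42)
  }
}
```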

Stage 2: Master the APIs that the Spark platform provides to developers

Master RDD-based development in Spark, and the use of the various transformation and action functions;
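For instance, the classic word count below chains lazy transformations (flatMap, map, reduceByKey) and only runs when the collect action is called; the input path is a made-up placeholder:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddBasics {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("rdd-basics").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Transformations are lazy: they only build up the RDD lineage.
    val lines  = sc.textFile("data/input.txt")   // hypothetical input path
    val words  = lines.flatMap(_.split("\\s+"))
    val pairs  = words.map(word => (word, 1))
    val counts = pairs.reduceByKey(_ + _)

    // collect is an action: it triggers the actual computation.
    counts.collect().foreach { case (word, n) => println(s"$word -> $n") }

    sc.stop()
  }
}
```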

Master wide dependencies, narrow dependencies, and the lineage mechanism in Spark;

Master how RDDs are computed, including how jobs are divided into stages, the basic process by which a Spark application is submitted to the cluster, and the fundamentals of how worker nodes operate.
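One simple way to see narrow versus wide dependencies and the resulting stage split is to print an RDD's lineage with toDebugString; a minimal sketch:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LineageDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("lineage-demo").setMaster("local[*]"))

    val counts = sc.parallelize(Seq("a", "b", "a", "c"))
      .map(word => (word, 1))   // narrow dependency: no shuffle needed
      .reduceByKey(_ + _)       // wide dependency: requires a shuffle, starts a new stage

    // toDebugString prints the lineage; the ShuffledRDD entry marks the
    // stage boundary introduced by the wide dependency.
    println(counts.toDebugString)

    sc.stop()
  }
}
```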

Stage 3: Go deep into the Spark kernel

This stage is mainly about studying the Spark source code in order to drill down into the Spark internals:

Master the Spark job submission process by reading the source code;

Master task scheduling in a Spark cluster by reading the source code;

In particular, be proficient in the details of every step of the work done inside DAGScheduler, TaskScheduler, and the worker nodes.
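The scheduler internals are not meant to be called directly, but one rough way to watch what DAGScheduler and TaskScheduler are doing is to register a SparkListener and log stage and task events; this is only a sketch, and the object name is invented:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.scheduler.{SparkListener, SparkListenerStageCompleted, SparkListenerTaskEnd}

object SchedulerWatcher {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("scheduler-watcher").setMaster("local[*]"))

    // A SparkListener receives callbacks from the scheduler as stages and tasks finish.
    sc.addSparkListener(new SparkListener {
      override def onStageCompleted(stage: SparkListenerStageCompleted): Unit =
        println(s"Stage ${stage.stageInfo.stageId} finished with ${stage.stageInfo.numTasks} tasks")

      override def onTaskEnd(task: SparkListenerTaskEnd): Unit =
        println(s"Task finished in stage ${task.stageId}, run time ${task.taskInfo.duration} ms")
    })

    // A shuffle forces two stages, so the listener reports events for both.
    sc.parallelize(1 to 1000).map(n => (n % 10, n)).reduceByKey(_ + _).count()

    sc.stop()
  }
}
```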

Stage 4: Master the core frameworks built on top of Spark

Spark, as a hallmark of the cloud computing era, has significant advantages in real-time stream processing, graph computation, machine learning, NoSQL-style queries, and so on. Most of the time, using Spark actually means using the frameworks built on top of it, such as Shark (now Spark SQL), Spark Streaming, and others:

Spark Streaming is an excellent near-real-time stream processing framework; master its DStream abstraction, its transformations, checkpointing, and so on.
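A minimal Spark Streaming sketch showing a DStream, transformations on it, and checkpointing; the socket host/port and checkpoint directory are placeholders:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("streaming-wordcount").setMaster("local[2]")
    val ssc  = new StreamingContext(conf, Seconds(5))   // 5-second micro-batches
    ssc.checkpoint("checkpoints/")                       // hypothetical checkpoint directory

    // A DStream is a continuous sequence of RDDs, here fed from a TCP socket (hypothetical host/port).
    val lines = ssc.socketTextStream("localhost", 9999)

    // Transformations on DStreams look just like RDD transformations.
    val counts = lines.flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.print()        // output operation: prints the first elements of each batch

    ssc.start()           // start receiving and processing data
    ssc.awaitTermination()
  }
}
```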

For offline statistical analysis, Spark 1.0.0 introduced Spark SQL on the basis of Shark, which significantly improved the efficiency of offline analysis; this deserves focused mastery.
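A minimal Spark SQL sketch follows; note that it uses SparkSession, the entry point added in later Spark releases (the 1.0-era API used SQLContext instead), and that the input file is a placeholder:

```scala
import org.apache.spark.sql.SparkSession

object SparkSqlDemo {
  def main(args: Array[String]): Unit = {
    // SparkSession is the SQL entry point in Spark 2.x and later.
    val spark = SparkSession.builder()
      .appName("spark-sql-demo")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical input file; any JSON/Parquet/CSV source works the same way.
    val orders = spark.read.json("data/orders.json")
    orders.createOrReplaceTempView("orders")

    // Offline statistical analysis expressed as plain SQL.
    val revenueByDay = spark.sql(
      "SELECT order_date, SUM(amount) AS revenue FROM orders GROUP BY order_date ORDER BY order_date")

    revenueByDay.show()
    spark.stop()
  }
}
```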

Master the principles and usage of Spark's machine learning library (MLlib) and GraphX.
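As one small GraphX example, the sketch below loads a graph from an edge-list file (a placeholder path) and runs the built-in PageRank; MLlib usage follows a similar pattern with its own estimators and models:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.GraphLoader

object PageRankDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("pagerank-demo").setMaster("local[*]"))

    // Hypothetical edge-list file: one "srcId dstId" pair per line.
    val graph = GraphLoader.edgeListFile(sc, "data/followers.txt")

    // Run PageRank until the rank values converge within the given tolerance.
    val ranks = graph.pageRank(0.0001).vertices

    // Print the ten highest-ranked vertices.
    ranks.sortBy(_._2, ascending = false).take(10).foreach { case (id, rank) =>
      println(s"vertex $id has rank $rank")
    }

    sc.stop()
  }
}
```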

Stage 5: Work through a commercial-grade Spark project

Work through a complete, representative Spark project to understand every aspect of Spark, including the project's architectural design, the analysis of the technologies used, development, implementation, and operations. Mastering each of these stages and their details will let you face the vast majority of Spark projects with confidence.

Stage 6: Provide Spark solutions

Thoroughly grasp every detail of the Spark framework's source code;

Provide Spark-based solutions tailored to the needs of different business scenarios;

Where actual needs require it, carry out secondary development on top of the Spark framework, or even build a Spark-based framework of your own.

Of the six stages on the road to becoming a Spark master described above, the first two can be completed progressively through self-study; the middle three are best worked through step by step under the guidance of a master or expert; and the last stage is essentially the "no move beats any move" phase, where much of the work has to come from deep, internalized understanding.
