Spark in the Big Data Era of Cloud Computing

Source: Internet
Author: User
Keywords: cloud computing

Spark is a cluster computing platform that originated at the AMPLab of the University of California, Berkeley. It is built on in-memory computing and typically outperforms Hadoop MapReduce, especially on iterative workloads. It is also a rare all-rounder: starting from multi-iteration batch processing, it embraces several computing paradigms, including data warehousing, stream processing, and graph computation. Spark uses a unified technology stack to address the core problems of big data in cloud computing, such as stream processing, graph computation, machine learning, and NoSQL queries, and it has a mature ecosystem. Together these have established its leading position in the cloud computing and big data field.

As Spark technology spreads, the demand for Spark professionals keeps growing. Spark specialists will remain in high demand and can command salaries in the millions. Becoming a Spark master, however, takes practice one move at a time, starting from the fundamentals. Generally speaking, you need to go through the following stages:

Stage I: Proficiency in the Scala language

The Spark framework is written in Scala, and its code is concise and elegant. To become a Spark master you must read the Spark source code, and for that you must master Scala;

Although Spark applications can be developed in several languages, including Java and Python, the fastest and best-supported development API remains, and is likely to remain, the Scala API, so you need to master Scala to write complex, high-performance distributed Spark programs;

In particular, be proficient in Scala's traits, the apply method, functional programming, generics, and contravariance and covariance.

Stage II: Proficiency in the APIs that the Spark platform provides to developers

Master the RDD-based development model in Spark and the use of the various transformation and action functions;

Master wide dependencies, narrow dependencies, and the lineage mechanism in Spark;

Master the RDD computation flow, such as stage division, the basic process of submitting a Spark application to the cluster, and the basic working principles of worker nodes.
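
The ideas in this stage (lazy transformations, actions, lineage, narrow versus wide dependencies, and stage division) can be pictured with a toy model. The sketch below is plain Python and does not use Spark: `ToyRDD` and its methods are hypothetical names invented for illustration, and it assumes a stage boundary is simply drawn at every wide (shuffle) dependency. Real Spark is far more involved.

```python
import zlib


class ToyRDD:
    """Single-process toy model of an RDD; not Spark's API."""

    def __init__(self, partitions=None, parent=None, op="source",
                 wide=False, compute=None):
        self.partitions = partitions  # only the source holds data
        self.parent = parent          # lineage pointer to the parent RDD
        self.op = op                  # name of the step that produced this RDD
        self.wide = wide              # True when the step needs a shuffle
        self.compute = compute        # parent partitions -> new partitions

    # Transformations are lazy: they record lineage and return a new RDD.
    def map(self, f):
        # Narrow dependency: each output partition reads one parent partition.
        return ToyRDD(parent=self, op="map",
                      compute=lambda ps: [[f(x) for x in p] for p in ps])

    def filter(self, pred):
        # Narrow dependency, same reasoning as map.
        return ToyRDD(parent=self, op="filter",
                      compute=lambda ps: [[x for x in p if pred(x)] for p in ps])

    def reduce_by_key(self, f):
        # Wide dependency: a key may appear in any parent partition,
        # so every output partition reads all parent partitions.
        def shuffle(ps):
            buckets = [{} for _ in ps]
            for p in ps:
                for k, v in p:
                    b = buckets[zlib.crc32(str(k).encode()) % len(ps)]
                    b[k] = f(b[k], v) if k in b else v
            return [sorted(b.items()) for b in buckets]
        return ToyRDD(parent=self, op="reduce_by_key", wide=True, compute=shuffle)

    def _chain(self):
        # Lineage from the source down to this RDD.
        node, chain = self, []
        while node is not None:
            chain.append(node)
            node = node.parent
        return chain[::-1]

    def stages(self):
        # Start a new stage at every wide dependency.
        result = [[]]
        for node in self._chain():
            if node.wide:
                result.append([])
            result[-1].append(node.op)
        return result

    # Actions walk the lineage and actually compute the data.
    def collect(self):
        parts = None
        for node in self._chain():
            parts = node.partitions if node.parent is None else node.compute(parts)
        return [x for p in parts for x in p]


words = ToyRDD(partitions=[["a", "b", "a"], ["b", "c", "a"]])
counts = (words.map(lambda w: (w, 1))
               .filter(lambda kv: kv[0] != "c")
               .reduce_by_key(lambda x, y: x + y))

print(counts.stages())           # lineage split into stages
print(sorted(counts.collect()))  # the action triggers the computation
```

In this toy run the lineage splits into two stages, `['source', 'map', 'filter']` and `['reduce_by_key']`, and the collected counts are `[('a', 3), ('b', 2)]`. Nothing is computed until `collect` is called, which mirrors the distinction between transformations and actions.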

Stage III: Going deep into the Spark kernel

This stage is mainly about studying the Spark framework's source code to dig into the Spark kernel:

Master the Spark task submission process through the source code;

Master task scheduling in a Spark cluster through the source code;

In particular, be proficient in every step of the work done inside the DAGScheduler, the TaskScheduler, and the worker nodes.
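
As a rough mental model before reading the source, the division of labor between the two schedulers can be sketched in a few lines of plain Python. This is a heavily simplified assumption, not Spark's implementation: the function names `dag_schedule` and `task_schedule`, the worker names, and the round-robin placement are all hypothetical. The real TaskScheduler works from resource offers and data locality rather than round-robin.

```python
def dag_schedule(stages, num_partitions):
    """Toy DAGScheduler: emit one task set per stage, in dependency order."""
    for stage_id, ops in enumerate(stages):
        # One task per partition; each task runs all ops of its stage.
        yield [(stage_id, part, ops) for part in range(num_partitions)]


def task_schedule(task_set, workers):
    """Toy TaskScheduler: place the tasks of one task set on workers."""
    assignment = {w: [] for w in workers}
    for i, (stage_id, part, _ops) in enumerate(task_set):
        # Round-robin placement is an assumption made for simplicity.
        assignment[workers[i % len(workers)]].append((stage_id, part))
    return assignment


# Two stages separated by a shuffle, three partitions, two workers.
stages = [["map", "filter"], ["reduce_by_key"]]
plan = [task_schedule(task_set, ["worker-1", "worker-2"])
        for task_set in dag_schedule(stages, num_partitions=3)]

for stage_plan in plan:
    print(stage_plan)
```

Each printed dictionary maps a worker to the `(stage, partition)` pairs it would run. The second stage is only planned after the first, which reflects that a stage cannot start until the stages it depends on have finished.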

Stage IV: Mastering the core frameworks built on Spark

As the epitome of the cloud computing and big data era, Spark has significant advantages in real-time stream processing, graph computation, machine learning, NoSQL queries, and more. Most of the time when we use Spark, we are actually using frameworks built on it, such as Shark and Spark Streaming:

Spark Streaming is an excellent real-time stream processing framework; master its DStream abstraction, transformations, checkpointing, and so on;

For offline statistical analysis, Spark 1.0.0 introduced Spark SQL on the basis of Shark, which significantly improved the efficiency of offline statistical analysis; this deserves focused study;

Master the principles and usage of Spark's machine learning library and GraphX.
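
The DStream and checkpoint ideas from Spark Streaming mentioned above can be pictured as a sequence of small batches whose running state is saved after each batch. The sketch below is plain Python, not the Spark Streaming API: `run_stream`, the JSON checkpoint file, and its field names are assumptions made for illustration. Spark's own checkpointing writes metadata and RDD data to reliable storage and is considerably more elaborate.

```python
import json
import os
import tempfile


def run_stream(batches, checkpoint_path):
    """Process micro-batches of words, keeping running counts across batches."""
    # Recover state from the checkpoint if a previous run left one.
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            saved = json.load(fh)
        state, next_batch = saved["counts"], saved["next_batch"]
    else:
        state, next_batch = {}, 0

    for i in range(next_batch, len(batches)):
        # The same transformation is applied to every batch in turn.
        for word in batches[i]:
            state[word] = state.get(word, 0) + 1
        # Checkpoint after each batch so a restart can resume from here.
        with open(checkpoint_path, "w") as fh:
            json.dump({"counts": state, "next_batch": i + 1}, fh)
    return state


batches = [["spark", "scala"], ["spark"], ["graphx", "spark"]]
path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")

run_stream(batches[:2], path)     # first run stops after two batches
print(run_stream(batches, path))  # restart resumes at the third batch
```

The second call does not reprocess the first two batches. It reloads the saved counts and continues with the third batch, so the final counts match what a single uninterrupted run would produce.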

Stage V: A commercial-grade Spark project

Work through a complete, representative Spark project that touches every aspect of Spark, including the project's architecture design, analysis of the technologies used, development, and operations. Fully grasping each of these phases and their details will let you face the vast majority of Spark projects with confidence.

Stage VI: Providing Spark solutions

Thoroughly master every detail of the Spark framework's source code;

Provide Spark solutions tailored to the needs of different business scenarios;

According to actual needs, carry out secondary development on top of the Spark framework to build your own Spark-based framework.

Of the six stages described above for becoming a Spark master, the first two can be completed step by step through self-study. The next three are best worked through under the guidance of a master or expert. The last stage is essentially the period in which "having no fixed move beats any move", and much of it has to be grasped through your own insight.

As for training Spark professionals, most domestic professional training institutions currently offer courses in the Android and Hadoop directions. Spark Asia-Pacific Institute, the first domestic organization devoted to Spark research and promotion, focuses on helping enterprises plan, deploy, develop, train for, and use Spark, and also provides training in Spark source code study and applied technology. After completing a thorough study of the Spark source code and continually using Spark's features in real-world environments, Spark Asia-Pacific Institute launched the first domestic Spark training system: "Master Spark within 18 Hours", "Spark Enterprise Development Best Practices", "Proficient in Spark: Spark Core Analysis, Source Code Interpretation, Performance Optimization and Business Case Practice", "Spark 1.0.0 Enterprise Development Hands-on", "Spark Architecture Case Appreciation", and "Proficient in the Spark Development Language: Scala Best Practices". These courses are intended to help learners become proficient in Spark step by step through the stages above.

(Responsible editor: Mengyishan)
