How do you become a master of Spark big data? Spark is now being adopted by more and more businesses; like Hadoop, Spark submits work to the cluster as jobs. So how do you become a Spark big data master? Here is an in-depth guide.
Spark is a cluster computing platform that originated at AMPLab at the University of California, Berkeley. It is a rare all-rounder: built on in-memory computing, its performance exceeds Hadoop's, and it spans multi-pass iterative batch processing, data warehousing, stream processing and graph computing. Spark uses a unified technology stack to address the core problems of cloud computing and big data, such as stream processing, graph computation, machine learning and NoSQL-style queries, and it has a mature ecosystem, which has secured its dominant position in the unified cloud computing and big data field.
With the spread of Spark technology, the demand for professionals keeps growing, and Spark specialists will remain sought after and well paid. Becoming a Spark master is like mastering a martial art: it takes both technique and inner strength, and you generally need to go through the following stages:
Stage One: Master the Scala language
The Spark framework is written in Scala, and the code is concise and elegant. To be a Spark master you have to read Spark's source code, and for that you must master Scala;
Although Spark can now be used for application development in several languages, such as Java and Python, the fastest and best-supported development APIs are still, and will likely remain, the Scala APIs, so you must master Scala to write complex, high-performance Spark distributed programs;
In particular, be proficient in Scala's traits, apply methods, functional programming, generics, and covariance and contravariance.
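For example, the following minimal Scala sketch (the names Box and Printer are illustrative and not taken from Spark) exercises exactly these features: a trait with a higher-order method, a companion object with apply, generics, covariance and contravariance.

    // A covariant, generic container: Box[+A] means Box[String] can be used where Box[Any] is expected.
    trait Box[+A] {
      def value: A
      // A higher-order method in functional style: takes a function and returns a new Box.
      def map[B](f: A => B): Box[B] = Box(f(value))
    }

    // Companion object: its apply method lets us write Box(x) instead of using new.
    object Box {
      def apply[A](v: A): Box[A] = new Box[A] { def value: A = v }
    }

    // Contravariance: Printer[-A] means a Printer[Any] can stand in for a Printer[Int].
    trait Printer[-A] {
      def print(a: A): Unit
    }

    object ScalaBasics {
      def main(args: Array[String]): Unit = {
        val length: Box[Int] = Box("spark").map(_.length)   // generic, functional transformation
        println(length.value)                               // prints 5

        val anyPrinter: Printer[Any] = new Printer[Any] {
          def print(a: Any): Unit = println(s"value: $a")
        }
        val intPrinter: Printer[Int] = anyPrinter            // allowed because A is contravariant
        intPrinter.print(42)
      }
    }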
Stage Two: Master the APIs that the Spark platform provides to developers
Master the RDD-based development model in Spark and the use of the various transformation and action functions;
Master wide and narrow dependencies in Spark and the lineage mechanism;
Master the computation flow of an RDD, such as how stages are divided, the basic process by which a Spark application is submitted to the cluster, and how worker nodes operate.
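As an illustration of these points, here is a minimal word-count sketch that assumes Spark is on the classpath and runs in local mode; the comments mark which operations create narrow dependencies, where the wide (shuffle) dependency forces a stage boundary, and which call actually submits the job.

    import org.apache.spark.{SparkConf, SparkContext}

    object RddBasics {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("RddBasics").setMaster("local[2]")
        val sc = new SparkContext(conf)

        val lines = sc.parallelize(Seq("spark is fast", "spark is general"))

        // flatMap and map are transformations with narrow dependencies:
        // each output partition depends on exactly one parent partition.
        val pairs = lines.flatMap(_.split(" ")).map(word => (word, 1))

        // reduceByKey introduces a wide (shuffle) dependency, so the DAGScheduler
        // cuts the job into two stages at this point.
        val counts = pairs.reduceByKey(_ + _)

        // toDebugString prints the lineage (the chain of parent RDDs).
        println(counts.toDebugString)

        // collect is an action: it actually submits the job to the cluster.
        counts.collect().foreach(println)

        sc.stop()
      }
    }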
Stage Three: Dive into the Spark kernel
This stage mainly means going deep into the Spark kernel by reading the source code of the Spark framework:
Master Spark's job submission process through the source code;
Master the task scheduling of a Spark cluster through the source code;
In particular, be proficient in the details of every step of the work done inside the DAGScheduler, the TaskScheduler and the worker nodes.
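A small driver program can serve as a map for this reading. The class names in the comments (SparkContext, DAGScheduler, TaskScheduler) match those in the Spark source, but the flow described is deliberately simplified and should be checked against the version you are reading.

    import org.apache.spark.{SparkConf, SparkContext}

    object SchedulingWalkthrough {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("walkthrough").setMaster("local[2]"))

        val result = sc.parallelize(1 to 1000, numSlices = 4)
          .map(_ * 2)
          .filter(_ % 3 == 0)
          .sum()                       // action
        // 1. The action calls SparkContext.runJob with this RDD and a function to run per partition.
        // 2. The DAGScheduler walks the RDD lineage backwards, splits it into stages at shuffle
        //    dependencies (here there are none, so a single result stage), and turns each stage
        //    into a set of tasks, one per partition.
        // 3. The TaskScheduler hands the task set to the scheduler backend, which launches the
        //    tasks in executors on the worker nodes; results flow back to the driver.

        println(result)
        sc.stop()
      }
    }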
Stage Four: Master the core frameworks built on Spark
As the all-rounder of the cloud computing and big data era, Spark has significant advantages in real-time stream processing, graph computation, machine learning and NoSQL-style queries, and most of the time we use Spark through frameworks such as Shark, Spark Streaming and so on:
Spark Streaming is an excellent real-time stream processing framework; master its DStream abstraction, its transformations, checkpointing and so on (see the sketch after this list);
For Spark's offline statistical analysis features, Spark 1.0.0 introduced Spark SQL on the basis of Shark, which significantly improved the efficiency of offline analysis, so it deserves particular attention;
For Spark's machine learning library and GraphX, master their principles and usage.
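As a concrete starting point for the DStream, transformation and checkpoint concepts above, here is a minimal Spark Streaming sketch; it assumes a text source on localhost port 9999 (for example started with nc -lk 9999) and uses an illustrative local checkpoint directory.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object StreamingWordCount {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("StreamingWordCount").setMaster("local[2]")
        // Batches are formed every 5 seconds.
        val ssc = new StreamingContext(conf, Seconds(5))
        // Checkpointing is needed for stateful operations and driver recovery; the path is illustrative.
        ssc.checkpoint("/tmp/spark-checkpoint")

        // A DStream: a sequence of RDDs, one per batch, read here from a socket.
        val lines = ssc.socketTextStream("localhost", 9999)

        // Transformations on DStreams mirror those on RDDs.
        val counts = lines.flatMap(_.split(" "))
          .map(word => (word, 1))
          .reduceByKey(_ + _)

        counts.print()          // output operation, triggers execution each batch

        ssc.start()
        ssc.awaitTermination()
      }
    }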
Stage Five: Do a commercial-grade Spark project
Work through a complete, representative Spark project, from architecture design and technology selection to development, implementation and operations, mastering each stage and its details, so that you can comfortably handle the vast majority of Spark projects afterwards.
Stage Six: Provide Spark solutions
Thoroughly grasp every detail of the Spark framework's source code;
Provide Spark solutions for different scenarios according to the needs of different businesses;
Where actual requirements demand it, carry out secondary development on top of the Spark framework and build your own Spark-based framework.
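To give a flavour of what such secondary development can look like, here is a purely hypothetical sketch of a custom RDD that plugs a new data source into Spark by implementing getPartitions and compute; the class names and the toy range source are illustrative only.

    import org.apache.spark.{Partition, SparkConf, SparkContext, TaskContext}
    import org.apache.spark.rdd.RDD

    // A partition that remembers its index and the range of values it covers.
    class RangePartition(val index: Int, val start: Int, val end: Int) extends Partition

    // A toy custom RDD: each partition generates a range of integers. A real
    // secondary-development RDD would read from your own storage or service here.
    class RangeRDD(sc: SparkContext, total: Int, numParts: Int)
      extends RDD[Int](sc, Nil) {              // Nil: no parent RDDs, no dependencies

      override def getPartitions: Array[Partition] = {
        val step = math.ceil(total.toDouble / numParts).toInt
        (0 until numParts).map { i =>
          new RangePartition(i, i * step, math.min((i + 1) * step, total))
        }.toArray
      }

      override def compute(split: Partition, context: TaskContext): Iterator[Int] = {
        val p = split.asInstanceOf[RangePartition]
        (p.start until p.end).iterator
      }
    }

    object CustomRddDemo {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("CustomRddDemo").setMaster("local[2]"))
        println(new RangeRDD(sc, total = 10, numParts = 3).collect().mkString(","))
        sc.stop()
      }
    }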
Of the six stages of becoming a Spark master, the first and second can be completed through self-study; the following three are best pursued under the guidance of a master or an expert; and the last stage is essentially the period of "winning without fixed moves", where much of the work has to be done with insight and dedication.