Conquer Big Data through Spark

Source: Internet
Author: User
Tags: Apache Mesos, Hadoop, MapReduce, SparkR, Databricks

Course Background:

Apache Spark™ is a fast and general engine for large-scale data processing. Spark has an advanced DAG execution engine that supports cyclic data flow and in-memory computing, letting programs run up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk.
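Spark's core RDD API was designed to mirror Scala's collection API, so the flavor of a Spark program can be sketched with plain Scala collections, no cluster required. The object and sample text below are illustrative, not from the course; on a real cluster the same chain of operators would run over `sc.textFile(...)` with `reduceByKey` instead of `groupBy`:

```scala
// The classic Spark word count, written against plain Scala collections.
// Spark's RDD operators (flatMap, map, reduceByKey) deliberately mirror
// this style, which is why Scala code translates to Spark so directly.
object WordCountSketch {
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))          // split lines into words
      .filter(_.nonEmpty)
      .groupBy(identity)                 // local stand-in for reduceByKey
      .map { case (word, occs) => (word, occs.size) }

  def main(args: Array[String]): Unit = {
    val counts = wordCount(Seq("spark makes big data simple", "big data big insight"))
    println(counts("big"))  // 3
  }
}
```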

Spark powers a stack of high-level tools including Spark SQL, MLlib for machine learning, GraphX, and Spark Streaming. You can combine these libraries seamlessly in the same application.

You can run Spark readily using its standalone cluster mode, on EC2, or on Hadoop YARN or Apache Mesos. It can read from HDFS, HBase, Cassandra, and any Hadoop data source.

Write applications quickly in Java, Scala, or Python. Spark offers over 80 high-level operators that make it easy to build parallel apps, and you can use it interactively from the Scala and Python shells.

Apache Spark has seen phenomenal adoption: it is widely slated as the successor to Hadoop MapReduce, and it is deployed in clusters ranging from a handful to thousands of nodes.

In the past few years, Databricks, with the help of the Spark community, has contributed many improvements to Apache Spark's performance, stability, and scalability. These enabled Databricks to use Apache Spark to sort 100 TB of data on 206 machines in 23 minutes, 3x faster than the previous Hadoop 100 TB result on 2100 machines. Similarly, Databricks sorted 1 PB of data on 190 machines in less than 4 hours, over 4x faster than the previous Hadoop 1 PB result on 3800 machines.

Spark is fulfilling its promise to serve as a faster and more scalable engine for data processing of all sizes, enabling equally dramatic improvements in time and cost for all Big Data users.

Course Introduction:

This course covers almost everything an application developer needs to build diverse Spark applications that fulfill all kinds of business requirements: the architecture of Spark, the Spark programming model, Spark internals, Spark SQL, MLlib, GraphX, Spark Streaming, testing, tuning, and Spark on YARN.

Additionally, this course covers the Scala skills needed to write Spark code, to help those who are not yet familiar with Scala.

Who Needs to Attend

Anyone interested in Big Data development;

Hadoop developers;

Other Big Data developers.

Instructor: Liaoliang (contact email: [email protected]; Tel: 18610086859; QQ: 1740415547)

Prerequisites

Familiarity with the basics of object-oriented programming.

Course Outline

Day 1

Class 1: The Architecture of Spark

1 Ecosystem of Spark

2 Design of Spark

3 RDD

4 Fault Tolerance in Spark

Class 2: Programming with Scala

1 Classes and Objects in Scala

2 Functional Objects

3 Traits

4 Case Classes and Pattern Matching

5 Collections

6 Implicit Conversions and Parameters

7 Actors and Concurrency
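The Scala features listed above are the ones a Spark developer leans on daily. As a small sketch (the `Temp` example is illustrative, not from the course materials), here are case classes, pattern matching with guards, collections, and an implicit class in one place:

```scala
// Several of the Scala features above in one short example:
// a case class, pattern matching, collections, and an implicit class.
case class Temp(celsius: Double)

object ScalaFeatures {
  // Pattern matching with guards over a case class
  def describe(t: Temp): String = t match {
    case Temp(c) if c < 0  => "freezing"
    case Temp(c) if c < 25 => "mild"
    case _                 => "hot"
  }

  // Implicit class: enriches Double with a .degrees method,
  // the modern idiom for an implicit conversion
  implicit class RichDouble(val d: Double) {
    def degrees: Temp = Temp(d)
  }

  def main(args: Array[String]): Unit = {
    val readings = List(-5.0, 18.0, 31.0).map(_.degrees)
    println(readings.map(describe))  // List(freezing, mild, hot)
  }
}
```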

Class 3: The Spark Programming Model

1 RDD

2 Transformations

3 Actions

4 Lineage

5 Dependency
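The key idea behind topics 2–4 is that transformations (map, filter) are lazy and only record lineage, while actions (count, collect) force evaluation. Scala's collection views behave the same way, which gives a handy local model of the idea (this sketch is an analogy, not Spark API code):

```scala
// Lazy transformations vs eager actions, modeled with a Scala view.
// In Spark, doubled would be an RDD whose lineage records "map over base";
// nothing runs until an action such as sum, count, or collect.
object LazyModel {
  var evaluations = 0
  val base = (1 to 10).view                                  // "RDD" stand-in
  val doubled = base.map { n => evaluations += 1; n * 2 }    // transformation: no work yet

  def main(args: Array[String]): Unit = {
    println(evaluations)     // 0 -- defining the pipeline did nothing
    val total = doubled.sum  // "action": forces evaluation of the lineage
    println(evaluations)     // 10
    println(total)           // 110
  }
}
```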

Class 4: Spark Internals

1 Spark Cluster

2 Job Scheduling

3 DAGScheduler

4 TaskScheduler

5 Task Internals


Day 2

Class 5: Broadcasts and Accumulators

1 Broadcast Internals

2 Best Practices for Broadcast

3 Accumulator Internals

4 Best Practices for Accumulators

Class 6: Spark Programming in Action

1 Data Sources: File, HDFS, HBase, S3

2 IntelliJ IDEA

3 Maven

4 SBT

5 Code

6 Deployment
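For the build-tool topics above, a Spark project's SBT definition is only a few lines. A minimal `build.sbt` sketch follows; the project name and version numbers are illustrative, and `scalaVersion` should match the Scala version your Spark distribution was built against:

```scala
// Minimal build.sbt for a Spark application (versions are illustrative).
name := "spark-course-app"
version := "0.1.0"
scalaVersion := "2.11.12"

// "provided" scope: the cluster supplies Spark at runtime via spark-submit,
// so the Spark jars are not bundled into the application assembly.
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.3" % "provided"
```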

Class 7: Deep Dive into the Spark Driver

1 The Secret of SparkContext

2 The Secret of SparkConf

3 The Secret of SparkEnv

Class 8: Deep Dive into RDDs

1 DAG

2 Scala RDD Functions

3 Spark Java RDD Functions

4 RDD Tuning


Day 3

Class 9: Machine Learning on Spark

1 Linear Regression

2 K-means

3 Collaborative Filtering
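To make the first topic above concrete: one-variable linear regression has a closed-form least-squares solution, shown here over plain Scala collections. This is a local sketch of the problem MLlib's linear regression solves; at scale the data lives in RDDs and MLlib uses an iterative solver instead (the object name and sample points are illustrative):

```scala
// Least-squares fit of y = slope * x + intercept, in closed form.
object SimpleRegression {
  def fit(xs: Seq[Double], ys: Seq[Double]): (Double, Double) = {
    val n  = xs.size.toDouble
    val mx = xs.sum / n                 // mean of x
    val my = ys.sum / n                 // mean of y
    // slope = covariance(x, y) / variance(x)
    val slope = (xs zip ys).map { case (x, y) => (x - mx) * (y - my) }.sum /
                xs.map(x => (x - mx) * (x - mx)).sum
    (slope, my - slope * mx)            // intercept from the means
  }

  def main(args: Array[String]): Unit = {
    val (slope, intercept) = fit(Seq(1, 2, 3, 4), Seq(3, 5, 7, 9))  // data on y = 2x + 1
    println(s"slope=$slope intercept=$intercept")  // slope=2.0 intercept=1.0
  }
}
```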

Class 10: Graph Computation on Spark

1 Table Operators

2 Graph Operators

3 GraphX Algorithms
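The "table operators" topic reflects how GraphX represents a graph: as a vertex table plus an edge table, with collection-style operators over both. A tiny local analogue (the edge data and helper names are illustrative, not GraphX API code):

```scala
// A graph as an edge table, with a collection-style operator over it --
// the spirit of GraphX, where graph.outDegrees derives a vertex property
// by aggregating the edge table.
object GraphSketch {
  type VertexId = Long
  case class Edge(src: VertexId, dst: VertexId)

  val edges = Seq(Edge(1, 2), Edge(2, 3), Edge(3, 1), Edge(1, 3))

  // Out-degree per vertex: group the edge table by source vertex
  def outDegrees(es: Seq[Edge]): Map[VertexId, Int] =
    es.groupBy(_.src).map { case (v, out) => (v, out.size) }

  def main(args: Array[String]): Unit = {
    println(outDegrees(edges).toList.sortBy(_._1))  // List((1,2), (2,1), (3,1))
  }
}
```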

Class 11: Spark SQL

1 Parquet, JSON, JDBC

2 DSL

3 SQL on RDD

Class 12: Spark Streaming

1 DStream

2 Transformations

3 Checkpointing

4 Tuning


Day 4

Class 13: Spark on YARN

1 Internals of Spark on YARN

2 Best Practices for Spark on YARN

Class 14: JobServer

1 RESTful Architecture of JobServer

2 JobServer APIs

3 Best Practice of JobServer

Class 15: SparkR

1 Programming in R

2 R on Spark

3 Internals of SparkR

4 SparkR API

Class 16: Spark Tuning

1 Logs

2 Concurrency

3 Memory

4 GC

5 Serializers

6 Safety

7 14 Cases of Tuning
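Several of the tuning topics above (serialization, memory, parallelism) are commonly adjusted through Spark's configuration properties. A sketch of a `spark-defaults.conf` fragment follows; the property names are standard Spark settings, while the values are purely illustrative and must be sized to your cluster:

```
# spark-defaults.conf sketch -- property names are standard Spark settings;
# the values below are illustrative, not recommendations.
spark.serializer           org.apache.spark.serializer.KryoSerializer
spark.executor.memory      4g
spark.default.parallelism  48
```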

