Comprehensive in-depth analysis of Spark 2 -- knowledge points, source code, tuning, the JVM, graph computation, and projects
Course viewing address: http://www.xuetuwuyou.com/course/220
The course comes from the Xuetuwuyou ("study without worry") network: http://www.xuetuwuyou.com
A total of 14 chapters and 316 sections. The course analyzes Spark-related technology from every angle and closes with two hands-on projects, a User Interactive Behavior Analysis System and a DMP User Portrait System, giving a comprehensive, applied treatment of Spark. You could say that with this one set in hand, nothing can stand in your way!
Chapter 1: Scala
Task 1: Java and Scala compared
Task 2: Why learn Scala
Task 3: Installing the Scala compiler
Task 4: Writing the first Scala program
Task 5: Scala tool installation
Task 6: Programming with IDEA
Task 7: Building a jar in IDEA
Task 8: Declaring variables
Task 9: Scala data types
Task 10: if expressions
Task 11: Code blocks
Task 12: Loops: while
Task 13: Loops: for
Task 14: Scala operators
Task 15: Defining methods
Task 16: Defining functions
Task 17: Decorator design
Task 18: Explaining functional programming with Java
Task 19: Knowledge review
Task 20: Fixed-length and variable-length arrays
Task 21: Converting and traversing arrays
Task 22: Common array algorithms
Task 23: Map collections
Task 24: Tuple operations
Task 25: List collection operations
Task 26: Implementing word count in Scala
Task 27: Set collection operations
Task 28: The lazy feature
Task 29: Scala course notes
Task 30: Defining classes
Task 31: Inspecting the compiled class file
Task 32: Primary and auxiliary constructors
Task 33: Morning knowledge review
Task 34: Objects
Task 35: The apply method
Task 36: Traits
Task 37: Applications of extends
Task 38: Inheritance
Task 39: Abstract classes
Task 40: Pattern matching
Task 41: Scala string printing
Task 42: Case classes
Task 43: Option (Some, None)
Task 44: Partial functions
Task 45: Closures
Task 46: Currying
Task 47: Implicit parameters
Task 48: Implicit conversions
Task 49: When implicit conversions are triggered: two case demos
Task 50: Implicit conversion case 1
Task 51: Implicit conversion case 2
Task 52: Upper and lower bounds
Task 53: Upper bound case
Task 54: Lower bound case
Task 55: View bounds
Task 56: Covariance
Task 57: Contravariance
Task 58: Knowledge summary
Task 59: Socket assignment
Task 60: Assignment requirements analysis
Task 61: Assignment code implementation
Task 62: Notes on actors
Task 63: Basic actor concepts
Task 64: Actor case demo
Task 65: Case 2 requirements analysis
Task 66: Case code demo (part 1)
Task 67: Case code demo (part 2)
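As a taste of the chapter's collection tasks, Task 26's word count can be sketched with nothing but the standard Scala collections API (the input lines are invented for illustration):

```scala
// Word count over a couple of hardcoded lines, using only Scala collections.
val lines = List("spark scala spark", "scala kafka")

val counts: Map[String, Int] = lines
  .flatMap(_.split("\\s+"))                      // break every line into words
  .groupBy(identity)                             // word -> all of its occurrences
  .map { case (word, occ) => word -> occ.size }  // word -> occurrence count

// counts == Map("spark" -> 2, "scala" -> 2, "kafka" -> 1)
```

The same flatMap / groupBy / map shape reappears almost verbatim on RDDs in Chapter 2, which is why word count is the course's bridge example.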
Chapter 2: Spark Core
Task 68: How to learn an open-source technology
Task 69: What is Spark
Task 70: The four characteristics of Spark
Task 71: Spark quick start (part 1)
Task 72: Spark quick start (part 2)
Task 73: What is an RDD
Task 74: Demonstrating what an RDD is
Task 75: How a Spark task runs
Task 76: Hadoop cluster setup
Task 77: Spark cluster setup
Task 78: Spark HA cluster setup
Task 79: Developing a Spark program in Scala
Task 80: Developing Spark programs in Java 7
Task 81: Developing Spark programs in Java 8
Task 82: Building a Maven package in IDEA
Task 83: Submitting a task to the Spark cluster
Task 84: How RDDs are created
Task 85: Notes on the Spark scripts
Task 86: How transformations and actions work
Task 87: Broadcast variables
Task 88: Accumulators
Task 89: Shared variables demo
Task 90: persist
Task 91: checkpoint
Task 92: Additional notes on persistence
Task 93: Standalone run mode
Task 94: Spark on YARN
Task 95: Spark on YARN principles
Task 96: HistoryServer configuration
Task 97: map, flatMap, filter
Task 98: sortByKey, reduceByKey
Task 99: join, union, cogroup
Task 100: intersection, distinct, cartesian
Task 101: mapPartitions, repartition, coalesce
Task 102: More on the difference between coalesce and repartition
Task 103: aggregateByKey, mapPartitionsWithIndex
Task 104: The action operators explained
Task 105: The collect operator explained
Task 106: Spark secondary sort
Task 107: Narrow and wide dependencies
Task 108: Examples of narrow and wide dependencies
Task 109: Terminology
Task 110: The stage-splitting algorithm
Task 111: Scheduling of Spark tasks
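The map-side pre-aggregation that makes Task 98's reduceByKey cheaper than groupByKey can be modeled on plain Scala collections, with each inner list standing in for one partition (no Spark dependency; real code would call rdd.reduceByKey(_ + _)):

```scala
// Two hypothetical "partitions" of (key, count) pairs.
val partitions = List(
  List(("a", 1), ("b", 1), ("a", 1)),
  List(("b", 1), ("a", 1))
)

// Step 1: combine within each partition (the map-side combine).
val local = partitions.map { p =>
  p.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).sum }
}

// Step 2: merge the per-partition partials (what the reduce side of the
// shuffle does). Only the small partial sums cross the "network".
val merged = local.flatten.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).sum }

// merged == Map("a" -> 3, "b" -> 2)
```

groupByKey, by contrast, would ship every individual pair across the shuffle before summing, which is exactly what Chapter 3's tuning advice warns against.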
Chapter 3: Spark Tuning
Task 112: Avoid creating duplicate RDDs
Task 113: Reuse the same RDD wherever possible
Task 114: Persist RDDs that are used multiple times
Task 115: Avoid shuffle operators where possible
Task 116: Prefer shuffle operations with map-side pre-aggregation
Task 117: Use high-performance operators
Task 118: Broadcast large variables
Task 119: Optimize serialization performance with Kryo
Task 120: Optimize data structures
Task 121: Data locality
Task 122: How data skew arises and how to locate it
Task 123: Preprocess data with Hive ETL
Task 124: Filter out the few keys that cause skew
Task 125: Increase the parallelism of shuffle operations
Task 126: Two-phase aggregation (local aggregation + global aggregation)
Task 127: Convert reduce joins to map joins
Task 128: Sample skewed keys and split the join
Task 129: Join using random prefixes and an expanded RDD
Task 130: Combining the solutions
Task 131: The shuffle implementations across versions
Task 132: Shuffle tuning
Task 133: Spark resource tuning
Task 134: The Spark 1.5 memory model
Task 135: The Spark 2 memory model
Task 136: Whole-stage code generation
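Task 126's two-phase aggregation for skewed keys can be sketched in plain Scala: salt the hot key so it spreads across several buckets, aggregate locally, strip the salt, then aggregate globally. The data and the salt width of 4 are invented for illustration (a real job would salt randomly on an RDD):

```scala
// One heavily skewed key ("hot") plus a normal one ("cold").
val records = List.fill(100)(("hot", 1)) ++ List(("cold", 1))

// Phase 1: prefix each key with a small deterministic salt so "hot"
// spreads over 4 buckets instead of landing in a single reduce task.
val salted = records.zipWithIndex.map { case ((k, v), i) => (s"${i % 4}_$k", v) }
val local  = salted.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).sum }

// Phase 2: strip the salt and aggregate the (now small) per-salt partials.
val global = local.toList
  .map { case (k, v) => k.split("_", 2)(1) -> v }
  .groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).sum }

// global == Map("hot" -> 100, "cold" -> 1)
```

The trick trades one skewed aggregation for two balanced ones; the same salting idea underlies the random-prefix join in Task 129.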
Chapter 4: JVM Tuning
Task 137: JVM architecture
Task 138: How the three regions work together
Task 139: Heap structure
Task 140: The JDK 8 memory model
Task 141: Heap memory overflow demo
Task 142: Brief introduction to the MAT tool
Task 143: GC log format
Task 144: Heap memory configuration demo
Task 145: Stack parameter configuration
Task 146: Introduction to garbage collection algorithms
Task 147: Stop-the-world
Task 148: Garbage collection algorithms
Task 149: Introduction to garbage collectors
Task 150: Configuring common collectors, demo
Task 151: The CMS garbage collector
Task 152: Hadoop JVM tuning demo
Task 153: Introduction to garbage collectors (continued)
Task 154: Introduction to performance monitoring tools
Task 155: Large objects go directly to the old generation
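The heap-sizing, CMS, and GC-log topics above come together on a single JDK 8 command line. This is only an illustrative config fragment: the sizes are placeholders, not recommendations, and app.jar is a stand-in name.

```shell
# Illustrative JDK 8 flags covering the chapter's topics
# (heap/young-gen sizing, the CMS collector, GC logging).
java -Xms4g -Xmx4g \
     -Xmn1g \
     -XX:+UseConcMarkSweepGC \
     -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:gc.log \
     -jar app.jar
```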
Chapter 5: Spark Core Source Code Analysis
Task 156: How to find the source code
Task 157: How to attach the source code
Task 158: Master startup process
Task 159: Master and Worker startup process
Task 160: The spark-submit submission process
Task 161: SparkContext initialization
Task 162: Creating the TaskScheduler
Task 163: DAGScheduler initialization
Task 164: TaskSchedulerImpl startup
Task 165: The Master's resource scheduling algorithm
Task 166: TaskSchedulerImpl UML diagram
Task 167: Executor registration
Task 168: Executor startup UML diagram
Task 169: Spark task submission
Task 170: Task execution
Task 171: Spark task submission in detail
Task 172: Spark task submission process, diagram summary
Task 173: BlockManager in-depth analysis
Task 174: CacheManager in-depth analysis
Chapter 6: Spark SQL
Task 175: Notes on the default number of partitions
Task 176: Spark Core official example demo
Task 177: The past and present of Spark
Task 178: Spark release notes
Task 179: What is a DataFrame
Task 180: First taste of DataFrames
Task 181: Converting an RDD to a DataFrame, method 1
Task 182: Converting an RDD to a DataFrame, method 2
Task 183: RDD vs. DataFrame
Task 184: Spark SQL data sources: load
Task 185: Spark SQL data sources: save
Task 186: Spark SQL data sources: JSON and Parquet
Task 187: Spark SQL data sources: JDBC
Task 188: Hive as a Spark data source
Task 189: ThriftServer
Task 190: Spark SQL case demo
Task 191: Integrating Spark SQL with Hive
Task 192: Spark SQL UDFs
Task 193: Spark SQL UDAFs
Task 194: Spark SQL window functions
Task 195: groupBy and agg
Task 196: Knowledge summary
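The window-function and groupBy/agg tasks share one shape: partition the rows, then compute inside each partition. A plain-Scala model of "top product per region" (the rows are invented; in Spark SQL this would be rank() over a window partitioned by region):

```scala
// A tiny fact table of sales rows.
case class Sale(region: String, product: String, amount: Int)

val sales = List(
  Sale("east", "tv", 300), Sale("east", "phone", 500),
  Sale("west", "tv", 200), Sale("west", "phone", 100)
)

// "Partition by region", then keep the top row of each partition.
val topPerRegion: Map[String, String] = sales
  .groupBy(_.region)
  .map { case (region, rows) => region -> rows.maxBy(_.amount).product }

// topPerRegion == Map("east" -> "phone", "west" -> "tv")
```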
Chapter 7: Kafka
Task 197: Why Kafka came about
Task 198: Kafka core concepts
Task 199: Kafka core concepts revisited
Task 200: Introduction to various terms
Task 201: The benefits of a messaging system
Task 202: Messaging system categories and the pull vs. push distinction
Task 203: Kafka cluster architecture
Task 204: Kafka cluster setup
Task 205: Cluster test demo
Task 206: HA for Kafka data
Task 207: Kafka's design
Task 208: Kafka code test
Task 209: Assignments
Task 210: Kafka offsets
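A cluster test like Task 205's typically boils down to three classic CLI calls: create a topic, produce to it, consume from it. This fragment uses the old ZooKeeper-style flags of the Kafka versions contemporary with Spark 2 courses; host names, ports, topic name, and the replication/partition counts are placeholders.

```shell
# Create a topic, then exercise it with the console producer and consumer.
bin/kafka-topics.sh --create --zookeeper node1:2181 \
    --replication-factor 2 --partitions 3 --topic test
bin/kafka-console-producer.sh --broker-list node1:9092 --topic test
bin/kafka-console-consumer.sh --zookeeper node1:2181 --topic test --from-beginning
```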
Chapter 8: Spark Streaming
Task 211: A brief talk about the future of Spark Streaming
Task 212: How Spark Streaming runs
Task 213: DStreams explained with diagrams
Task 214: The flow of a streaming computation
Task 215: Socket streaming demo
Task 216: HDFS DStream demo
Task 217: updateStateByKey demo
Task 218: Blacklist filtering with transform, demo
Task 219: Window operations demo
Task 220: Blacklist filtering with transform, supplement
Task 221: foreachRDD demo
Task 222: Kafka and Spark Streaming integration demo
Task 223: Consuming Kafka data with multiple threads
Task 224: Consuming Kafka data in parallel with a thread pool
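Task 219's window operations can be modeled with plain Scala, each inner list standing in for one batch interval (invented data; real code would use reduceByKeyAndWindow on a DStream):

```scala
// Three "batches" of events, one list per batch interval.
val batches = List(List("a", "b"), List("a"), List("a", "c"))
val windowLength = 2 // window covers the last 2 batches

// Counts recomputed for each sliding window position.
val perWindow = batches.sliding(windowLength).toList.map { window =>
  window.flatten.groupBy(identity).map { case (w, occ) => w -> occ.size }
}

// perWindow.last == Map("a" -> 2, "c" -> 1)   (counts over the last 2 batches)
```

Spark Streaming additionally lets an inverse reduce function subtract the batch that slid out of the window instead of recomputing, which this naive model skips.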
Chapter 9: Streaming Tuning
Task 225: Fault tolerance in Spark Streaming
Task 226: Spark Streaming vs. Storm
Task 227: Integrating Spark Streaming with Kafka (manually controlling offsets)
Task 228: Tuning Spark Streaming: parallelism
Task 229: Tuning Spark Streaming: memory
Task 230: Tuning Spark Streaming: serialization
Task 231: Tuning Spark Streaming: JVM & GC
Task 232: Tuning Spark Streaming: individual slow tasks
Task 233: Tuning Spark Streaming: unstable resources
Task 234: Spark Streaming under data surges
Chapter 10: Streaming Source Code
Task 235: Introduction to reading the Spark Streaming source
Task 236: How Spark Streaming works
Task 237: The Spark Streaming communication model
Task 238: StreamingContext initialization
Task 239: Receiver startup process walkthrough
Task 240: Receiver startup process, UML summary
Task 241: How blocks are generated
Task 242: How blocks are generated and stored
Task 243: The chain-of-responsibility pattern
Task 244: BlockRDD construction and job submission
Task 245: BlockRDD construction and job submission, summary
Chapter 11: Spark GraphX
Task 246: Introduction to graph computation
Task 247: Graph computation case demo
Task 248: The basic components of a graph
Task 249: Graph storage
Task 250: Finding friends, case demo
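The find-friends demo reduces, at its smallest, to intersecting adjacency sets. A toy version on a plain Map (the social graph is invented; GraphX would express this over edge triplets):

```scala
// Adjacency map: each person -> the set of their friends.
val friends: Map[String, Set[String]] = Map(
  "alice" -> Set("bob", "carol", "dave"),
  "bob"   -> Set("alice", "carol"),
  "carol" -> Set("alice", "bob")
)

// Common friends of two people = intersection of their adjacency sets.
val commonFriends = friends("alice") intersect friends("bob")
// commonFriends == Set("carol")
```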
Chapter 12: Spark 2 vs. Spark 1
Task 251: New features in Spark 2
Task 252: RDD & DataFrame & Dataset
Task 253: RDD & DataFrame & Dataset (continued)
Task 254: SparkSession accessing Hive, supplementary notes
Task 255: Merging the DataFrame and Dataset APIs
Chapter 13: Capstone Project: User Interactive Behavior Analysis System
Task 256: Introduction to the project workflow
Task 257: Project overview
Task 258: Data sources for big data projects
Task 259: Project background
Task 260: Common concepts
Task 261: Project requirements
Task 262: Project integration process
Task 263: Design considerations arising from the table schema
Task 264: Getting task parameters
Task 265: Requirement 1: data description
Task 266: Requirement 1: filtering sessions by criteria
Task 267: Requirement 1: a worked example
Task 268: Requirement 1: top-N categories by clicks, orders, and payments (part 1)
Task 269: Requirement 1: top-N categories by clicks, orders, and payments (part 2)
Task 270: Requirement 2: requirements analysis
Task 271: Requirement 2: data description
Task 272: Requirement 2: fetching user behavior data
Task 273: Requirement 2: joining the user table and the info table
Task 274: Requirement 2: further requirements analysis
Task 275: Requirement 2: custom UDF functions
Task 276: Requirement 2: a custom UDAF function
Task 277: Requirement 2: counting product clicks per region
Task 278: Requirement 2: joining the city info table and the product info table
Task 279: Requirement 2: hot products per region
Task 280: Requirement 2: persisting results to the database
Task 281: Requirement 2: summary
Task 282: Requirement 3: requirements analysis
Task 283: Requirement 3: data description
Task 284: Requirement 3: organizing the approach
Task 285: Requirement 3: fetching data from Kafka
Task 286: Requirement 3: blacklist filtering of the data
Task 287: Requirement 3: dynamic blacklist (part 1)
Task 288: Requirement 3: dynamic blacklist (part 2)
Task 289: Requirement 3: real-time ad-click counts per province and city
Task 290: Requirement 3: real-time click traffic per province
Task 291: Requirement 3: real-time ad-click trends
Task 292: Requirement 3: summary
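Once requirement 1's sessions are aggregated, the top-N step itself is a sort-and-take. A skeleton on plain Scala pairs, with an invented combined click/order/pay score per category:

```scala
// Hypothetical (category, combined score) pairs after aggregation.
val categoryScores = List(("cat1", 10L), ("cat2", 7L), ("cat3", 12L), ("cat4", 3L))

// Top-N = sort descending by score, keep the first N.
val topN = categoryScores.sortBy(-_._2).take(2)
// topN == List(("cat3", 12L), ("cat1", 10L))
```

On an RDD the equivalent would be a sortByKey (or takeOrdered) over a composite sort key, which is where Task 106's secondary sort comes back into play.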
Chapter 14: DMP User Portrait System
Task 293: Project background
Task 294: The DSP workflow
Task 295: Project workflow description
Task 296: Developing the utils tool classes
Task 297: Requirement 1: feature development
Task 298: Packaging the code and submitting it to run on the cluster
Task 299: Requirement 2 description
Task 300: Report requirements description
Task 301: Statistics on data distribution across provinces and cities
Task 302: Defining a report statistics function
Task 303: Province and city report statistics
Task 304: App report statistics
Task 305: User portrait requirements
Task 306: Tagging
Task 307: Merging context tags
Task 308: Context tag test run
Task 309: Why we need graph computation
Task 310: Basic graph concepts
Task 311: A simple case demo
Task 312: The idea behind merging context tags
Task 313: Simple case demo, explained
Task 314: Continuing to organize the approach
Task 315: Generating a user relationship table
Task 316: Merging tags
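The tag-merge idea behind Tasks 309 to 316 is, in miniature, connected components: different identifiers observed for the same user are unioned into one component, and all their tags attach to that component. A tiny union-find sketch (identifier values invented; at scale GraphX's connectedComponents plays this role):

```scala
import scala.collection.mutable

// parent(x) points toward x's representative; absent keys are their own root.
val parent = mutable.Map[String, String]()

def find(x: String): String = {
  val p = parent.getOrElse(x, x)
  if (p == x) x
  else {
    val root = find(p)
    parent(x) = root // path compression: point straight at the root
    root
  }
}

def union(a: String, b: String): Unit = parent(find(a)) = find(b)

union("imei:123", "mac:ab") // one device seen under two hardware ids
union("mac:ab", "idfa:xy")  // and under an advertising id as well

// All three ids now share one root, so their tags can be merged:
// find("imei:123") == find("idfa:xy")
```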