"Winning in the Cloud Computing and Big Data Era"
Spark Asia Pacific Research Institute Free Public Lecture Series, Session 1 [Session 1 Interactive Q&A]
Q1: Can the Spark shuffle directory SPARK_LOCAL_DIRS be pointed at a solid-state drive to speed up execution?
Yes. Shuffle output is written to and read back from the directories in SPARK_LOCAL_DIRS, so putting them on a solid-state drive can substantially speed up Spark jobs.
To go faster still, you can specify multiple shuffle output directories. SPARK_LOCAL_DIRS accepts a comma-separated list, ideally on different disks, which lets shuffle read and write several disks in parallel.
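The benefit of listing several directories comes from spreading shuffle files across them. Below is a minimal, hypothetical Python sketch, loosely modeled on the idea of hashing each file name to pick one of the configured local directories and then a subdirectory inside it. The function names, the `/mnt/ssd*` paths, and the default of 64 subdirectories are assumptions for illustration, not Spark's API.

```python
from collections import Counter


def java_string_hash(s: str) -> int:
    """Java-style String.hashCode(): 32-bit signed, wrapping arithmetic.

    Uses ord() per character, which matches Java for ASCII names like those below.
    """
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Reinterpret the unsigned 32-bit value as a signed Java int.
    return h - 0x100000000 if h >= 0x80000000 else h


def non_negative_hash(s: str) -> int:
    """abs() of the hash; the one value with no positive counterpart maps to 0."""
    h = java_string_hash(s)
    return 0 if h == -(2 ** 31) else abs(h)


def place(filename: str, local_dirs: list[str], sub_dirs_per_local_dir: int = 64) -> tuple[int, int]:
    """Hypothetical placement: return (local dir index, subdirectory index) for a file."""
    h = non_negative_hash(filename)
    dir_id = h % len(local_dirs)
    sub_dir_id = (h // len(local_dirs)) % sub_dirs_per_local_dir
    return dir_id, sub_dir_id


def file_path(filename: str, local_dirs: list[str], sub_dirs_per_local_dir: int = 64) -> str:
    """Build the full path; two-hex-digit subdirectory names are an assumption here."""
    dir_id, sub_dir_id = place(filename, local_dirs, sub_dirs_per_local_dir)
    return f"{local_dirs[dir_id]}/{sub_dir_id:02x}/{filename}"


if __name__ == "__main__":
    # Assumed SSD mount points, as they might be listed in SPARK_LOCAL_DIRS.
    local_dirs = ["/mnt/ssd1/spark", "/mnt/ssd2/spark"]
    # Hypothetical shuffle file names of the form shuffle_<shuffleId>_<mapId>_<reduceId>.
    names = [f"shuffle_0_{m}_{r}" for m in range(10) for r in range(10)]
    # Count how many files land under each local directory.
    per_dir = Counter(local_dirs[place(n, local_dirs)[0]] for n in names)
    print(per_dir)
```

Because placement depends only on the file name, a reader can recompute where any file lives without a lookup table, and files end up split across both disks.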
Q2: Does spark.shuffle.consolidateFiles = true merge files only within the same machine?
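Yes. spark.shuffle.consolidateFiles = true merges shuffle files within a single machine.
During consolidation, buckets destined for the same reducer are written into the same file. The number of shuffle files on a machine then drops from roughly (map tasks × reducers) to (cores × reducers), which improves performance.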
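The file-count effect of consolidation can be illustrated with a small, hypothetical Python model of one machine. It assumes map tasks are assigned to cores round-robin and that, with consolidation on, each core keeps one output file per reducer that later map tasks on the same core append to. The function name and parameters are illustrative only, not Spark's API.

```python
def count_shuffle_files(num_map_tasks: int, num_reducers: int, cores: int, consolidate: bool) -> int:
    """Count distinct shuffle output files written on one machine (simplified model)."""
    files = set()
    for map_id in range(num_map_tasks):
        # Simplifying assumption: map tasks are spread over the cores round-robin.
        core = map_id % cores
        for reduce_id in range(num_reducers):
            if consolidate:
                # One file per (core, reducer): buckets for the same reducer share a file.
                files.add(("core", core, reduce_id))
            else:
                # One file per (map task, reducer): every bucket gets its own file.
                files.add(("map", map_id, reduce_id))
    return len(files)


if __name__ == "__main__":
    # Example: 100 map tasks, 50 reducers, 4 cores on the machine.
    print(count_shuffle_files(100, 50, 4, consolidate=False))  # 100 * 50 = 5000 files
    print(count_shuffle_files(100, 50, 4, consolidate=True))   # 4 * 50 = 200 files
```

In this model the consolidated count no longer depends on the number of map tasks, only on how many cores run them.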
Q3: Will Spark and Hadoop coexist in the future?
Yes. Spark and Hadoop will coexist; Spark + Hadoop is a winning combination.
In this combination, Hadoop mainly provides data storage through HDFS, while Spark handles unified, diverse big data computation.
This article comes from the Spark Asia Pacific Research Institute blog; please retain this source when reposting: http://rockyspark.blog.51cto.com/2229525/1565214