Spark installation and deployment


Spark is a MapReduce-like computing framework developed by the UC Berkeley AMPLab. The MapReduce framework is well suited to batch jobs, but its design imposes two constraints: first, job scheduling is driven by pull-based heartbeats; second, all shuffle intermediate results are written to disk. Together these cause high latency and large start-up overhead. Spark, by contrast, was built for iterative and interactive computing. First, it uses Akka, an actor-model library, as its communication framework. Second, it uses RDDs, a distributed in-memory abstraction: data passed between operations does not need to be dumped to disk, but flows through RDD partitions held in the memory of the nodes, which greatly speeds up data movement between stages. At the same time, each RDD maintains its lineage, so a lost RDD can be rebuilt automatically from its parent RDDs, which guarantees fault tolerance. A rich set of applications has been built on top of Spark, such as Shark, Spark Streaming, and MLbase. We have used Shark in our production environment as a supplement to Hive; it shares the Hive metastore and SerDes and is used in much the same way as Hive, and when the input data size is not very large, the same statement often runs much faster than in Hive. A follow-up article will cover this in detail.

Spark Software Stack

This article describes the installation of Spark as follows:

Spark can run on a unified resource scheduler such as YARN or Mesos, or it can be deployed independently in standalone mode. Because our YARN cluster is not ready yet, we use standalone mode for now. It is a master/slave architecture consisting of one Spark master and a set of Spark workers. Standalone mode only supports a FIFO scheduling policy, and by default a submitted job takes all of the cores in the Spark cluster, so the cluster can run only one job at a time; set the spark.cores.max property to adjust this (see the sketch below).
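For example, a minimal sketch of capping a job's core usage, assuming this 0.7.x build picks up Java system properties passed via SPARK_JAVA_OPTS in spark-env.sh on the node that submits the job:

# cap each submitted job at 8 cores so the cluster can run several jobs at once
export SPARK_JAVA_OPTS="-Dspark.cores.max=8"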

Deployment environment:

Spark master: test85.hadoop

Spark workers: test88.hadoop, test89.hadoop, test90.hadoop, test91.hadoop

1. Ensure that the master node can SSH to each worker node without a password, since the start scripts log in over SSH (a sketch follows this list)

2. Because Spark uses the Hadoop client to interact with HDFS, the Hadoop client needs to be installed on every node (a quick check follows this list)

3. Install Scala. The Scala 2.10.2 release conflicts with this version of Spark, so only Scala 2.9.3 can be installed (download commands follow this list)
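A minimal sketch of the passwordless SSH setup from step 1, assuming OpenSSH and the non-default port 58422 that spark-env.sh uses later:

# on the master, generate a key pair if one does not exist yet
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
# push the public key to every worker
for host in test88.hadoop test89.hadoop test90.hadoop test91.hadoop; do
    cat ~/.ssh/id_rsa.pub | ssh -p 58422 $host "mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys"
done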
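For step 2, a quick way to confirm a node's Hadoop client can reach HDFS (the client path is an assumption, inferred from the native-library paths used in spark-env.sh below):

# should list the HDFS root if the client is configured correctly
/usr/local/hadoop/hadoop-release/bin/hadoop fs -ls /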
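For step 3, a sketch of installing Scala 2.9.3 under /usr/local/scala, matching the SCALA_HOME set below (the download URL is an assumption; any scala-2.9.3 distribution tarball will do):

cd /usr/local
wget http://www.scala-lang.org/downloads/distrib/files/scala-2.9.3.tgz
tar xzvf scala-2.9.3.tgz
ln -s scala-2.9.3 scala

With Scala in place, download and deploy Spark itself: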

wget http://spark-project.org/download/spark-0.7.3-prebuilt-hadoop1.tgz
tar xzvf spark-0.7.3-prebuilt-hadoop1.tgz
ln -s spark-0.7.3 spark-release

Add environment variables to /etc/profile:

export SPARK_HOME=/usr/local/spark-release
export SCALA_HOME=/usr/local/scala
export PATH=$PATH:$SPARK_HOME/bin:$SCALA_HOME/bin
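Then reload the profile so the variables take effect in the current shell (assuming the lines above were appended to /etc/profile):

source /etc/profile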

Set up the Spark configuration file in $SPARK_HOME/conf/spark-env.sh:

export JAVA_HOME=/usr/local/jdk
export SCALA_HOME=/usr/local/scala
export SPARK_EXAMPLES_JAR=$SPARK_HOME/examples/target/scala-2.9.3/spark-examples_2.9.3-0.7.3.jar
export SPARK_SSH_OPTS="-p58422 -o StrictHostKeyChecking=no"
export SPARK_MASTER_IP=test85.hadoop

export SPARK_MASTER_WEBUI_PORT=8088
export SPARK_WORKER_WEBUI_PORT=8099

export SPARK_WORKER_CORES=4
export SPARK_WORKER_MEMORY=8g

export LD_LIBRARY_PATH=/usr/local/hadoop/lzo/lib
export SPARK_LIBRARY_PATH=/usr/local/hadoop/hadoop-release/lib/native/linux-amd64-64

SPARK_WORKER_CORES should be set to the number of physical CPU cores on the worker; SPARK_WORKER_MEMORY is the total amount of physical memory on the worker node that is available to Spark jobs.

Add the worker addresses to the slaves file ($SPARK_HOME/conf/slaves):

# A Spark Worker will be started on each of the machines listed below
test88.hadoop
test89.hadoop
test90.hadoop
test91.hadoop

Synchronize the configuration files, Spark, and Scala to the entire cluster; a sketch follows.
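A minimal sketch of the synchronization, assuming rsync is available and SSH listens on the same non-standard port (58422) configured above:

for host in test88.hadoop test89.hadoop test90.hadoop test91.hadoop; do
    # copy the Spark and Scala installation trees
    rsync -avz -e "ssh -p 58422" /usr/local/spark-0.7.3 /usr/local/scala-2.9.3 $host:/usr/local/
    # recreate the symlinks on the worker
    ssh -p 58422 $host "ln -sfn /usr/local/spark-0.7.3 /usr/local/spark-release; ln -sfn /usr/local/scala-2.9.3 /usr/local/scala"
done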
Start the Spark master:

bin/start-master.sh

13/09/23 09:46:57 INFO Slf4jEventHandler: Slf4jEventHandler started
13/09/23 09:46:57 INFO ActorSystemImpl: RemoteServerStarted@akka://sparkMaster@test85.hadoop:7077
13/09/23 09:46:57 INFO Master: Starting Spark master at spark://test85.hadoop:7077
13/09/23 09:46:57 INFO IoWorker: IoWorker thread 'spray-io-worker-0' started
13/09/23 09:46:57 INFO HttpServer: akka://sparkMaster/user/HttpServer started on /0.0.0.0:8088
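With the master running, start the workers from the master node. A sketch, assuming the standalone helper scripts shipped alongside start-master.sh (start-slaves.sh reads conf/slaves and logs in using SPARK_SSH_OPTS):

bin/start-slaves.sh
# or bring up the master and all workers together:
# bin/start-all.sh

Each worker should then register with the master at spark://test85.hadoop:7077; registration can be verified on the master web UI at http://test85.hadoop:8088.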

