Spark 1.6.0 on Hadoop 2.6.3: installation and configuration
1. Install and configure Hadoop
(1) Download Hadoop
mkdir /usr/local/bigdata/hadoop
cd /usr/local/bigdata/hadoop
wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.6.3/hadoop-2.6.3.tar.gz
tar zxvf hadoop-2.6.3.tar.gz
(2) Configure the Hadoop environment variables
export HADOOP_HOME=/usr/local/bigdata/hadoop/hadoop-2.6.3
export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:$PATH
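These exports only last for the current shell session. A minimal sketch of making them permanent, assuming a per-user setup with bash and ~/.bashrc (adjust for your shell and distribution):

# Append the variables to the shell profile and reload it (assumption: bash, per-user install)
echo 'export HADOOP_HOME=/usr/local/bigdata/hadoop/hadoop-2.6.3' >> ~/.bashrc
echo 'export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:$PATH' >> ~/.bashrc
source ~/.bashrc

# Sanity check: should print the Hadoop 2.6.3 banner
hadoop version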
2. Install and configure Scala
(1) Download Scala
mkdir /usr/local/bigdata/scala
cd /usr/local/bigdata/scala
wget http://www.scala-lang.org/files/archive/scala-2.10.4.tgz
tar zxvf scala-2.10.4.tgz
(2) Configure the Scala environment variables
export SCALA_HOME=/usr/local/bigdata/scala/scala-2.10.4
export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${SCALA_HOME}/bin:$PATH
Run scala -version to display the installed Scala version.
(3) Test the Scala runtime environment
Run scala to enter the Scala REPL, then evaluate a test expression such as 12*12 and press Enter.
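An illustrative REPL session (the res0 label is what the REPL prints by default):

$ scala
scala> 12*12
res0: Int = 144

scala> :quit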
3. Install and configure Spark 1.6.0
(1) Download Spark 1.6.0
Choose the Spark package built for your Hadoop version from the download page: http://spark.apache.org/downloads.html
mkdir /usr/local/bigdata/spark
cd /usr/local/bigdata/spark
wget http://archive.apache.org/dist/spark/spark-1.6.0/spark-1.6.0-bin-hadoop2.6.tgz
tar zxvf spark-1.6.0-bin-hadoop2.6.tgz
(2) Configure the Spark environment variables
export SPARK_HOME=/usr/local/bigdata/spark/spark-1.6.0-bin-hadoop2.6
export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${SCALA_HOME}/bin:${SPARK_HOME}/bin:$PATH
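With the PATH above in effect, a quick sanity check (assumes the exports have been sourced into the current shell):

# Prints the Spark welcome banner with the version number
spark-submit --version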
(3) Configure Spark
cd /usr/local/bigdata/spark/spark-1.6.0-bin-hadoop2.6/conf
cp spark-env.sh.template spark-env.sh
vim spark-env.sh

# Add the Spark configuration information
export JAVA_HOME=/usr/java/jdk1.8.0_71
export SCALA_HOME=/usr/local/bigdata/scala/scala-2.10.4
export SPARK_MASTER_IP=xtyfb-csj06
export SPARK_WORKER_CORES=2
export SPARK_WORKER_MEMORY=1G
export HADOOP_CONF_DIR=/usr/local/bigdata/hadoop/hadoop-2.6.3/etc/hadoop
cp slaves.template slaves
vim slaves

# Add the worker nodes
xtyfb-csj06   (or 127.0.1.1)
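Note that start-all.sh reaches each host listed in conf/slaves over ssh, so passwordless ssh to the worker host should work first. A minimal check, assuming the single-node host above:

# Should print "ok" without prompting for a password
ssh xtyfb-csj06 'echo ok'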
4. Start Spark and check the cluster status
cd /usr/local/bigdata/spark/spark-1.6.0-bin-hadoop2.6/sbin
./start-all.sh
Run jps to view the Java processes: there should now be one additional Master and one additional Worker process. Use jps -mlv to view the details of each process.
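Illustrative jps output (the process IDs here are hypothetical):

$ jps
2351 Master
2483 Worker
2764 Jps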
The Master web UI is available at http://172.16.80.226:8080/
The Worker web UI is available at http://172.16.80.226:8081/
Switch to the bin directory and start the interactive shell:
cd /usr/local/bigdata/spark/spark-1.6.0-bin-hadoop2.6/bin
./spark-shell
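Once the scala> prompt appears, the SparkContext is already bound to the variable sc; a quick check that the shell is wired up:

scala> sc.version
res0: String = 1.6.0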
mkdir /usr/local/bigdata/spark/testdata
vim /usr/local/bigdata/spark/testdata/wcdemo1.txt
spark   hive   spark   hive   hive   redis   hdds   redis
(the words are separated by tab characters, matching the split("\t") used in the commands below)
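If typing literal tabs in vim is awkward, the file can also be created from the shell; a sketch assuming all eight words sit on a single tab-separated line:

printf 'spark\thive\tspark\thive\thive\tredis\thdds\tredis\n' > /usr/local/bigdata/spark/testdata/wcdemo1.txt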
Run the following command in the Spark shell to compute the word count:
val rdd = sc.textFile("/usr/local/bigdata/spark/testdata/wcdemo1.txt").flatMap(_.split("\t")).map(x => (x, 1)).reduceByKey(_ + _).collect
Output: rdd: Array[(String, Int)] = Array((hive,3), (spark,2), (hdds,1), (redis,2))
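The same pipeline, broken into named steps with comments (equivalent to the one-liner above; runs in spark-shell, where sc is predefined):

val lines  = sc.textFile("/usr/local/bigdata/spark/testdata/wcdemo1.txt") // one record per line of the file
val words  = lines.flatMap(_.split("\t"))   // one-to-many: each line becomes its words
val pairs  = words.map(x => (x, 1))         // one-to-one: each word becomes (word, 1)
val counts = pairs.reduceByKey(_ + _)       // sum the 1s for identical words
val result = counts.collect                 // action: bring the results to the driver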
Other examples: sort the results in ascending order of the key (the word):
val rdd = sc.textFile("/usr/local/bigdata/spark/testdata/wcdemo1.txt").flatMap(_.split("\t")).map(x => (x, 1)).reduceByKey(_ + _).sortByKey().collect
Output: rdd: Array[(String, Int)] = Array((hdds,1), (hive,3), (redis,2), (spark,2))
Sort the results in descending order of the key:
val rdd = sc.textFile("/usr/local/bigdata/spark/testdata/wcdemo1.txt").flatMap(_.split("\t")).map(x => (x, 1)).reduceByKey(_ + _).sortByKey(false).collect
Output: rdd: Array[(String, Int)] = Array((spark,2), (redis,2), (hive,3), (hdds,1))
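Note that sortByKey orders by the word, not by its count. For contrast (this variant is not part of the original walkthrough), a sketch of ordering by count with the RDD sortBy operator:

val byCount = sc.textFile("/usr/local/bigdata/spark/testdata/wcdemo1.txt").flatMap(_.split("\t")).map(x => (x, 1)).reduceByKey(_ + _).sortBy(_._2, ascending = false).collect
// e.g. Array((hive,3), (spark,2), (redis,2), (hdds,1)) -- words with equal counts may appear in either order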
Count the number of result rows:
val rdd = sc.textFile("/usr/local/bigdata/spark/testdata/wcdemo1.txt").flatMap(_.split("\t")).map(x => (x, 1)).reduceByKey(_ + _).sortByKey(false).count
Output: rdd: Long = 4
Save the results
sc.textFile("/usr/local/bigdata/spark/testdata/wcdemo1.txt").flatMap(_.split("\t")).map(x => (x, 1)).reduceByKey(_ + _).sortByKey(false).saveAsTextFile("/usr/local/bigdata/spark/testdata/wcdemo_out")
For word count (WC), each line of the input is split into words, identical words are grouped into the same bucket, and the number of occurrences of each word in its bucket is counted.
The flatMap function turns one record into multiple records (a one-to-many relationship), the map function turns one record into another record (a one-to-one relationship), and the reduceByKey function puts data with the same key into one bucket and aggregates it per key.
In each of these operator chains, everything before the final step is a transformation operator; the final collect, saveAsTextFile, and count are action operators.
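A consequence worth noting: transformations are lazy, so no job actually runs until an action is invoked. A small sketch in spark-shell:

// Nothing is computed yet: textFile/flatMap/map/reduceByKey are all transformations
val wc = sc.textFile("/usr/local/bigdata/spark/testdata/wcdemo1.txt").flatMap(_.split("\t")).map(x => (x, 1)).reduceByKey(_ + _)

wc.count    // action: the job is scheduled and executed here
wc.collect  // action: triggers a second job (call wc.cache beforehand if reusing)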
To verify, the saved results can be inspected directly in the output directory.
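A sketch of checking the output from the shell, assuming the job wrote to the local filesystem (saveAsTextFile produces one part-* file per partition, plus a _SUCCESS marker):

ls /usr/local/bigdata/spark/testdata/wcdemo_out
cat /usr/local/bigdata/spark/testdata/wcdemo_out/part-*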