Spark 1.6.0 on Hadoop 2.6.3 installation and configuration

1. Configure Hadoop

(1) Download Hadoop

mkdir /usr/local/bigdata/hadoop

cd /usr/local/bigdata/hadoop

wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.6.3/hadoop-2.6.3.tar.gz

tar zxvf hadoop-2.6.3.tar.gz

(2) Configure the Hadoop environment variables

export HADOOP_HOME=/usr/local/bigdata/hadoop/hadoop-2.6.3

export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:$PATH

These exports (here and in the sections below) last only for the current shell; to make them permanent, append them to ~/.bashrc and run source ~/.bashrc. JAVA_HOME is assumed to be set already (e.g. to /usr/java/jdk1.8.0_71, as used in spark-env.sh later).

2. Install and configure Scala

(1) Download Scala

mkdir /usr/local/bigdata/scala

cd /usr/local/bigdata/scala

wget http://www.scala-lang.org/files/archive/scala-2.10.4.tgz

tar zxvf scala-2.10.4.tgz

(2) Configure the Scala environment variables

export SCALA_HOME=/usr/local/bigdata/scala/scala-2.10.4

export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${SCALA_HOME}/bin:$PATH

Display the installed Scala version: scala -version

(3) Test the Scala runtime environment

Run scala to enter the Scala REPL:

Test: type 12*12 and press Enter.
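
A successful check looks roughly like this (res0 is just the name the REPL assigns to the first unnamed result):

scala> 12*12
res0: Int = 144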

3. Install and configure Spark 1.6.0

(1) Download Spark 1.6.0

Choose the Spark build that matches your Hadoop version from the download page: http://spark.apache.org/downloads.html

mkdir /usr/local/bigdata/spark

cd /usr/local/bigdata/spark

wget http://archive.apache.org/dist/spark/spark-1.6.0/spark-1.6.0-bin-hadoop2.6.tgz

tar zxvf spark-1.6.0-bin-hadoop2.6.tgz

(2) Configure the Spark environment variables

export SPARK_HOME=/usr/local/bigdata/spark/spark-1.6.0-bin-hadoop2.6

export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${SCALA_HOME}/bin:${SPARK_HOME}/bin:$PATH

(3) Configure Spark

cd /usr/local/bigdata/spark/spark-1.6.0-bin-hadoop2.6/conf

cp spark-env.sh.template spark-env.sh

vim spark-env.sh   # add the Spark settings below

export JAVA_HOME=/usr/java/jdk1.8.0_71

export SCALA_HOME=/usr/local/bigdata/scala/scala-2.10.4

export SPARK_MASTER_IP=xtyfb-csj06

export SPARK_WORKER_CORES=2

export SPARK_WORKER_MEMORY=1G

export HADOOP_CONF_DIR=/usr/local/bigdata/hadoop/hadoop-2.6.3/etc/hadoop

cp slaves.template slaves

vim slaves   # add the worker nodes, one hostname per line

xtyfb-csj06 (or 127.0.1.1 for a single-machine setup)

 

4. Start Spark and check the cluster status

cd /usr/local/bigdata/spark/spark-1.6.0-bin-hadoop2.6/sbin

Start:

./start-all.sh

Run jps to view the processes: there should be one additional Master and one Worker process.

To view the details of a process, use jps -mlv.

The master web UI is available at http://172.16.80.226:8080/

The worker web UI is available at http://172.16.80.226:8081/

Switch to the bin directory: cd /usr/local/bigdata/spark/spark-1.6.0-bin-hadoop2.6/bin

Start the shell: ./spark-shell
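
Once the prompt appears, the SparkContext is already bound to sc. A minimal smoke test (a sketch; output names and numbering will vary):

scala> sc.master                       // prints the master URL the shell connected to
scala> sc.parallelize(1 to 100).sum    // should return 5050.0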

mkdir /usr/local/bigdata/spark/testdata

vim /usr/local/bigdata/spark/testdata/wcdemo1.txt

Add the following test data (the two words on each line are separated by a tab, since the commands below split on "\t"):

spark	hive

spark	hive

hive	redis

hdds	redis

Run the following Scala one-liner in spark-shell to compute the word count:

val rdd = sc.textFile("/usr/local/bigdata/spark/testdata/wcdemo1.txt").flatMap(_.split("\t")).map(x => (x, 1)).reduceByKey(_ + _).collect

Printed result: rdd: Array[(String, Int)] = Array((hive,3), (spark,2), (hdds,1), (redis,2))

Other examples: sort the results in ascending order of key (i.e., alphabetically by word)

val rdd = sc.textFile("/usr/local/bigdata/spark/testdata/wcdemo1.txt").flatMap(_.split("\t")).map(x => (x, 1)).reduceByKey(_ + _).sortByKey().collect

Execution result: rdd: Array[(String, Int)] = Array((hdds,1), (hive,3), (redis,2), (spark,2))

Sort the results in descending order of key

val rdd = sc.textFile("/usr/local/bigdata/spark/testdata/wcdemo1.txt").flatMap(_.split("\t")).map(x => (x, 1)).reduceByKey(_ + _).sortByKey(false).collect

Execution result: rdd: Array[(String, Int)] = Array((spark,2), (redis,2), (hive,3), (hdds,1))

Count the number of result rows

val rdd = sc.textFile("/usr/local/bigdata/spark/testdata/wcdemo1.txt").flatMap(_.split("\t")).map(x => (x, 1)).reduceByKey(_ + _).sortByKey(false).count

Execution result: rdd: Long = 4

Save the results

sc.textFile("/usr/local/bigdata/spark/testdata/wcdemo1.txt").flatMap(_.split("\t")).map(x => (x, 1)).reduceByKey(_ + _).sortByKey(false).saveAsTextFile("/usr/local/bigdata/spark/testdata/wcdemo_out")
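
Note that saveAsTextFile writes a directory of part files (part-00000, part-00001, ...) plus a _SUCCESS marker, not a single file. A quick way to read the output back from the same shell (a sketch reusing the path above):

scala> sc.textFile("/usr/local/bigdata/spark/testdata/wcdemo_out").collect().foreach(println)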

For word count (WC), each line of the input is parsed into words, identical words are placed into the same bucket, and the number of occurrences of each word in its bucket is counted.

flatMap converts one record into multiple records (a one-to-many relationship), map converts one record into another record (a one-to-one relationship), and reduceByKey groups the data by key and aggregates within each key's group.
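
Splitting the one-liner into named stages makes each step visible; the intermediate names (lines, words, pairs, counts) are introduced here only for illustration:

val lines  = sc.textFile("/usr/local/bigdata/spark/testdata/wcdemo1.txt")  // one record per input line
val words  = lines.flatMap(_.split("\t"))   // one-to-many: each line becomes its words
val pairs  = words.map(x => (x, 1))         // one-to-one: each word becomes (word, 1)
val counts = pairs.reduceByKey(_ + _)       // same key into one bucket, summing the 1s
counts.collect                              // action: runs the job and fetches the result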

In this chain of RDD operators, everything before the last step (flatMap, map, reduceByKey, sortByKey) is a transformation operator, while the final collect, saveAsTextFile, or count is an action operator. Transformations are lazy and only describe the computation; an action triggers it.
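
You can observe this laziness in spark-shell: building the chain returns immediately without touching the file, and only the action launches a job (a sketch; the name pending is illustrative):

val pending = sc.textFile("/usr/local/bigdata/spark/testdata/wcdemo1.txt").flatMap(_.split("\t")).map(x => (x, 1)).reduceByKey(_ + _)  // returns at once: nothing has run yet
pending.collect  // only now does Spark read the file and compute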

[Screenshot of the results omitted.]
