Spark 1.6.0 on Hadoop 2.6.3: installation and configuration
1. Install and configure Hadoop
(1) Download Hadoop
mkdir /usr/local/bigdata/hadoop
cd /usr/local/bigdata/hadoop
wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.6.3/hadoop-2.6.3.tar.gz
tar zxvf hadoop-2.6.3.tar.gz
(2) Configure the Hadoop environment variables
export HADOOP_HOME=/usr/local/bigdata/hadoop/hadoop-2.6.3
export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:$PATH
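These exports only last for the current shell session. A minimal sketch of making them permanent, assuming a per-user setup with bash and ~/.bashrc (adjust for your shell and distribution):

# Append the variables to the shell profile and reload it (assumption: bash, per-user install)
echo 'export HADOOP_HOME=/usr/local/bigdata/hadoop/hadoop-2.6.3' >> ~/.bashrc
echo 'export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:$PATH' >> ~/.bashrc
source ~/.bashrc

# Sanity check: should print the Hadoop 2.6.3 banner
hadoop version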
2. Install and configure Scala
(1) Download Scala
mkdir /usr/local/bigdata/scala
cd /usr/local/bigdata/scala
wget http://www.scala-lang.org/files/archive/scala-2.10.4.tgz
tar zxvf scala-2.10.4.tgz
(2) Configure the Scala environment variables
export SCALA_HOME=/usr/local/bigdata/scala/scala-2.10.4
export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${SCALA_HOME}/bin:$PATH
Run scala -version to display the installed Scala version.
(3) Test the Scala runtime environment
Run scala to enter the Scala REPL, then evaluate a test expression such as 12*12 and press Enter.
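An illustrative REPL session (the res0 label is what the REPL prints by default):

$ scala
scala> 12*12
res0: Int = 144

scala> :quit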
3. Install and configure Spark 1.6.0
(1) Download Spark 1.6.0
Choose the Spark package built for your Hadoop version from the download page: http://spark.apache.org/downloads.html
mkdir /usr/local/bigdata/spark
cd /usr/local/bigdata/spark
wget http://archive.apache.org/dist/spark/spark-1.6.0/spark-1.6.0-bin-hadoop2.6.tgz
tar zxvf spark-1.6.0-bin-hadoop2.6.tgz
(2) Configure the Spark environment variables
export SPARK_HOME=/usr/local/bigdata/spark/spark-1.6.0-bin-hadoop2.6
export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${SCALA_HOME}/bin:${SPARK_HOME}/bin:$PATH
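With the PATH above in effect, a quick sanity check (assumes the exports have been sourced into the current shell):

# Prints the Spark welcome banner with the version number
spark-submit --version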
(3) Configure Spark
cd /usr/local/bigdata/spark/spark-1.6.0-bin-hadoop2.6/conf
cp spark-env.sh.template spark-env.sh
vim spark-env.sh

# Add the Spark configuration information
export JAVA_HOME=/usr/java/jdk1.8.0_71
export SCALA_HOME=/usr/local/bigdata/scala/scala-2.10.4
export SPARK_MASTER_IP=xtyfb-csj06
export SPARK_WORKER_CORES=2
export SPARK_WORKER_MEMORY=1G
export HADOOP_CONF_DIR=/usr/local/bigdata/hadoop/hadoop-2.6.3/etc/hadoop
cp slaves.template slaves
vim slaves

# Add the worker nodes
xtyfb-csj06   (or 127.0.1.1)
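Note that start-all.sh reaches each host listed in conf/slaves over ssh, so passwordless ssh to the worker host should work first. A minimal check, assuming the single-node host above:

# Should print "ok" without prompting for a password
ssh xtyfb-csj06 'echo ok'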
4. Start Spark and check the cluster status
cd /usr/local/bigdata/spark/spark-1.6.0-bin-hadoop2.6/sbin
./start-all.sh
Run jps to view the Java processes: there should now be one additional Master and one additional Worker process. Use jps -mlv to view the details of each process.
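Illustrative jps output (the process IDs here are hypothetical):

$ jps
2351 Master
2483 Worker
2764 Jps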
The Master web UI is available at http://172.16.80.226:8080/
The Worker web UI is available at http://172.16.80.226:8081/
Switch to the bin directory and start the interactive shell:
cd /usr/local/bigdata/spark/spark-1.6.0-bin-hadoop2.6/bin
./spark-shell
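Once the scala> prompt appears, the SparkContext is already bound to the variable sc; a quick check that the shell is wired up:

scala> sc.version
res0: String = 1.6.0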
mkdir /usr/local/bigdata/spark/testdata
vim /usr/local/bigdata/spark/testdata/wcdemo1.txt
spark   hive   spark   hive   hive   redis   hdds   redis
(the words are separated by tab characters, matching the split("\t") used in the commands below)
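If typing literal tabs in vim is awkward, the file can also be created from the shell; a sketch assuming all eight words sit on a single tab-separated line:

printf 'spark\thive\tspark\thive\thive\tredis\thdds\tredis\n' > /usr/local/bigdata/spark/testdata/wcdemo1.txt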
Run the following command in the Spark shell to compute the word count:
val rdd = sc.textFile("/usr/local/bigdata/spark/testdata/wcdemo1.txt").flatMap(_.split("\t")).map(x => (x, 1)).reduceByKey(_ + _).collect
Output: rdd: Array[(String, Int)] = Array((hive,3), (spark,2), (hdds,1), (redis,2))
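The same pipeline, broken into named steps with comments (equivalent to the one-liner above; runs in spark-shell, where sc is predefined):

val lines  = sc.textFile("/usr/local/bigdata/spark/testdata/wcdemo1.txt") // one record per line of the file
val words  = lines.flatMap(_.split("\t"))   // one-to-many: each line becomes its words
val pairs  = words.map(x => (x, 1))         // one-to-one: each word becomes (word, 1)
val counts = pairs.reduceByKey(_ + _)       // sum the 1s for identical words
val result = counts.collect                 // action: bring the results to the driver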
Other examples: sort the results in ascending order of the key (the word):
val rdd = sc.textFile("/usr/local/bigdata/spark/testdata/wcdemo1.txt").flatMap(_.split("\t")).map(x => (x, 1)).reduceByKey(_ + _).sortByKey().collect
Output: rdd: Array[(String, Int)] = Array((hdds,1), (hive,3), (redis,2), (spark,2))
Sort the results in descending order of the key:
val rdd = sc.textFile("/usr/local/bigdata/spark/testdata/wcdemo1.txt").flatMap(_.split("\t")).map(x => (x, 1)).reduceByKey(_ + _).sortByKey(false).collect
Output: rdd: Array[(String, Int)] = Array((spark,2), (redis,2), (hive,3), (hdds,1))
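Note that sortByKey orders by the word, not by its count. For contrast (this variant is not part of the original walkthrough), a sketch of ordering by count with the RDD sortBy operator:

val byCount = sc.textFile("/usr/local/bigdata/spark/testdata/wcdemo1.txt").flatMap(_.split("\t")).map(x => (x, 1)).reduceByKey(_ + _).sortBy(_._2, ascending = false).collect
// e.g. Array((hive,3), (spark,2), (redis,2), (hdds,1)) -- words with equal counts may appear in either order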
Count the number of result rows:
val rdd = sc.textFile("/usr/local/bigdata/spark/testdata/wcdemo1.txt").flatMap(_.split("\t")).map(x => (x, 1)).reduceByKey(_ + _).sortByKey(false).count
Output: rdd: Long = 4
Save the results
sc.textFile("/usr/local/bigdata/spark/testdata/wcdemo1.txt").flatMap(_.split("\t")).map(x => (x, 1)).reduceByKey(_ + _).sortByKey(false).saveAsTextFile("/usr/local/bigdata/spark/testdata/wcdemo_out")
For word count (WC), each line of the input is split into words, identical words are grouped into the same bucket, and the number of occurrences of each word in its bucket is counted.
The flatMap function turns one record into multiple records (a one-to-many relationship), the map function turns one record into another record (a one-to-one relationship), and the reduceByKey function puts data with the same key into one bucket and aggregates it per key.
In each of these operator chains, everything before the final step is a transformation operator; the final collect, saveAsTextFile, and count are action operators.
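A consequence worth noting: transformations are lazy, so no job actually runs until an action is invoked. A small sketch in spark-shell:

// Nothing is computed yet: textFile/flatMap/map/reduceByKey are all transformations
val wc = sc.textFile("/usr/local/bigdata/spark/testdata/wcdemo1.txt").flatMap(_.split("\t")).map(x => (x, 1)).reduceByKey(_ + _)

wc.count    // action: the job is scheduled and executed here
wc.collect  // action: triggers a second job (call wc.cache beforehand if reusing)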
To verify, the saved results can be inspected directly in the output directory.
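A sketch of checking the output from the shell, assuming the job wrote to the local filesystem (saveAsTextFile produces one part-* file per partition, plus a _SUCCESS marker):

ls /usr/local/bigdata/spark/testdata/wcdemo_out
cat /usr/local/bigdata/spark/testdata/wcdemo_out/part-*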