Reprint: http://www.cnblogs.com/ysisl/p/5979268.html
First, download the required software
1. JDK 1.6 +
2. Scala 2.10.4
3. Hadoop 2.6.4
4. Spark 1.6
Second, pre-installation setup
1. Install the JDK
2. Install Scala 2.10.4
Unzip the installation package to the directory later referenced as SCALA_HOME (/Users/ysisl/app/spark/scala-2.10.4 in the steps below).
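A quick way to confirm both installs (using the full Scala path in case it is not yet on PATH; the path is the one referenced later as SCALA_HOME):
java -version                                            # prints the installed JDK version
/Users/ysisl/app/spark/scala-2.10.4/bin/scala -version   # should report Scala 2.10.4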
3. Configure sshd
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Start sshd on macOS:
sudo launchctl load -w /System/Library/LaunchDaemons/ssh.plist
Check that it started:
sudo launchctl list | grep ssh
Output like "- 0 com.openssh.sshd" indicates a successful start.
Stop the sshd service:
sudo launchctl unload -w /System/Library/LaunchDaemons/ssh.plist
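Before moving on, it is worth confirming that passwordless login actually works (a quick check, not spelled out above):
ssh localhost    # should log in without asking for a password
exit             # leave the test session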
Third, install Hadoop
1. Create a Hadoop file system directory
mkdir -pv hadoop/workspace
cd hadoop/workspace
mkdir tmp
mkdir -pv hdfs/data
mkdir -pv hdfs/name
Add a Hadoop home environment variable:
vi ~/.bashrc
export HADOOP_HOME=/Users/ysisl/app/hadoop/hadoop-2.6.4
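For the variable to take effect in the current shell, reload the file (a small extra step):
source ~/.bashrc
echo $HADOOP_HOME    # should print /Users/ysisl/app/hadoop/hadoop-2.6.4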
Configure Hadoop; all of the following files live under $HADOOP_HOME/etc/hadoop.
1. core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
<description>HDFS URI</description>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/Users/ysisl/app/hadoop/workspace/tmp</value>
<description>NameNode temporary directory</description>
</property>
</configuration>
2. hdfs-site.xml
<configuration>
<property>
<name>dfs.name.dir</name>
<value>/Users/ysisl/app/hadoop/workspace/hdfs/name</value>
<description>Where the NameNode stores the HDFS namespace metadata</description>
</property>
<property>
<name>dfs.data.dir</name>
<value>/Users/ysisl/app/hadoop/workspace/hdfs/data</value>
<description>Physical storage location of data blocks on the DataNode</description>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Number of replicas; the default is 3, and it should not exceed the number of DataNode machines</description>
</property>
</configuration>
3. Copy mapred-site.xml.template to mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
4. yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>localhost:8099</value>
</property>
</configuration>
5. Format the HDFS file system
$HADOOP_HOME/bin/hdfs namenode -format
6. Enter sbin/ to execute start-all.sh
7. Run jps to see if everything started properly
21472
30256 Jps
29793 DataNode
29970 SecondaryNameNode
29638 NameNode
30070 ResourceManager
30231 NodeManager
8. Open http://localhost:50070/explorer.html in a browser to view the HDFS directory structure; if the page loads, the installation was successful.
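An equivalent command-line check (an extra step, not in the original post) is to ask HDFS for a cluster report:
$HADOOP_HOME/bin/hdfs dfsadmin -report    # should report one live DataNode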
Fourth, install Spark
1. Unzip the Spark package
tar xvzf spark.1.6.tar.gz
2. Add environment variables
vi ~/.bashrc
export SCALA_HOME=/Users/ysisl/app/spark/scala-2.10.4
export SPARK_HOME=/Users/ysisl/app/spark/spark-1.6.1-bin-hadoop2.6
3. Set up the configuration files
cd spark-1.6.1-bin-hadoop2.6/conf
cp spark-env.sh.template spark-env.sh
vi spark-env.sh
Add the following content
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_25.jdk/Contents/Home
export SCALA_HOME=/Users/ysisl/app/spark/scala-2.10.4
export HADOOP_CONF_DIR=/Users/ysisl/app/hadoop/hadoop-2.6.4/etc/hadoop
export SPARK_MASTER_IP=localhost
export SPARK_WORKER_CORES=2
export SPARK_WORKER_MEMORY=2G
cp slaves.template slaves
The default slaves file contains a single host, localhost.
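To double-check (the expected content assumes the stock template was copied unchanged):
cat slaves    # the only non-comment entry should be: localhost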
4. Start Spark with sbin/start-all.sh
jps should now additionally show a Master and a Worker process.
21472
29793 DataNode
29970 SecondaryNameNode
30275 Master
30468 SparkSubmit
29638 NameNode
30070 ResourceManager
30231 NodeManager
30407 Worker
30586 Jps
5. Configure the Scala, Spark, and Hadoop environment variables and add them to PATH so the commands are easy to run
vi ~/.bashrc
export HADOOP_HOME=/Users/ysisl/app/hadoop/hadoop-2.6.4
export SCALA_HOME=/Users/ysisl/app/spark/scala-2.10.4
export SPARK_HOME=/Users/ysisl/app/spark/spark-1.6.1-bin-hadoop2.6
export PATH="${HADOOP_HOME}/bin:${SCALA_HOME}/bin:${SPARK_HOME}/bin:$PATH"
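Reload the file and spot-check that the tools now resolve from PATH (an extra verification step, using the paths assumed above):
source ~/.bashrc
which hadoop        # should point into hadoop-2.6.4/bin
which spark-shell   # should point into spark-1.6.1-bin-hadoop2.6/bin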
Fifth, test run
1. Prepare a CSV file at /Users/ysisl/app/hadoop/test.csv
2. View the DFS file system structure: hadoop fs -lsr /
3. Create a new directory: hadoop fs -mkdir /test
4. Upload the file into it: hadoop fs -put /Users/ysisl/app/hadoop/test.csv /test/
5. Run hadoop fs -lsr / again to see the newly created directory and file
6. Launch spark-shell
scala> val file = sc.textFile("hdfs:/test/test.csv")
scala> val count = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
scala> count.collect
7. View execution status
A. http://localhost:8080 shows how the Spark standalone cluster is running. This port often conflicts with other services;
add export SPARK_MASTER_WEBUI_PORT=98080 to spark-env.sh to specify a different one.
B. http://localhost:4040/jobs/ shows the running Spark jobs.
C. http://localhost:50070/ shows the Hadoop cluster status.
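As one more sanity check beyond the word count above, the SparkPi example that ships with the pre-built distribution can be submitted to the standalone master; the jar location (lib/spark-examples-*.jar) and the master URL (default port 7077) are assumptions based on the stock 1.6.1 layout rather than part of the original walkthrough:
spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://localhost:7077 \
  $SPARK_HOME/lib/spark-examples-*.jar 10    # 10 = number of slices for the computation
# A successful run prints an approximation of Pi in the driver output.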