Apache Spark 1.6 + Hadoop 2.6 standalone installation and configuration on Mac


Reprint: http://www.cnblogs.com/ysisl/p/5979268.html

First, download the required software

1. JDK 1.7 or later (Spark 1.6 requires Java 7+; the configuration below uses JDK 1.8)

2. Scala 2.10.4

3. Hadoop 2.6.4

4. Spark 1.6

Second, pre-installation setup

1. Install the JDK

2. Install Scala 2.10.4

Unzip the installation package (this guide unpacks it to /Users/ysisl/app/spark/scala-2.10.4, matching SCALA_HOME below).

3. Configure sshd

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

Start sshd on the Mac:

sudo launchctl load -w /System/Library/LaunchDaemons/ssh.plist

Check that it started:

sudo launchctl list | grep ssh

Output containing "- 0 com.openssh.sshd" indicates a successful start.
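As a quick check, logging in to localhost should now work without a password prompt:

# should print the date without asking for a password
ssh localhost date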

To stop the sshd service:

sudo launchctl unload -w /System/Library/LaunchDaemons/ssh.plist

Third, install Hadoop

1. Create the Hadoop working directories (under /Users/ysisl/app in this setup)

mkdir -pv hadoop/workspace

cd hadoop/workspace

mkdir tmp

mkdir -pv hdfs/data

mkdir -pv hdfs/name
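Equivalently, the whole layout can be created in one command (a sketch, run from the same parent directory):

mkdir -pv hadoop/workspace/{tmp,hdfs/data,hdfs/name}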

Add a Hadoop home environment variable:

vi ~/.bashrc

export HADOOP_HOME=/Users/ysisl/app/hadoop/hadoop-2.6.4
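To make the new variable visible in the current shell and confirm it is set:

source ~/.bashrc
echo $HADOOP_HOME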

Configure Hadoop. All of the following files are under $HADOOP_HOME/etc/hadoop.

1. core-site.xml

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
    <description>HDFS URI</description>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/Users/ysisl/app/hadoop/workspace/tmp</value>
    <description>namenode temp dir</description>
  </property>
</configuration>

2. hdfs-site.xml

<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/Users/ysisl/app/hadoop/workspace/hdfs/name</value>
    <description>where the namenode stores the HDFS namespace metadata</description>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/Users/ysisl/app/hadoop/workspace/hdfs/data</value>
    <description>physical storage location of data blocks on the datanode</description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>number of replicas; the default is 3, and it should not exceed the number of datanodes</description>
  </property>
</configuration>

3. mapred-site.xml (copy mapred-site.xml.template to mapred-site.xml)

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

4. yarn-site.xml

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>localhost:8099</value>
  </property>
</configuration>

5. Format the HDFS file system

$HADOOP_HOME/bin/hdfs namenode -format

6. Go into sbin/ and run start-all.sh

7. Run jps to check that everything started:

21472

30256 Jps

29793 DataNode

29970 SecondaryNameNode

29638 NameNode

30070 ResourceManager

30231 NodeManager

8. Open http://localhost:50070/explorer.html in a browser to view the HDFS directory structure; if it loads, the installation succeeded. A command-line check is sketched below.
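Alongside the web UI, the new file system can be checked with standard HDFS commands:

# list the root of HDFS (empty at this point)
$HADOOP_HOME/bin/hadoop fs -ls /
# report capacity and live datanodes (should show 1 datanode)
$HADOOP_HOME/bin/hdfs dfsadmin -report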

Fourth, install Spark

1. Unpack the Spark tarball

tar xvzf spark-1.6.1-bin-hadoop2.6.tgz

2. Add environment variables

vi ~/.bashrc

export SCALA_HOME=/Users/ysisl/app/spark/scala-2.10.4

export SPARK_HOME=/Users/ysisl/app/spark/spark-1.6.1-bin-hadoop2.6

3. Set up the configuration files

cd spark-1.6.1-bin-hadoop2.6/conf

cp spark-env.sh.template spark-env.sh

vi spark-env.sh

Add the following content

export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_25.jdk/Contents/Home

export SCALA_HOME=/Users/ysisl/app/spark/scala-2.10.4

export HADOOP_CONF_DIR=/Users/ysisl/app/hadoop/hadoop-2.6.4/etc/hadoop

export SPARK_MASTER_IP=localhost

export SPARK_WORKER_CORES=2

export SPARK_WORKER_MEMORY=2g

cp slaves.template slaves

The default slaves file contains a single entry, localhost.

4. Run sbin/start-all.sh

jps should now show Master and Worker processes:

21472

29793 DataNode

29970 SecondaryNameNode

30275 Master

30468 SparkSubmit

29638 NameNode

30070 ResourceManager

30231 NodeManager

30407 Worker

30586 Jps

5. Add the Scala, Spark, and Hadoop bin directories to PATH so the commands can be run from anywhere

vi ~/.bashrc

export HADOOP_HOME=/Users/ysisl/app/hadoop/hadoop-2.6.4

export SCALA_HOME=/Users/ysisl/app/spark/scala-2.10.4

export SPARK_HOME=/Users/ysisl/app/spark/spark-1.6.1-bin-hadoop2.6

export PATH="${HADOOP_HOME}/bin:${SCALA_HOME}/bin:${SPARK_HOME}/bin:$PATH"
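After reloading the shell configuration, the commands should resolve from PATH:

source ~/.bashrc
# all of these should print paths inside the directories configured above
which hadoop
which scala
which spark-shell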

Fifth, test run

1. Prepare a CSV file at /Users/ysisl/app/hadoop/test.csv

2. View the DFS file system structure: hadoop fs -lsr /

3. Create a directory: hadoop fs -mkdir /test

4. Upload the file to that directory: hadoop fs -put /Users/ysisl/app/hadoop/test.csv /test/

5. Run hadoop fs -lsr / again to see the newly created directory and file

6. Run spark-shell

scala> val file = sc.textFile("hdfs:/test/test.csv")

scala> val count = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)

scala> count.collect

7. Check execution status

A. http://localhost:8080 shows the state of the Spark cluster. This port can conflict with other services; to change it, add export SPARK_MASTER_WEBUI_PORT=<port> to spark-env.sh (the original post uses 98080, which is above the valid port range of 65535, so pick a free port below that).

B. http://localhost:4040/jobs/ shows the running Spark jobs

C. http://localhost:50070/ shows the Hadoop cluster status
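For reference, the same test can be run non-interactively by piping the Scala snippet into spark-shell; a minimal sketch assuming the same file and paths as above:

# upload the sample file (same as steps 3-4 above)
hadoop fs -mkdir -p /test
hadoop fs -put /Users/ysisl/app/hadoop/test.csv /test/
# run the word count and print the result
spark-shell <<'EOF'
val file = sc.textFile("hdfs:/test/test.csv")
val count = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
count.collect().foreach(println)
EOF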
