First, download the required packages
1. JDK 1.6+
2. Scala 2.10.4
3. Hadoop 2.6.4
4. Spark 1.6
Second, prerequisites
1. Install the JDK
2. Install Scala 2.10.4
Unzip the installation package to the desired directory
3. Configure sshd
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Start sshd on the Mac:
sudo launchctl load -w /System/Library/LaunchDaemons/ssh.plist
Check that it started:
sudo launchctl list | grep ssh
Output like "- 0 com.openssh.sshd" indicates a successful start
To stop the sshd service:
sudo launchctl unload -w /System/Library/LaunchDaemons/ssh.plist
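Before moving on, you can confirm that passwordless SSH login works (a quick sanity check; sshd must be running):
ssh localhost
exit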
Third, install Hadoop
1. Create a Hadoop file system directory
mkdir -pv hadoop/workspace
cd hadoop/workspace
mkdir tmp
mkdir -pv hdfs/data
mkdir -pv hdfs/name
Add a Hadoop home environment variable
vi ~/.bashrc
export HADOOP_HOME=/Users/ysisl/app/hadoop/hadoop-2.6.4
Configure Hadoop; all of the following files are under $HADOOP_HOME/etc/hadoop
1. core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
<description>HDFS URI</description>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/Users/ysisl/app/hadoop/workspace/tmp</value>
<description>namenode temp dir</description>
</property>
</configuration>
2. hdfs-site.xml
<configuration>
<property>
<name>dfs.name.dir</name>
<value>/Users/ysisl/app/hadoop/workspace/hdfs/name</value>
<description>where the namenode stores HDFS namespace metadata</description>
</property>
<property>
<name>dfs.data.dir</name>
<value>/Users/ysisl/app/hadoop/workspace/hdfs/data</value>
<description>physical storage location of data blocks on the datanode</description>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
<description>number of replicas; the default is 3, and it should be no more than the number of datanode machines</description>
</property>
</configuration>
3. Copy mapred-site.xml.template to mapred-site.xml and edit it
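For example, assuming the current directory is $HADOOP_HOME/etc/hadoop:
cp mapred-site.xml.template mapred-site.xml
Then add the following content: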
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
4. yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>localhost:8099</value>
</property>
</configuration>
5. Format the HDFS file system
$HADOOP_HOME/bin/hdfs namenode -format
6. Go into sbin/ and run start-all.sh
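For example, assuming HADOOP_HOME is set as above:
$HADOOP_HOME/sbin/start-all.sh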
7. Run jps to check that everything started properly
21472
30256 Jps
29793 DataNode
29970 SecondaryNameNode
29638 NameNode
30070 ResourceManager
30231 NodeManager
8. Open http://localhost:50070/explorer.html to browse the HDFS directory structure; if the page loads, the installation was successful
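Optionally, confirm from the command line that the DataNode has registered (a minimal check, assuming the paths above):
$HADOOP_HOME/bin/hdfs dfsadmin -report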
Fourth, install Spark
1. Unzip the Spark package
tar xvzf spark.1.6.tar.gz
2. Add environment variables
vi ~/.bashrc
export SCALA_HOME=/Users/ysisl/app/spark/scala-2.10.4
export SPARK_HOME=/Users/ysisl/app/spark/spark-1.6.1-bin-hadoop2.6
3. Set up the configuration files
cd spark-1.6.1-bin-hadoop2.6/conf
cp spark-env.sh.template spark-env.sh
vi spark-env.sh
Add the following content:
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_25.jdk/Contents/Home
export SCALA_HOME=/Users/ysisl/app/spark/scala-2.10.4
export HADOOP_CONF_DIR=/Users/ysisl/app/hadoop/hadoop-2.6.4/etc/hadoop
export SPARK_MASTER_IP=localhost
export SPARK_WORKER_CORES=2
export SPARK_WORKER_MEMORY=2G
cp slaves.template slaves
The default slaves file contains a single host, localhost
4. Run sbin/start-all.sh to start Spark
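Since Hadoop also provides a start-all.sh, it is safest to call Spark's copy explicitly (assuming SPARK_HOME is set as above):
$SPARK_HOME/sbin/start-all.sh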
jps should now show Master and Worker processes:
21472
29793 DataNode
29970 SecondaryNameNode
30275 Master
30468 SparkSubmit
29638 NameNode
30070 ResourceManager
30231 NodeManager
30407 Worker
30586 Jps
5. Add the Scala, Spark, and Hadoop environment variables and bin directories to PATH for easy execution
vi ~/.bashrc
export HADOOP_HOME=/Users/ysisl/app/hadoop/hadoop-2.6.4
export SCALA_HOME=/Users/ysisl/app/spark/scala-2.10.4
export SPARK_HOME=/Users/ysisl/app/spark/spark-1.6.1-bin-hadoop2.6
export PATH="${HADOOP_HOME}/bin:${SCALA_HOME}/bin:${SPARK_HOME}/bin:$PATH"
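To apply the changes to the current shell and confirm the tools are on the PATH (a quick check, assuming bash is the login shell):
source ~/.bashrc
hadoop version
which spark-shell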
Fifth, test run
1. Prepare a CSV file at /Users/ysisl/app/hadoop/test.csv
2. View the DFS file system structure: hadoop fs -lsr /
3. Create a new directory: hadoop fs -mkdir /test
4. Upload the file to that directory: hadoop fs -put /Users/ysisl/app/hadoop/test.csv /test/
5. Run hadoop fs -lsr / again to see the newly created directory and file
6. Run spark-shell
scala> val file = sc.textFile("hdfs:/test/test.csv")
scala> val count = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
scala> count.collect
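To print the word counts one per line instead of as a single array (an optional follow-up on the same count RDD from above):
scala> count.collect.foreach(println)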
7. View execution status
A. http://localhost:8080 shows how the Spark cluster is running. This port often conflicts with other applications;
add export SPARK_MASTER_WEBUI_PORT=98080 to spark-env.sh to specify a different one
B. http://localhost:4040/jobs/ shows the Spark job runs
C. http://localhost:50070/ shows the Hadoop cluster status