First, the pseudo-distributed Spark installation.
Installation environment: Ubuntu 14.04 LTS 64-bit + hadoop2.7.2 + spark2.0.0 + jdk1.7.0_76
On Linux, third-party software should be installed under the /opt directory; convention over configuration, and following this principle is a good habit when setting up an environment. So all of the software here is installed under /opt.
1. Install jdk1.7
(1) Download jdk-7u76-linux-x64.tar.gz;
(2) Unzip jdk-7u76-linux-x64.tar.gz and move it to the /opt/java/jdk path (create it yourself). Commands:
tar -zxvf jdk-7u76-linux-x64.tar.gz
sudo mv jdk1.7.0_76 /opt/java/jdk/
(3) Configure the Java environment variables. Command: sudo gedit /etc/profile to open profile, then append:
# set Java env
export JAVA_HOME=/opt/java/jdk/jdk1.7.0_76
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
(4) Verify that the installation succeeded as follows:
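A minimal sketch of the verification, assuming /etc/profile has been reloaded (the build suffix in the output depends on the exact JDK download):
source /etc/profile
java -version
# expected output similar to:
# java version "1.7.0_76"
# Java(TM) SE Runtime Environment (build 1.7.0_76-bXX)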
Special note: I first installed the JDK as the root user and then switched to the hadoop user, and running java -version reported an error. The final diagnosis was that the Java environment variables had been configured in ~/.bashrc; after reconfiguring them in /etc/profile, the problem was resolved.
2. Install and configure SSH
There are two ways:
(1) Online installation (the way used in this article), when the network adapter is set to "NAT". Installation command: sudo apt-get install openssh-server. Verification: if executing ssh localhost lets you log in, the installation succeeded.
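The same step as a block (a sketch; on Ubuntu, installing openssh-server normally starts the sshd service automatically):
sudo apt-get install openssh-server
ssh localhost
# a password is still required at this point; passwordless login is configured in the next step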
(2) Offline installation (search Baidu for it, o(∩_∩)o)
3. Passwordless SSH login
Execute the command ssh-keygen -t rsa -P "" to generate a key pair, then press Enter; this creates the .ssh directory under /home/mahao containing id_rsa and id_rsa.pub. Then run:
cat /home/mahao/.ssh/id_rsa.pub >> /home/mahao/.ssh/authorized_keys
Verification: run ssh localhost again; you should now be able to log in without a password.
4. Install and configure Hadoop
Since this is a pseudo-distributed installation of Spark, it is natural to do a pseudo-distributed installation of Hadoop as well.
(1) Install Hadoop: unzip hadoop-2.7.2.tar.gz to the /opt/hadoop path (create it yourself).
(2) Configure Hadoop: the configuration files are all under the {HADOOP_HOME}/etc/hadoop path; mine is /opt/hadoop/hadoop-2.7.2/etc/hadoop.
Modify the hadoop-env.sh file, mainly to set JAVA_HOME; following the official website, also add a HADOOP_PREFIX export. Append:
export JAVA_HOME=/opt/java/jdk/jdk1.7.0_76
export HADOOP_PREFIX=/opt/hadoop/hadoop-2.7.2
Modify the core-site.xml file, appending:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/hadoop/hadoop-2.7.2/hadooptmp</value>
    <description>A base for other temporary directories.</description>
  </property>
</configuration>
The hadooptmp directory above needs to be created manually.
Modify the hdfs-site.xml file, appending:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
Modify the mapred-site.xml file (can be left unconfigured in pseudo-distributed mode), appending:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
Modify the yarn-site.xml file, appending:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
Modify the yarn-env.sh file:
export JAVA_HOME=/opt/java/jdk/jdk1.7.0_76
5. Format the NameNode and start Hadoop
(1) Format the NameNode
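A minimal sketch of the format command, assuming Hadoop is unpacked at /opt/hadoop/hadoop-2.7.2 as above and run from that directory:
./bin/hdfs namenode -format
If formatting succeeds, the log output should contain a message along the lines of "has been successfully formatted".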
(2) Start DFS and YARN (start-all.sh has been deprecated; start with the following commands instead)
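A sketch of the start commands, run from the Hadoop home directory /opt/hadoop/hadoop-2.7.2:
./sbin/start-dfs.sh
./sbin/start-yarn.sh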
start-all.sh has been deprecated; use the commands above to start, otherwise the following hint will appear:
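A rough sketch of the hint printed by the stock Hadoop 2.x start-all.sh script (treat the exact wording as approximate):
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh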
Then use the jps command to view the Java processes; if the following processes are all present, the installation and startup succeeded:
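A sketch of the expected jps output for pseudo-distributed Hadoop with both DFS and YARN running (jps also prints a process ID before each name, which will differ on every machine):
NameNode
DataNode
SecondaryNameNode
ResourceManager
NodeManager
Jps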
You can also view an HDFS status report with the following command:
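A sketch of the report command, run from the Hadoop home directory:
./bin/hdfs dfsadmin -report
It prints the configured capacity, remaining space, and the list of live DataNodes (one, in this pseudo-distributed setup).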
Web management interfaces: the HDFS NameNode UI at http://localhost:50070/ and the YARN ResourceManager UI at http://localhost:8088/
6. Install and configure Spark 2.0.0
(1) Install Scala. Spark 2.0.0 requires Scala 2.11, so download Scala 2.11.8; download address: http://www.scala-lang.org/download/2.11.8.html
(2) Install and configure Scala: unzip Scala to /opt/scala (create it yourself) and configure the SCALA_HOME environment variable in /etc/profile:
export HADOOP_HOME=/opt/hadoop/hadoop-2.7.2
export SCALA_HOME=/opt/scala/scala-2.11.8
export PATH=${SCALA_HOME}/bin:${JAVA_HOME}/bin:$PATH
Run source /etc/profile to make it take effect. Verify Scala:
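A sketch of the verification (the copyright line may vary slightly with the build):
scala -version
# Scala code runner version 2.11.8 -- Copyright 2002-2016, LAMP/EPFL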
(3) Download Spark 2.0.0 and unzip spark-2.0.0-bin-hadoop2.7.tgz to the /opt/spark path. Configure the SPARK_HOME environment variable:
export SPARK_HOME=/opt/spark/spark-2.0.0-bin-hadoop2.7
Run source /etc/profile to make it take effect.
Modify the slaves file in Spark's conf directory; back it up and rename it with cp slaves.template slaves before modifying it, then set the content of the slaves file to the host name. Mine is ubuntu, as follows:
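A sketch of the result, assuming the host name is ubuntu as above (commands run in the Spark conf directory):
cp slaves.template slaves
# after editing, the slaves file contains a single line with the worker host name:
# ubuntu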
Modify the spark-env.sh file; back it up and rename it with cp spark-env.sh.template spark-env.sh before modifying, then open spark-env.sh and append:
export JAVA_HOME=/opt/java/jdk/jdk1.7.0_76
export HADOOP_HOME=/opt/hadoop/hadoop-2.7.2
export SCALA_HOME=/opt/scala/scala-2.11.8
export HADOOP_CONF_DIR=/opt/hadoop/hadoop-2.7.2/etc/hadoop
export SPARK_MASTER_IP=ubuntu
export SPARK_WORKER_MEMORY=512m
(4) Start Spark in pseudo-distributed mode. First step: before starting, make sure pseudo-distributed Hadoop has started successfully; use jps to check the process information first:
This indicates that Hadoop started successfully. If it has not started, go to Hadoop's sbin directory and execute ./start-all.sh to start it. Second step: start Spark. Execute start-all.sh in Spark's sbin directory to start Spark, then view the latest processes with jps:
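A sketch of what jps should now show: in addition to the Hadoop processes listed earlier, the Spark standalone daemons appear as
Master
Worker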
Access http://localhost:8080 to open Spark's web console page:
One worker node can be seen on that page. Access http://localhost:4040 to open the spark-shell web console page (first start a SparkContext with ./bin/spark-shell); the following web interface appears:
If more than one SparkContext is running on the machine, the web port is incremented automatically: 4041, 4042, 4043, and so on. To browse persisted event logs, set spark.eventLog.enabled.
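A minimal sketch of enabling event logging in the Spark conf directory's spark-defaults.conf (the HDFS path is only an example and the directory must exist before jobs run; if you use the history server, started with ./sbin/start-history-server.sh, point spark.history.fs.logDirectory at the same path):
spark.eventLog.enabled   true
spark.eventLog.dir       hdfs://localhost:9000/spark-event-logs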
Shutting down: to stop Spark, go to the Spark directory and execute ./sbin/stop-all.sh. To stop Hadoop, go to the Hadoop directory and execute ./sbin/stop-dfs.sh and ./sbin/stop-yarn.sh (./sbin/stop-all.sh also works, but it prints a hint that the command is deprecated; use the two commands above instead).
Port summary: the Master port is 7077, the Master WebUI port is 8080, and the Spark shell WebUI port is 4040.
If the following appears: mkdir: cannot create directory "/opt/spark-2.2.0-bin-hadoop2.7/logs": Permission denied,
then fix the ownership:
sudo chown -R marho:marho /opt/spark-2.2.0-bin-hadoop2.7