Spark Pseudo-distributed installation (dependent on Hadoop)

First, the pseudo-distributed installation.

Spark installation environment: Ubuntu 14.04 LTS 64-bit + hadoop2.7.2 + spark2.0.0 + jdk1.7.0_76

Third-party Linux software should be installed under the /opt directory. "Convention over configuration" is a principle worth following here; doing so is a good habit when setting up an environment, so all the software in this article is installed under /opt.

1. Install JDK 1.7

(1) Download jdk-7u76-linux-x64.tar.gz.

(2) Unzip jdk-7u76-linux-x64.tar.gz and move it to the /opt/java/jdk path (create it yourself). Commands:

tar -zxvf jdk-7u76-linux-x64.tar.gz
sudo mv jdk1.7.0_76 /opt/java/jdk/

(3) Configure the Java environment variables. Run sudo gedit /etc/profile to open profile and append:

# set java env
export JAVA_HOME=/opt/java/jdk/jdk1.7.0_76
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH

(4) Verify that the installation is successful as follows:
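A quick check is the version command; the exact output depends on the build, but for this install it should report 1.7.0_76:

java -version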

Special note: I first installed the JDK as the root user, and running java -version after switching to the hadoop user produced an error. The cause turned out to be that the Java environment variables were configured in ~/.bashrc; after reconfiguring them in /etc/profile, the problem was resolved.

2. Install and configure SSH

There are two ways:

(1) Online installation (the way used in this article), with the network adapter set to "NAT". Installation command:

sudo apt-get install openssh-server

Verification: if ssh localhost lets you log in, the installation was successful.

(2) Offline installation (look it up yourself, O(∩_∩)O).

3. Passwordless SSH login

Run the following command to generate a key pair, pressing Enter through the prompts:

ssh-keygen -t rsa -P ""

This generates the .ssh directory under /home/mahao, containing id_rsa and id_rsa.pub. Then append the public key to the authorized keys:

cat /home/mahao/.ssh/id_rsa.pub >> /home/mahao/.ssh/authorized_keys

Verification: run ssh localhost again; you should now be able to log in without a password.

4. Install and configure Hadoop

Because this is a pseudo-distributed installation of Spark, it is natural to do a pseudo-distributed installation of Hadoop as well.

(1) Install Hadoop: unzip hadoop-2.7.2.tar.gz to the /opt/hadoop path (create it yourself).

(2) Configure Hadoop. The configuration files are all under the {HADOOP_HOME}/etc/hadoop path; mine is /opt/hadoop/hadoop-2.7.2/etc/hadoop.

Modify the hadoop-env.sh file, mainly to set JAVA_HOME; in addition, following the official documentation, also export HADOOP_PREFIX. Append:

export JAVA_HOME=/opt/java/jdk/jdk1.7.0_76
export HADOOP_PREFIX=/opt/hadoop/hadoop-2.7.2

Modify the core-site.xml file, appending:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/hadoop/hadoop-2.7.2/hadooptmp</value>
    <description>A base for other temporary directories.</description>
  </property>
</configuration>

The hadooptmp directory above needs to be created by yourself.

Modify the hdfs-site.xml file, appending:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

Modify the mapred-site.xml file (optional in pseudo-distributed mode), appending:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

Modify the yarn-site.xml file, appending:

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>

Modify the yarn-env.sh file:

export JAVA_HOME=/opt/java/jdk/jdk1.7.0_76

5. Format the NameNode and start Hadoop

(1) Format the NameNode
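A minimal sketch of the formatting step, assuming the Hadoop layout used in this article:

cd /opt/hadoop/hadoop-2.7.2
./bin/hdfs namenode -format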

(2) Start DFS and YARN (start-all.sh has been deprecated; start with the following commands)
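Presumably these are the separate start scripts, run from the Hadoop installation directory:

./sbin/start-dfs.sh
./sbin/start-yarn.sh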


start-all.sh has been deprecated; start with the commands above, otherwise the following message will appear:

Then use the jps command to view the Java processes; if the processes shown below are present, the installation and startup were successful:
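For a pseudo-distributed Hadoop setup with YARN, jps typically lists the daemons below (each line is preceded by a process ID, which will differ on your machine):

NameNode
DataNode
SecondaryNameNode
ResourceManager
NodeManager
Jps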

You can also view the HDFS report with the following command:
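The usual command for this, run from the Hadoop installation directory, is:

./bin/hdfs dfsadmin -report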

Web management interface view:

http://localhost:50070/ (HDFS NameNode)

http://localhost:8088/ (YARN ResourceManager)

6. Install and configure Spark 2.0.0

(1) Install Scala. Spark 2.0.0 requires Scala 2.11, so download Scala 2.11.8 from http://www.scala-lang.org/download/2.11.8.html.

(2) Install and configure Scala: unzip Scala to /opt/scala (create it yourself), then configure the SCALA_HOME environment variable in /etc/profile:

export HADOOP_HOME=/opt/hadoop/hadoop-2.7.2
export SCALA_HOME=/opt/scala/scala-2.11.8
export PATH=${SCALA_HOME}/bin:${JAVA_HOME}/bin:$PATH
Run source /etc/profile to make it take effect. Verify Scala:
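A quick check is the version command, which for this install should report 2.11.8:

scala -version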

(3) Download Spark 2.0.0 and unzip spark-2.0.0-bin-hadoop2.7.tgz to the /opt/spark path, then configure the SPARK_HOME environment variable:

export SPARK_HOME=/opt/spark/spark-2.0.0-bin-hadoop2.7

Run source /etc/profile to make it take effect. Modify the slaves file in Spark's conf directory; back it up and rename it first with cp slaves.template slaves, then change the contents of the slaves file to the host name. Mine is ubuntu, as follows:
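For reference, after the copy the conf/slaves file should contain just the worker's host name on a single line (assumed here to be ubuntu, as on this article's machine):

ubuntu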

Modify the spark-env.sh file; back it up and rename it first with cp spark-env.sh.template spark-env.sh, then open spark-env.sh and append:

export JAVA_HOME=/opt/java/jdk/jdk1.7.0_76
export HADOOP_HOME=/opt/hadoop/hadoop-2.7.2
export SCALA_HOME=/opt/scala/scala-2.11.8
export HADOOP_CONF_DIR=/opt/hadoop/hadoop-2.7.2/etc/hadoop
export SPARK_MASTER_IP=ubuntu
export SPARK_WORKER_MEMORY=512m

(4) Start Spark in pseudo-distributed mode. First, before starting, make sure the Hadoop pseudo-distributed setup has started successfully; use jps to check the process information first:

This indicates that Hadoop started successfully. If it has not been started, go to Hadoop's sbin directory and run ./start-all.sh to boot it. Second, start Spark: run start-all.sh in Spark's sbin directory, then use jps to view the latest processes after startup:
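For reference, a typical sequence is the one below; afterwards, jps should additionally list the Master and Worker processes alongside the Hadoop daemons:

cd /opt/spark/spark-2.0.0-bin-hadoop2.7
./sbin/start-all.sh
jps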

Access http://localhost:8080 to open Spark's web console page:

One worker node's information can be seen on the page. Access http://localhost:4040 to enter the spark-shell web console page (first run ./bin/spark-shell to start a SparkContext); the following web interface information appears:

If more than one SparkContext is running on a machine, its web UI port is incremented automatically, e.g. 4041, 4042, 4043. To browse persisted event logs, set spark.eventLog.enabled.
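A minimal sketch of enabling this in conf/spark-defaults.conf (copied from spark-defaults.conf.template); the log directory below is an assumed example and must already exist in HDFS:

spark.eventLog.enabled true
spark.eventLog.dir hdfs://localhost:9000/spark-logs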
To shut down Spark, go to the Spark directory and run: ./sbin/stop-all.sh. To shut down Hadoop, go to the Hadoop directory and run: ./sbin/stop-dfs.sh and ./sbin/stop-yarn.sh (./sbin/stop-all.sh also works, but it prints a hint that the command has been deprecated; use the two commands above instead).

Port summary: the Master port is 7077, the Master WebUI is on 8080, and the spark-shell WebUI is on port 4040.

If the following appears: mkdir: cannot create directory "/opt/spark-2.2.0-bin-hadoop2.7/logs": Permission denied,

Then modify the permissions:

sudo chown -R marho:marho /opt/spark-2.2.0-bin-hadoop2.7


