Spark-1.4.0 single-machine deployment (Hadoop-2.6.0, pseudo-distributed), tested


At present there is only one machine available (a fresh server with no software installed), so this is a first hands-on attempt at a stand-alone Spark deployment.

Software versions used:
jdk 1.7+
Hadoop-2.6.0 (pseudo-distributed)
Scala-2.10.5
Spark-1.4.0
The detailed configuration steps follow.

  • Install jdk 1.7+
    Download URL http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html

    • Environment variable settings (preferably do not use OpenJDK):

      export JAVA_HOME=/usr/java/java-1.7.0_71
      export JRE_HOME=$JAVA_HOME/jre
      export PATH=$PATH:$JAVA_HOME/bin
      export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
    • Reload the environment variables
      $ source /etc/profile

    • Test
      $ java -version
  • Download and install scala-2.10.5
    "Download URL" http://www.scala-lang.org/download/2.10.5.html
    Download the corresponding tarball for your platform.
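    For example (a minimal sketch; the exact mirror URL below is an assumption, so verify it against the download page above):

    # download the Scala 2.10.5 tarball (URL assumed)
    $ wget http://downloads.typesafe.com/scala/2.10.5/scala-2.10.5.tgz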

    $ tar -zxf scala-2.10.5.tgz
    $ sudo mv scala-2.10.5 /usr/local

    Configure the environment variables:
    export SCALA_HOME=/usr/local/scala-2.10.5
    export PATH=$SCALA_HOME/bin:$PATH
    Reload the environment variables:
    source /etc/profile
    Test that Scala was installed successfully:
    $ scala -version

  • "pro-Test" installation of Hadoop (manual compilation if Hadoop is required)
    "Install a reference URL for Hadoop" http://qindongliang.iteye.com/blog/2222145

    • Installation dependencies
      sudo yum install -y autoconf automake libtool git gcc gcc-c++ make cmake openssl-devel ncurses-devel bzip2-devel
    • Install Maven 3.0+
      Download URL http://archive.apache.org/dist/maven/maven-3/3.0.5/binaries/
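      For example, the binary tarball can be fetched directly (a sketch; the filename is taken from the unzip step below):

      # download the Maven 3.0.5 binary tarball from the Apache archive
      wget http://archive.apache.org/dist/maven/maven-3/3.0.5/binaries/apache-maven-3.0.5-bin.tar.gz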

      • Unzip
        tar -xvf apache-maven-3.0.5-bin.tar.gz
      • Move the files
        mv apache-maven-3.0.5 /usr/local/
      • Configure environment variables

        MAVEN_HOME=/usr/local/apache-maven-3.0.5
        export MAVEN_HOME
        export PATH=${PATH}:${MAVEN_HOME}/bin
      • Make it effective
        source /etc/profile

      • Check that the installation succeeded: mvn -v
    • Install Ant 1.8+
      Download URL http://archive.apache.org/dist/ant/binaries/
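      For example (a sketch; version 1.8.4 is assumed to match the ANT_HOME path used below, and the unpack/move steps mirror the Maven install above):

      # download Ant 1.8.4, unpack it, and move it under /usr/local
      wget http://archive.apache.org/dist/ant/binaries/apache-ant-1.8.4-bin.tar.gz
      tar -xvf apache-ant-1.8.4-bin.tar.gz
      mv apache-ant-1.8.4 /usr/local/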

      • Environment variable

        export ANT_HOME=/usr/local/apache-ant-1.8.4
        export PATH=$ANT_HOME/bin:$PATH
      • Test
        ant -version

    • Installing protobuf-2.5.0.tar.gz

      • Extract
        tar xvf protobuf-2.5.0.tar.gz
      • Installation
        cd protobuf-2.5.0
        ./configure --prefix=/usr/local/protobuf
        make
        make install
      • Environment variables

        export PATH=$PATH:/usr/local/protobuf/bin
        export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/protobuf/lib
      • Test
        protoc --version
        If the output is libprotoc 2.5.0, the installation succeeded.

    • Install snappy-1.1.0.tar.gz (optional; this step is required only if you need to compile Hadoop with snappy compression support)
      • Installation
        ./configure --prefix=/usr/local/snappy   # specify an installation directory
        make
        make install
    • Install hadoop-snappy
      • git
        git clone https://github.com/electrum/hadoop-snappy.git
      • Package
        After the download is complete:
        cd hadoop-snappy
        Execute the Maven package command:
        mvn package -Dsnappy.prefix=/home/search/snappy
      • validation

        The compiled snappy native library is located in the
        hadoop-snappy/target/hadoop-snappy-0.0.1-SNAPSHOT-tar/hadoop-snappy-0.0.1-SNAPSHOT/lib directory, which contains hadoop-snappy-0.0.1-SNAPSHOT.jar; after Hadoop has been compiled, this jar needs to be copied into the $HADOOP_HOME/lib directory.
        Remark: all packages used throughout this process are placed under /root/.
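        For example, once Hadoop has been compiled, the jar can be copied over like this (a sketch; paths follow the layout described above and HADOOP_HOME is assumed to be set):

        # copy the hadoop-snappy jar into Hadoop's lib directory
        cp hadoop-snappy/target/hadoop-snappy-0.0.1-SNAPSHOT-tar/hadoop-snappy-0.0.1-SNAPSHOT/lib/hadoop-snappy-0.0.1-SNAPSHOT.jar $HADOOP_HOME/lib/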
    • Installing Hadoop
      • Install (download hadoop-2.6.0-src.tar.gz, the Hadoop source)
        Download URL: http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.6.0/
        It can also be obtained directly: wget http://archive-primary.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.4.1-src.tar.gz
      • Extract
        tar -zxvf hadoop-2.6.0-cdh5.4.1-src.tar.gz
      • After extracting, enter the top-level source directory and execute the following compile command; it bundles the snappy library into the native Hadoop library so that the build can be run on all machines
        mvn clean package -DskipTests -Pdist,native -Dtar -Dsnappy.lib=(the lib path compiled inside hadoop-snappy) -Dbundle.snappy
        Some exceptions may be reported along the way; ignore them. If the build aborts with an error, keep re-running the command above until it succeeds. The time required depends mostly on your Internet connection, roughly 40 minutes in total.
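        Once the build succeeds, the distributable tarball is typically produced under hadoop-dist/target (a sketch; the path is assumed for a -Pdist -Dtar build, and the exact file name depends on the source version you compiled):

        # locate the compiled Hadoop distribution
        ls hadoop-dist/target/hadoop-*.tar.gz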
  • "Final Choice" to install Hadoop (download compiled Hadoop files directly without modifying Hadoop)

    • Install (download the pre-compiled hadoop-2.6.0.tar.gz)
      Download URL: http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.6.0/
    • Unzip the package into /usr/local/
    • Rename the directory to hadoop; the final path is /usr/local/hadoop (a sketch of these two steps follows).
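    A minimal sketch of those two steps (assuming the tarball was downloaded to the current directory):

    # unpack into /usr/local and rename to the final path /usr/local/hadoop
    sudo tar -zxvf hadoop-2.6.0.tar.gz -C /usr/local/
    sudo mv /usr/local/hadoop-2.6.0 /usr/local/hadoop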
Single-machine Hadoop deployment (pseudo-distributed)
    • Hadoop's configuration files are located in /usr/local/hadoop/etc/hadoop/ (many XML files); for pseudo-distributed mode two of them need to be modified: core-site.xml and hdfs-site.xml.
    • Modify the configuration file core-site.xml (vim /usr/local/hadoop/etc/hadoop/core-site.xml)
      Change the empty element

<configuration>
</configuration>

to the following configuration:

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/local/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
    • Modify the configuration file hdfs-site.xml (in the same directory):

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/data</value>
    </property>
</configuration>

"Note" does not mean that the starting position of the directory is:hadoop/

  • After the configuration is complete, perform the formatting of the Namenode:
    bin/hdfs namenode -format
    If successful, you will see a "successfully formatted" message; about five lines from the bottom of the output, "Exitting with status 0" indicates success, while "Exitting with status 1" indicates an error. If it fails, try running the command with sudo: sudo bin/hdfs namenode -format.
  • Turn on Namenode, Datanode daemon
    sbin/start-dfs.sh
    "Note" If you are using Hadoop 2.4.1 64-bit, there may be a series of warn prompts, such as WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable this hint, which can be ignored and will not affect normal use.
  • View process
    Run jps.
    A successful startup will list the following processes: NameNode, DataNode, and SecondaryNameNode.
  • Hadoop-webui
    Enter http://localhost:50070 in the browser (localhost or the server IP)
    Note: if you cannot access it, first check that the firewall is off (it should be off); see the example below.
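    For example, on a CentOS-style system (assumed here because yum is used earlier), the firewall state can be checked and disabled like this (a sketch):

    # check whether iptables is running, stop it, and keep it off across reboots
    sudo service iptables status
    sudo service iptables stop
    sudo chkconfig iptables off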
  • Note (encountered during configuration)
    If, at this step or when later starting Hadoop, you get the error Error: JAVA_HOME is not set and could not be found., you need to set the JAVA_HOME variable in the file hadoop/etc/hadoop/hadoop-env.sh: find the line export JAVA_HOME=${JAVA_HOME} and change it to export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64 (i.e. the JAVA_HOME location set earlier), then try again.
  • Close the Namenode, Datanode daemon
    sbin/stop-dfs.sh
Single-machine Spark deployment
    • Download
      wget http://archive.apache.org/dist/spark/spark-1.4.0/spark-1.4.0-bin-hadoop2.6.tgz
    • Unzip the package and rename the directory to spark (see the example below)
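      A minimal sketch of those two steps (assuming the tarball from the previous step is in the current directory):

      # unpack the Spark tarball into /usr/local and rename it to /usr/local/spark
      sudo tar -zxvf spark-1.4.0-bin-hadoop2.6.tgz -C /usr/local/
      sudo mv /usr/local/spark-1.4.0-bin-hadoop2.6 /usr/local/spark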
    • Environment variables
export SPARK_HOME=/usr/local/spark
export PATH=$PATH:$SPARK_HOME/bin
    • Configuring the SPARK environment variable
      cd $SPARK_HOME/conf
      cp spark-env.sh.template spark-env.sh
      vim spark-env.sh
      Add the following code:
export JAVA_HOME=/usr/java/latest
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export SCALA_HOME=/usr/local/scala-2.10.5
export SPARK_HOME=/usr/local/spark
export SPARK_MASTER_IP=127.0.0.1
export SPARK_MASTER_PORT=7077
export SPARK_MASTER_WEBUI_PORT=8099
export SPARK_WORKER_CORES=3
export SPARK_WORKER_INSTANCES=1
export SPARK_WORKER_MEMORY=10G
export SPARK_WORKER_WEBUI_PORT=8081
export SPARK_EXECUTOR_CORES=1
export SPARK_EXECUTOR_MEMORY=1G
#export SPARK_CLASSPATH=/opt/hadoop-lzo/current/hadoop-lzo.jar
#export SPARK_CLASSPATH=$SPARK_CLASSPATH:$CLASSPATH
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:$HADOOP_HOME/lib/native
    • Configure slave
      cp slaves.template slaves
      vim slaves
      Add the following code (default is localhost):
localhost
    • Since this is a single machine, passwordless SSH login is not covered again here
    • Start Spark Master
      Directory: cd $SPARK_HOME/sbin/
      ./start-master.sh
    • Start Spark Slave
      Directory: cd $SPARK_HOME/sbin/
      ./start-slaves.sh (note: slaves, plural)
    • Start Spark-shell (application)
      ./spark-shell --master spark://127.0.0.1:7077
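      As a quick smoke test against the master started above, the bundled SparkPi example can be submitted (a sketch; the examples jar name is an assumption for the pre-built hadoop2.6 package):

      # submit the SparkPi example to the standalone master (jar path assumed)
      cd $SPARK_HOME
      ./bin/spark-submit --master spark://127.0.0.1:7077 \
        --class org.apache.spark.examples.SparkPi \
        lib/spark-examples-1.4.0-hadoop2.6.0.jar 10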
    • Spark-webui
      http://localhost:8099 (localhost or the server IP)
    • Turn off master and slave
      Directory: cd $SPARK_HOME/sbin/
      ./stop-master.sh
      ./stop-slaves.sh

Copyright notice: this is an original post by the blogger; do not reproduce it without the blogger's permission.
