At present there is only one machine (a freshly installed server with no software on it), so this is a first hands-on attempt at a stand-alone Spark deployment.
Versions used:
jdk-1.7+
Hadoop-2.6.0 (pseudo-distributed);
Scala-2.10.5;
Spark-1.4.0;
The specific configuration steps follow.
- Install jdk 1.7+
Download URL http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
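The downloaded JDK tarball then needs to be unpacked so that it matches the JAVA_HOME used below. A minimal sketch, assuming the file is jdk-7u71-linux-x64.tar.gz (the exact file name depends on what was downloaded):
# Assumed file name and target path; adjust them to the actual download
sudo mkdir -p /usr/java
sudo tar -zxvf jdk-7u71-linux-x64.tar.gz -C /usr/java/
sudo mv /usr/java/jdk1.7.0_71 /usr/java/java-1.7.0_71   # match the JAVA_HOME set below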
- Environment variable settings (preferably not OpenJDK):
export JAVA_HOME=/usr/java/java-1.7.0_71
export JRE_HOME=$JAVA_HOME/jre
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
- Update and reload the environment variables
$ source /etc/profile
- Test
$ java -version
Download and install scala-2.10.5
"Download URL" http://www.scala-lang.org/download/2.10.5.html
Download the corresponding compression package for the pair
$ tar -zxf scala-2.10.5.tgz$ sudo mv scala-2.10.5 /usr/local
To configure environment variables:
export SCALA_HOME=/usr/local/scala-2.10.5
export PATH=$SCALA_HOME/bin:$PATH
Reload the environment variables
source /etc/profile
Verify the Scala installation
$ scala -version
"pro-Test" installation of Hadoop (manual compilation if Hadoop is required)
"Install a reference URL for Hadoop" http://qindongliang.iteye.com/blog/2222145
- Installation dependencies
sudo yum install -y autoconf automake libtool git gcc gcc-c++ make cmake openssl-devel ncurses-devel bzip2-devel
- Install Maven 3.0+
Download URL http://archive.apache.org/dist/maven/maven-3/3.0.5/binaries/
- Unzip
tar -xvf apache-maven-3.0.5-bin.tar.gz
- Move the directory
mv apache-maven-3.0.5 /usr/local/
- Configure environment variables
MAVEN_HOME=/usr/local/apache-maven-3.0.5
export MAVEN_HOME
export PATH=${PATH}:${MAVEN_HOME}/bin
- Make it take effect
source /etc/profile
- Check for successful installation
mvn -v
- Install ant 1.8+
Download URL http://archive.apache.org/dist/ant/binaries/
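The Ant binary distribution only needs to be extracted and added to PATH. A minimal sketch, assuming apache-ant-1.8.4 was downloaded from the URL above (the version number is only an example):
tar -xvf apache-ant-1.8.4-bin.tar.gz
sudo mv apache-ant-1.8.4 /usr/local/
# Append the following to /etc/profile, then run: source /etc/profile
export ANT_HOME=/usr/local/apache-ant-1.8.4
export PATH=$ANT_HOME/bin:$PATH
ant -version   # verify the installation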
- Install protobuf-2.5.0.tar.gz
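protobuf 2.5.0 follows the same configure/make/install pattern as snappy below. A minimal sketch, assuming the tarball sits in the current directory and the default install prefix is acceptable:
tar -zxvf protobuf-2.5.0.tar.gz
cd protobuf-2.5.0
./configure
make
make install
protoc --version   # should report libprotoc 2.5.0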
- Install snappy-1.1.0.tar.gz (optional; this step is only required if you need to compile Hadoop with snappy compression support)
- Installation
./configure --prefix=/usr/local/snappy
# specify an installation directory
make
make install
- Install hadoop-snappy
- git
git clone https://github.com/electrum/hadoop-snappy.git
- Packaging
After the download is complete:
cd hadoop-snappy
Run the Maven package command:
mvn package -Dsnappy.prefix=/home/search/snappy
(point -Dsnappy.prefix at the snappy install prefix chosen above, e.g. /usr/local/snappy)
- Validation
The compiled snappy native library is located in the hadoop-snappy/target/hadoop-snappy-0.0.1-SNAPSHOT-tar/hadoop-snappy-0.0.1-SNAPSHOT/lib directory, which contains hadoop-snappy-0.0.1-SNAPSHOT.jar. After Hadoop has been compiled, this jar needs to be copied into the $HADOOP_HOME/lib directory.
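For illustration, that copy step might look like this (a sketch, assuming $HADOOP_HOME points at the compiled Hadoop and the hadoop-snappy checkout is in the current directory):
cp hadoop-snappy/target/hadoop-snappy-0.0.1-SNAPSHOT-tar/hadoop-snappy-0.0.1-SNAPSHOT/lib/hadoop-snappy-0.0.1-SNAPSHOT.jar $HADOOP_HOME/lib/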
Note: the packages used throughout this process are placed under /root/.
- Installing Hadoop
- Install (download hadoop-2.6.0-src.tar.gz, the Hadoop source)
"Download URL" http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.6.0/
"Can also be obtained directly with" wget http://archive-primary.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.4.1-src.tar.gz
- Extract
tar -zxvf hadoop-2.6.0-cdh5.4.1-src.tar.gz
- After extracting, enter the source root directory and execute the following compile command; it bundles the snappy library into the native Hadoop library so the result can run on all machines.
mvn clean package -DskipTests -Pdist,native -Dtar -Dsnappy.lib=(path of the library compiled in hadoop-snappy) -Dbundle.snappy
Some exceptions may be reported along the way; ignore them. If the build exits with an error, rerun the command above until it succeeds. The speed mostly depends on your Internet connection; the build takes roughly 40 minutes before it finally compiles successfully.
"Final Choice" to install Hadoop (download compiled Hadoop files directly without modifying Hadoop)
- Install (download the compiled hadoop-2.6.0.tar.gz)
"Download URL" http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.6.0/
- Unzip the installation into /usr/local/
- Rename the directory to hadoop, so the final path is /usr/local/hadoop
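For example, the extract-and-rename can be done as follows (assuming the tarball was downloaded to the current directory):
tar -zxvf hadoop-2.6.0.tar.gz -C /usr/local/
mv /usr/local/hadoop-2.6.0 /usr/local/hadoop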
Single-machine Hadoop deployment (pseudo-distributed)
- Hadoop's configuration files are located in /usr/local/hadoop/etc/hadoop/ (many XML files). Pseudo-distributed mode requires modifying two of them: core-site.xml and hdfs-site.xml.
- Modify the configuration file core-site.xml (vim /usr/local/hadoop/etc/hadoop/core-site.xml)
Change the <configuration></configuration> section in it to the following:
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/local/hadoop/tmp</value>
        <description>Abase for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
- Modify the configuration file hdfs-site.xml (same directory):
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/data</value>
    </property>
</configuration>
"Note" does not mean that the starting position of the directory is:hadoop/
- After the configuration is complete, format the NameNode:
bin/hdfs namenode -format
If it succeeds, you will see a "successfully formatted" hint, and near the bottom of the output (about the fifth line from the end) "Exiting with status 0" indicates success, while "Exiting with status 1" indicates an error. If it fails, try running the command with sudo: sudo bin/hdfs namenode -format
- Start the NameNode and DataNode daemons
sbin/start-dfs.sh
"Note" If you are using Hadoop 2.4.1 64-bit, there may be a series of warn prompts, such as WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
this hint, which can be ignored and will not affect normal use.
- View processes
Run jps. A successful startup lists the following processes: NameNode, DataNode, and SecondaryNameNode.
- Hadoop-webui
Enter in browser http://localhost:50070
(localhost or server IP)
Note: If you cannot access it, first check that the firewall is off (it should be off).
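The exact commands for checking or disabling the firewall depend on the distribution; a sketch for a yum-based system (iptables service on CentOS 6, firewalld on CentOS 7):
service iptables status      # check the current state
service iptables stop        # stop it for the current session
chkconfig iptables off       # keep it disabled across reboots
# On CentOS 7: systemctl stop firewalld && systemctl disable firewalld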
- Note (encountered during configuration)
If this step or a later Hadoop startup reports the error Error: JAVA_HOME is not set and could not be found., you need to set the JAVA_HOME variable in hadoop/etc/hadoop/hadoop-env.sh: find the line export JAVA_HOME=${JAVA_HOME} and change it to export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64 (that is, the JAVA_HOME location set earlier), then try again.
- Stop the NameNode and DataNode daemons
sbin/stop-dfs.sh
Single-machine Spark deployment
- Download
wget http://archive.apache.org/dist/spark/spark-1.4.0/spark-1.4.0-bin-hadoop2.6.tgz
- Unzip and rename the directory to spark, as in the example below
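For example (assuming the .tgz downloaded above is in the current directory):
tar -zxvf spark-1.4.0-bin-hadoop2.6.tgz -C /usr/local/
mv /usr/local/spark-1.4.0-bin-hadoop2.6 /usr/local/spark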
- Environment variables
export SPARK_HOME=/usr/local/spark
export PATH=$PATH:$SPARK_HOME/bin
- Configure the Spark environment variables
cd $SPARK_HOME/conf
cp spark-env.sh.template spark-env.sh
vim spark-env.sh
Add the following code:
export JAVA_HOME=/usr/java/latest
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export SCALA_HOME=/usr/local/scala-2.10.5
export SPARK_HOME=/usr/local/spark
export SPARK_MASTER_IP=127.0.0.1
export SPARK_MASTER_PORT=7077
export SPARK_MASTER_WEBUI_PORT=8099
export SPARK_WORKER_CORES=3
export SPARK_WORKER_INSTANCES=1
export SPARK_WORKER_MEMORY=10G
export SPARK_WORKER_WEBUI_PORT=8081
export SPARK_EXECUTOR_CORES=1
export SPARK_EXECUTOR_MEMORY=1G
#export SPARK_CLASSPATH=/opt/hadoop-lzo/current/hadoop-lzo.jar
#export SPARK_CLASSPATH=$SPARK_CLASSPATH:$CLASSPATH
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:$HADOOP_HOME/lib/native
- Configure slaves
cp slaves.template slaves
vim slaves
Add the following code (default is localhost):
localhost
- Since this is a single machine, setting up passwordless SSH login is not covered here
- Start Spark Master
Directory: cd $SPARK_HOME/sbin/
./start-master.sh
- Start Spark Slave
Directory: cd $SPARK_HOME/sbin/
./start-slaves.sh
(note that the script name is the plural start-slaves.sh)
- Start Spark-shell (application)
./spark-shell --master spark://127.0.0.1:7077
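To confirm the shell is actually attached to the standalone master, a trivial job can be piped through it; this sketch assumes the master URL above:
cd $SPARK_HOME/bin
echo "sc.parallelize(1 to 1000).count()" | ./spark-shell --master spark://127.0.0.1:7077
# The job should report a count of 1000, and the application will show up on the Spark WebUI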
- Spark-webui
http://localhost:8099
(localhost or server IP)
- Turn off master and slave
Directory: cd $SPARK_HOME/sbin/
./stop-master.sh
./stop-slaves.sh
Copyright notice: this is the blog author's original article and may not be reproduced without the author's permission.
Spark-1.4.0 Single-machine deployment (Hadoop-2.6.0 with pseudo-distributed) "Tested"