How to install Spark & TensorFlowOnSpark



Yes, you read that right: this is a one-stop guide. After falling into countless pits, I finally managed to build a working Spark and TensorFlowOnSpark environment and successfully ran the sample program (the MNIST handwritten-digit training and recognition example).

Installing Java and Hadoop



Here is a good tutorial that is both useful and nicely presented:
http://www.powerxing.com/install-hadoop/
Follow this tutorial and you can complete the installation without too many pits, but there are a few points to watch out for.
1. When Spark sends a command from the master, it appears to look for files by path, so every computer in the cluster must use the same user name. My user is called ubuntu while the tutorial's is called hadoop; if you did not name yours hadoop like the tutorial, note that some commands and paths cannot be copied verbatim. Replace hadoop with your own user name, for example the paths in core-site.xml for the pseudo-distributed configuration must be changed to your actual paths. Also, even after configuring JAVA_HOME as the tutorial describes, Hadoop may still complain that it cannot find JAVA_HOME; the fix is to set JAVA_HOME in hadoop/etc/hadoop/hadoop-env.sh to the absolute path of your JDK (a minimal sketch follows this list).
2. The SSH password-free login part of the tutorial cannot be used as-is, because we are going to build a real cluster; it is not needed yet, and I will come back to it when we set up the cluster. If you do not set it up, you may get an error when starting Hadoop. There is also a strange symptom you may see: the web UI shows only one live node (there should be two), and the number changes every time you refresh. The fix is to edit hdfs-site.xml and make the data.dir different on each machine.
3. Cluster manager: if you are a beginner, I recommend using standalone mode directly (if you do not know what a cluster manager is, I recommend it even more [smile]), that is, Spark's built-in cluster manager. This means you can completely skip the "Start YARN" section of the tutorial above.
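
Here is the minimal sketch of the JAVA_HOME fix from point 1. The JDK path below is only an illustration (it matches the one used later in my spark-env.sh); substitute wherever your JDK actually lives:


# In hadoop/etc/hadoop/hadoop-env.sh, replace the default
#   export JAVA_HOME=${JAVA_HOME}
# with the absolute path to your JDK:
export JAVA_HOME=/home/ubuntu/workspace/jdk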



After that there are no more pits, blah blah. You now have a pseudo-distributed Hadoop, and you have taken a big step toward Spark.



The above is a pseudo-distributed setup. If you want a truly distributed one (distributed simply means more than one computer), you need to look at this tutorial:
https://my.oschina.net/jackieyeah/blog/657750
Follow this tutorial and make the same changes on every computer. There is one small pit (it cost me a night as well): if you followed the pseudo-distributed tutorial above, this property in core-site.xml
must be changed from localhost:9000 to master:9000:


         <property>
             <name>fs.defaultFS</name>
             <value>hdfs://master:9000</value>
        </property>
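
After restarting HDFS on the cluster, a quick way to confirm that both DataNodes are alive, as a sketch using standard Hadoop commands (it should agree with the live-node count on the 50070 web UI):


hdfs dfsadmin -report        # check the number of live datanodes reported
jps                          # each machine should list its NameNode/DataNode processes
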
Installing Scala and Spark


With the above set up, we move on to the next part [smile]
1. Installing Scala: a 2.10.x version is a good choice, since it is better supported by Spark and there will be fewer glitches. Here is a tutorial that should do: http://www.runoob.com/scala/scala-install.html
2. Installing Spark is probably the easiest part. Click here to download Spark. Since we have already installed Hadoop, download the Spark package that does not bundle Hadoop, that is, the one labeled "with user-provided Hadoop".



I use 1.6.0, which seems to be the more widely used version; the latest is already 2.1.x.
Unzip it to the directory where you want to install it:


sudo tar -zxf ~/download/spark-1.6.0-bin-without-hadoop.tgz -C /usr/local/
cd /usr/local
sudo mv ./spark-1.6.0-bin-without-hadoop/ ./spark
sudo chown -R hadoop:hadoop ./spark          # Here hadoop is your user name


The next, very important step is to modify the contents of spark-env.sh; there is quite a lot to change...


cd /usr/local/spark
cp ./conf/spark-env.sh.template ./conf/spark-env.sh
vim conf/spark-env.sh


Here is part of my spark-env.sh configuration:


export HADOOP_HOME=/home/ubuntu/workspace/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

export HADOOP_HDFS_HOME=/home/ubuntu/workspace/hadoop

export SPARK_DIST_CLASSPATH=$(/home/ubuntu/workspace/hadoop/bin/hadoop classpath)

export JAVA_HOME=/home/ubuntu/workspace/jdk/
export SCALA_HOME=/home/ubuntu/workspace/scala

export SPARK_MASTER_IP=192.168.1.129
export SPARK_WORKER_MEMORY=1g

export SPARK_MASTER_PORT=7077
export SPARK_WORKER_CORES=1
export SPARK_WORKER_INSTANCES=2

export SPARK_EXECUTOR_INSTANCES=2


If you do not understand an attribute, open spark-env.sh; there are plenty of comments near the top explaining what the various attributes mean. (PS: SPARK_DIST_CLASSPATH here must be set correctly, otherwise Spark will not start.)
Here is another excellent tutorial from powerxing, written very well:
http://www.powerxing.com/spark-quick-start-guide/
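
Once spark-env.sh is in place, a minimal local sanity check (just a sketch; SparkPi is one of the examples bundled with Spark, and the path assumes the install location used above):


cd /usr/local/spark
./bin/run-example SparkPi 2>&1 | grep "Pi is"   # should print an approximation of Pi
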
3. Distributed Spark Deployment
The key, of course, is this tutorial:
https://my.oschina.net/jackieyeah/blog/659741
There do not seem to be any pits here, although I remember that the first time I tried, the workers on the other machines would not start. I forget exactly why; it may have been that password-free SSH login was not set up.
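
In case the culprit really is the missing password-free login, here is a minimal sketch of setting it up from the master to one worker (the host name slave1 is an assumption; repeat for every worker, using the common user name discussed earlier):


ssh-keygen -t rsa              # on the master; accept the defaults
ssh-copy-id hadoop@slave1      # copy the public key to the worker (hypothetical host name)
ssh hadoop@slave1              # should now log in without a password
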
Once the tutorial is complete, you have your Spark cluster. Scatter flowers! ~(≧▽≦)/~



PS: the above is only a brief introduction to building a standalone cluster, i.e. installing and deploying Spark in standalone mode.
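
For reference, a minimal sketch of bringing the standalone cluster up and down from the master (this assumes SPARK_HOME points at /usr/local/spark and that conf/slaves lists your worker hosts; the master URL is then spark://master:7077, as used in the spark-submit commands later):


${SPARK_HOME}/sbin/start-master.sh   # master web UI on port 8080
${SPARK_HOME}/sbin/start-slaves.sh   # starts a Worker on every host in conf/slaves
jps                                  # Master on this machine, Worker on the others
${SPARK_HOME}/sbin/stop-slaves.sh && ${SPARK_HOME}/sbin/stop-master.sh   # to shut it down
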

Installing TensorFlowOnSpark



This part is hard to summarize. In a sense it is really simple, because the steps are written up well on GitHub, but some of the pits will really wear you down.
Yahoo's open-source TensorFlowOnSpark
1. What, you say you cannot understand all that English? Well, neither can I, but if you want to install TensorFlowOnSpark, scroll down to the wiki section at the bottom of the page.

2. As we said earlier, we are using Spark's own standalone cluster manager, so just click the standalone entry in that wiki.

3. Once you are in the tutorial, the first step is just copy and paste; if you do not have git, follow the prompts to install it.
The second step is not required because you have already installed Spark. For the third step, follow the instructions; if you find you cannot open the linked page (I could not...), you can use the TensorFlow Chinese website instead. One small pit to note: at the end of the third step there is a test:


python ${TFoS_HOME}/tensorflow/tensorflow/examples/tutorials/mnist/mnist_with_summaries.py --data_dir ${TFoS_HOME}/mnist


If you downloaded TensorFlow 1.x, this may fail to run, so I installed version 0.12.1 instead. What, you ask how? With this line from the tutorial:


$ pip install https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.5.0-cp27-none-linux_x86_64.whl


Change the 0.5.0 in it to 0.12.1 and there is no problem.
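
In other words, assuming the 0.12.1 CPU wheel follows the same naming scheme (worth verifying the URL for your platform before running):


$ pip install https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.12.1-cp27-none-linux_x86_64.whl
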
Also, the MNIST data set may fail to download from its URL for some reason (that happened to me too...); you can download it from my CSDN page instead: http://download.csdn.net/detail/fishseeker/9779536
4. The fourth step is to start Spark. In fact you can just start it directly; the assorted settings listed below that can be configured as you see fit.
5. The fifth step nearly finished me off... You need to change things as shown below. This cv.py is where I changed the input file path: it originally seemed to read from HDFS, and I changed it to a local path. Oh, and note that the output here does go to HDFS, so make sure HDFS is running, otherwise it is game over.


${SPARK_HOME}/bin/spark-submit \
--master spark://master:7077 \
${TFoS_HOME}/examples/mnist/cv.py \
--output examples/mnist/csv \
--format csv


The change in cv.py is to the parameters of the writeMNIST call, at lines 132-133 of mnist_data_setup.py (see the before/after screenshots in the original post).

After this code has run, go to port 50070 (the HDFS web UI) to view your files there.
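
A sketch of checking the conversion result from the command line instead of the web UI (the path matches the --output argument above, relative to your HDFS user directory):


hdfs dfs -ls examples/mnist/csv                # should contain train/ and test/
hdfs dfs -ls examples/mnist/csv/train/images   # the converted CSV partitions
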

6. The sixth step is training, which uses the data we just converted to train the model. Here again a few things need to be changed:


${SPARK_HOME}/bin/spark-submit \
--master spark://master:7077 \
--py-files ${TFoS_HOME}/tfspark.zip,${TFoS_HOME}/examples/mnist/spark/mnist_dist.py \
--conf spark.cores.max=4 \
--conf spark.task.cpus=2 \
--conf spark.executorEnv.JAVA_HOME="$JAVA_HOME" \
${TFoS_HOME}/examples/mnist/spark/mnist_spark.py \
--cluster_size 2 \
--images examples/mnist/csv/train/images \
--labels examples/mnist/csv/train/labels \
--format csv \
--mode train \
--model mnist_model


Note that however many workers there are, cluster_size must be set to at least that many, and spark.task.cpus should be greater than the number of workers. With everything set up it may look like it is running, yet it just hangs. At that point go to port 8080 and look at the worker's stderr. Mine showed an error saying HADOOP_HDFS_HOME was not set; this has to be exported in spark-env.sh, set to the same value as HADOOP_HOME. I also had to change line 109 of mnist_spark.py from logdir=logdir to logdir=None (and make the same change at line 119, otherwise the inference step later will also hang).
7. The seventh step just changes a few of the arguments (and relies on the code change mentioned above), then runs inference:


${SPARK_HOME}/bin/spark-submit \
--master spark://master:7077 \
--py-files ${TFoS_HOME}/tfspark.zip,${TFoS_HOME}/examples/mnist/spark/mnist_dist.py \
--conf spark.cores.max=4 \
--conf spark.task.cpus=2 \
--conf spark.executorEnv.JAVA_HOME="$JAVA_HOME" \
${TFoS_HOME}/examples/mnist/spark/mnist_spark.py \
--cluster_size 2 \
--images examples/mnist/csv/test/images \
--labels examples/mnist/csv/test/labels \
--mode inference \
--format csv \
--model mnist_model \
--output predictions


And you will be happy to find that your HDFS now contains the recognition results, just like this (screenshot in the original post).

Open them up and you will see the same wonderful results as on the website.
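
A sketch of peeking at those results from the command line (the path matches the --output argument above; the part file names may vary):


hdfs dfs -ls predictions
hdfs dfs -cat predictions/part-00000 | head   # each line should show a label and the corresponding prediction
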



==============2017.4.15 Update ==================
Today, while running the program, I hit a puzzling bug. The main symptom was that a task would get stuck somewhere and go no further; clicking into the detailed task view showed that one task on a particular slave was stuck, and debugging led nowhere. After a full shutdown and restart, the run no longer hung. The reason was probably that the resources the job required were not available. If you run into this situation, I recommend adjusting the resource configuration options (the number of CPUs and the amount of memory the job needs) and trying again; failing that, a shutdown and restart usually resolves it.
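
If you want somewhere concrete to start, one option (only a sketch; these are the same variables set in spark-env.sh earlier, and the values are purely illustrative) is to adjust the per-worker resources and restart the cluster, or to reduce what the job asks for at submit time:


# in conf/spark-env.sh on each worker, then restart the cluster
export SPARK_WORKER_MEMORY=2g
export SPARK_WORKER_CORES=2
# or shrink the job's requests on submission, e.g.
#   --conf spark.cores.max=2 --conf spark.task.cpus=1
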



==============2017.7.28 Update ====================
Stepped into another pit: the job may still get stuck during inference, perhaps because I forgot to format the NameNode for the other worker.
There was also the problem of very low recognition accuracy, possibly because Python could not find the required jar package; the fix is as follows.
The root cause was that Python could not find the corresponding jar package when writing files to HDFS; add the following configuration when submitting the job:


--conf spark.executorEnv.LD_LIBRARY_PATH="${JAVA_HOME}/jre/lib/amd64/server" \


This workaround was provided in the comments by "she said the cherry blossoms at the end of the alley were open" (2017-07-13 10:10).
Related links:



Use idea to view and modify spark source code
http://blog.csdn.net/fishseeker/article/details/63741265



Modify spark source code and compile the deployment
http://blog.csdn.net/fishseeker/article/details/68957206

