Immutable infrastructure
How to better use container technology to achieve immutable infrastructure
Tachyon
Tachyon Introduction
PASA Big Data Laboratory of Nanjing University
Spark/Tachyon: a memory-based distributed storage system
Spark on Yarn
The process of building a Spark on YARN cluster -- can be used together with 3 as a reference
Spark on Yarn
Spark on YARN cluster installation and deployment -- recommended
1) Problems and fixes when configuring a Hadoop YARN cluster:
On every machine (the master and each slave), append an export of JAVA_HOME (matching the JAVA_HOME of that specific machine) to the end of both hadoop-env.sh and yarn-env.sh.
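For example, appending the export to both files could look like this (the JAVA_HOME path below is a placeholder assumption; use the actual JDK location on each machine):

```shell
# /opt/ibm/java is a placeholder; substitute the real JAVA_HOME on this machine.
echo 'export JAVA_HOME=/opt/ibm/java' >> ~/hadoop-2.7.1/etc/hadoop/hadoop-env.sh
echo 'export JAVA_HOME=/opt/ibm/java' >> ~/hadoop-2.7.1/etc/hadoop/yarn-env.sh
```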
Then run:
cd ~/hadoop-2.7.1            # enter the Hadoop directory
bin/hadoop namenode -format  # format the NameNode
sbin/start-dfs.sh            # start DFS
sbin/start-yarn.sh           # start YARN
Afterwards, log in to http://master:8088; if some slave nodes show up in "unhealthy" status, modify yarn-site.xml on every machine (the master and each slave) and add the following property (not recommended!):

<property>
  <name>yarn.nodemanager.disk-health-checker.enable</name>
  <value>false</value>
</property>
Then run stop-all.sh on the master and restart the cluster:

sbin/start-dfs.sh   # start DFS
sbin/start-yarn.sh  # start YARN
The slave nodes should now be back to a healthy status.
2) When configuring Spark's spark-env.sh:
Note that SPARK_LOCAL_DIRS must have the same value on the master as on each slave; in other words, Spark should be installed under the same path on every machine.
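A minimal spark-env.sh sketch; the path below is an assumption for illustration -- the point is only that it must be identical on the master and every slave:

```shell
# spark-env.sh (identical on every machine)
# Placeholder path; the directory must exist at the same location on all nodes.
export SPARK_LOCAL_DIRS=/home/hadoop/spark-1.5.1/local
```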
3) At present, a Hadoop build created on RHEL 7.1 cannot run on SuSE.
4) Do not add localhost to the various slaves files.
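That is, a slaves file (Hadoop's etc/hadoop/slaves, and likewise Spark's conf/slaves) should list only the real worker hostnames; the names below are placeholders:

```shell
# etc/hadoop/slaves -- worker hostnames only, no localhost line
slave1
slave2
slave3
```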
Hadoop compilation errors
I compiled Hadoop in an IBM Java environment. The errors encountered during compilation and their workarounds are listed below for reference.
1) Antrun
Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.6:run (create-testdirs)
http://stackoverflow.com/questions/17126213/building-hadoop-with-maven-failed-to-execute-goal-org-apache-maven-pluginsma
Per the link above, run Maven from the parent directory (../) with -DskipTests.
2) Build failed with the IBM Java JVM on TestSecureLogins:
package com.sun.security.auth.module does not exist
https://issues.apache.org/jira/browse/HADOOP-11783
The JIRA above provides a patch intended to be applied in IBM Java environments.
3) If, after the two fixes above, BUILD SUCCESS is displayed very quickly but there is no tar package named hadoop-2.7.1.tar.gz in the hadoop-release-2.7.1/hadoop-dist/target/ directory (assuming the downloaded source folder is named hadoop-release-2.7.1), the build did not actually succeed. Return to the root directory of hadoop-release-2.7.1 and continue with:
mvn package -Pdist -DskipTests -Dtar
http://www.iteblog.com/archives/897
After that, the compilation takes noticeably longer -- enjoy the thrilling wait :)
Error when running SparkPi on the YARN cluster
Exception in thread "main" java.io.IOException: All datanodes
9.12.43.187:50010 are bad. Aborting...
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1206)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1004)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:548)
This is possibly an endianness problem on the IBM mainframe side that needs a patch, or it can be worked around with a heterogeneous-platform combination.
Output of a successful run:
Coming next: distributed computing frameworks and big data