Kicking off the big change (part 2): Distributed computing frameworks and big data

Source: Internet
Author: User

Immutable infrastructure

How to better use container technology to achieve immutable infrastructure

Tachyon

Tachyon Introduction

PASA Big Data Laboratory of Nanjing University

Spark/Tachyon: a memory-based distributed storage system

Spark on Yarn
    1. Building a Spark on YARN cluster -- can be read together with item 3 as a reference

    2. Spark on Yarn

    3. Spark on YARN cluster installation and deployment -- recommended

1) Problems and fixes when configuring a Hadoop YARN cluster:

On each machine (the master and every slave), append an export JAVA_HOME=... line (matching that machine's own Java installation path) to the end of both hadoop-env.sh and yarn-env.sh.
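For example, the line to append might look like the following (the path below is hypothetical; use each machine's actual Java home):

```shell
# appended to the end of both hadoop-env.sh and yarn-env.sh
export JAVA_HOME=/opt/ibm/java-x86_64-71   # hypothetical path; set per machine
```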
Then run:

cd ~/hadoop-2.7.1               # enter the Hadoop directory
bin/hadoop namenode -format     # format the NameNode
sbin/start-dfs.sh               # start DFS
sbin/start-yarn.sh              # start YARN

Afterwards, I logged in to http://master:8088 and found that some slave nodes were in the unhealthy state. To work around this, modify yarn-site.xml on each machine (the master and every slave) and add the following (not recommended!):

<property>
  <name>yarn.nodemanager.disk-health-checker.enable</name>
  <value>false</value>
</property>
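A less drastic alternative, not from the original post, is to keep the health checker enabled but raise the disk-utilization threshold at which a node is marked unhealthy (the default is 90%); the 98.0 value below is only an example:

```xml
<property>
  <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
  <value>98.0</value>
</property>
```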

Then run stop-all.sh on the master and restart the cluster:

sbin/start-dfs.sh               # start DFS
sbin/start-yarn.sh              # start YARN

The slave nodes should then be back to the healthy state.

2) When configuring Spark's spark-env.sh

Note that the value of SPARK_LOCAL_DIRS on the master should be the same as on each slave; that is, Spark should be placed under the same path on every machine.
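For example, a spark-env.sh fragment might look like this (the path below is hypothetical; the point is only that it must be identical on every machine):

```shell
# spark-env.sh -- must be the same on the master and all slaves
export SPARK_LOCAL_DIRS=/data/spark/local   # hypothetical scratch-space path
```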

3) At present, a Hadoop build produced on RHEL 7.1 cannot run on SUSE.

4) Do not add localhost to the various slaves files.
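For instance, a slaves file should list only the actual slave hostnames, one per line (the names below are hypothetical):

```
slave1
slave2
slave3
```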

Hadoop compilation errors

I compiled Hadoop in an IBM Java environment. The errors hit during compilation, and their workarounds, are listed below for reference.

1) Antrun

Failed to execute goal
org.apache.maven.plugins:maven-antrun-plugin:1.6:run (create-testdirs)

http://stackoverflow.com/questions/17126213/building-hadoop-with-maven-failed-to-execute-goal-org-apache-maven-pluginsma

Per the linked answer: restore ownership of the parent directory to the current user (chown -R), then rerun Maven with -DskipTests.

2) Build failed on TestSecureLogins with IBM Java

package com.sun.security.auth.module does not exist

https://issues.apache.org/jira/browse/HADOOP-11783

This patch is meant to be applied when building in an IBM Java environment.

3) If, after the above two fixes, BUILD SUCCESS is displayed very quickly but there is no tar package named hadoop-2.7.1.tar.gz in the hadoop-release-2.7.1/hadoop-dist/target/ directory (assuming the downloaded source folder is named hadoop-release-2.7.1), the build did not actually succeed. Return to the root directory of hadoop-release-2.7.1 and continue with:

mvn package -Pdist -DskipTests -Dtar

http://www.iteblog.com/archives/897

After that, compilation takes noticeably longer; settle in for the thrilling wait :)

Error running SparkPi on the YARN cluster

Exception in thread "main" java.io.IOException: All datanodes
9.12.43.187:50010 are bad. Aborting...
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1206)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1004)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:548)

It is possible that a patch is needed for a big-endian issue on the IBM mainframe side, or that the problem can be worked around by combining heterogeneous platforms.
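Before re-running the job, one way to check whether the datanodes are actually alive and healthy (a standard HDFS admin command, not part of the original post) is:

```shell
# report live/dead datanodes and per-node disk usage
bin/hdfs dfsadmin -report
```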

A successful run displays: (screenshot not preserved in this copy)

