We know that a Hadoop cluster is fault-tolerant, distributed, and so on. Why does it have these characteristics? What follows is one of the underlying principles.
Distributed clusters typically contain a very large number of machines. Because of the limits imposed by rack slots and switch ports, larger distributed clusters usually span several racks, and the machines on those multiple racks form a distributed…
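Hadoop learns this rack layout through its rack-awareness hook: core-site.xml names a mapping script via net.topology.script.file.name (topology.script.file.name in Hadoop 1.x). A minimal sketch of such a script follows; the IP ranges and rack names are invented for illustration:

#!/bin/bash
# Hadoop invokes this script with one or more DataNode IPs or hostnames
# as arguments and expects one rack path per argument on stdout.
for node in "$@"; do
  case "$node" in
    192.168.1.*) echo "/rack1" ;;
    192.168.2.*) echo "/rack2" ;;
    *)           echo "/default-rack" ;;
  esac
done

With such a mapping in place, HDFS spreads block replicas across racks, which is what lets the cluster survive the failure of a whole rack or switch.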
Steps for setting up a Hadoop cluster environment under Ubuntu 12.04. I. Preparation before setting up the environment: my local Ubuntu 12.04 32-bit machine acts as the master; it is the same machine that was used for the stand-alone Hadoop environment, http://www.linuxidc.com/Linux/2013-01/78112.htm. I also created four virtual machines in KVM, named: Son-1 (Ubuntu 12.04 32-bit…
Environment Building: Hadoop Cluster Setup
Earlier, we quickly set up the CentOS cluster environment. Next, we will start building the Hadoop cluster.
Lab environment: Hadoop version CDH 5.7.0. A note on this choice: we did not select the official Apache release because the CDH version has already solved the dependency problems between components.
// From org.apache.hadoop.mapred.JobClient:
public void init(JobConf conf) throws IOException {
  setConf(conf);                                      // keep the passed-in configuration
  cluster = new Cluster(conf);                        // the MR2 Cluster object behind the old API
  clientUgi = UserGroupInformation.getCurrentUser();  // remember the submitting user
}
This is still the JobClient of the MR1 era; it lives in /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-core-2.0.0-cdh4.5.0.jar
and /usr/lib/…
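To check for yourself which jar the class ships in, listing the jar's contents is enough; the grep pattern is merely illustrative:

jar tf /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-core-2.0.0-cdh4.5.0.jar | grep JobClient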
Make the change take effect:
source /etc/profile
2.3.3 Check the current JDK version
java -version
2.3.4 Supplement (optional)
If the current JDK version is not the one you just set, you can set the default JDK version explicitly:
sudo update-alternatives --install /usr/bin/java java /usr/lib/java/jdk1.6.0_25/bin/java 300
sudo update-alternatives --install /usr/bin/javac javac /usr/lib/java/jdk1.6.0_25/bin/javac 300
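After registering both alternatives, you would typically pick the default interactively and then verify it, for example:

# Choose the default among the registered alternatives:
sudo update-alternatives --config java
sudo update-alternatives --config javac
# Verify the result:
java -version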
variable. Note: if the file you downloaded is in RPM format, you can install it with the following command:
rpm -ivh jdk-7u72-linux-x64.rpm
4.5 Environment variable settings. Modify the profile file (this is recommended so that other programs can also use the JDK in a friendly way):
# vi /etc/profile
Locate the line export PATH USER LOGNAME MAIL HOSTNAME HISTSIZE INPUTRC in the file, and change it to the following form:
export JAVA_HOME=/opt/java/jdk1.7.0_72
export …
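The edit conventionally ends up as three exports for a JDK under /opt/java/jdk1.7.0_72; the CLASSPATH and PATH entries below are the usual convention, not values taken from this article:

# Conventional JDK profile entries; adjust the path to your install:
export JAVA_HOME=/opt/java/jdk1.7.0_72
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH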
and the DataNodes need to report block information to both the active NN and the standby NN. Advantages: no information is lost, and recovery is fast (seconds). Disadvantages: Facebook developed it on the Hadoop 0.20 base, deployment is a little troublesome, additional machine resources are required, and the NFS becomes another single point of failure (though one with a low failure rate). 4. Hadoop 2.0 supports a standby NN directly, drawing on Facebook's AvatarNode and then making some improvements: information is not lost, recovery is fast (seconds), sim…
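With Hadoop 2.x HA in place, the state of the two NameNodes can be inspected and switched with hdfs haadmin; nn1 and nn2 below are hypothetical NameNode IDs, i.e. whatever dfs.ha.namenodes.<nameservice> defines in hdfs-site.xml:

# Which NameNode is currently active?
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
# Manually fail over from nn1 to nn2 (recovery on the order of seconds):
hdfs haadmin -failover nn1 nn2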
Purpose
This article describes how to install, configure, and manage a non-trivial Hadoop cluster that can scale from a small cluster of a few nodes to a large cluster of thousands of nodes.
If you want to install…
Today I got the weather data sample code from the Hadoop authoritative guide running on the Hadoop cluster, and I am recording the process here.
Beforehand, no amount of searching on Baidu/Google turned up a concrete, step-by-step description of how to run it on the cluster as a MapReduce job; after some painful blind groping, it succeeded. Great mood…
1 Preparing…
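For reference, a hedged sketch of what submitting the book's MaxTemperature example to a cluster typically looks like; the jar name, class name, and HDFS paths are assumptions rather than values from this article:

# Stage the NCDC sample data in HDFS:
hadoop fs -mkdir -p input/ncdc
hadoop fs -put sample.txt input/ncdc
# Submit the job to the cluster:
hadoop jar max-temperature.jar MaxTemperature input/ncdc output
# Inspect the result:
hadoop fs -cat output/part-r-00000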
NodeManager: rather than going into the methods mentioned above, let us look at the simple start-up method using start-yarn.sh. Run jps on the master: the output indicates that the ResourceManager is operating normally. Run jps on both slaves, and you will see the NodeManager running normally. Test Hadoop, test HDFS: the final test is to see whether the Hadoop cluster is performing properly…
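A hedged sketch of that check; the expected daemon lists and the HDFS paths are typical values, not taken from the original:

# On the master, after start-dfs.sh and start-yarn.sh:
jps   # expect NameNode and ResourceManager among the output
# On each slave:
jps   # expect DataNode and NodeManager
# Simple HDFS read/write smoke test:
hdfs dfs -mkdir -p /user/hadoop/test
hdfs dfs -put /etc/hosts /user/hadoop/test
hdfs dfs -ls /user/hadoop/test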
The "three-step" process of the Hadoop pseudo-distribution environmentFirst, JDK installation and environment variable configuration1, test first, whether the JDK is installedJava-version2. View the number of CentOS positionsFile/bin/ls3. Switch to usr/, create java/directoryCD/LsCD usr/mkdir JavaCD java/Ls4, upload local download good, show upload command is not installedRz5, download RZ, sz commandYum-y Install
Services:
hive-server: the Hive management service
hive-metastore: Hive metadata, used for type checking and syntax analysis of the metadata
The conventions defined in this article avoid confusion about which of the multiple servers each configuration applies to: all of the following operations must be performed on the host where Hive is located, that is, hadoop-secondary.
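For reference, a sketch of starting and checking the two services named above on hadoop-secondary, assuming the CDH service names:

sudo service hive-metastore start
sudo service hive-server start
# Confirm both services came up:
sudo service hive-metastore status
sudo service hive-server status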
1. Preparations before installation: Hadoop cluster (CDH4) pr…
Whether you are adding or removing machines in a Hadoop cluster, no downtime is needed and the entire service is uninterrupted.
Before this operation, the Hadoop cluster is as follows:
The HDFS machines are as follows:
The MR machines are as follows:
Adding Machines
On the master machine…
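A hedged sketch of the usual no-downtime sequence for bringing a new node in, using the MR1-era daemons that match the MR section above and assuming Hadoop and the cluster configuration are already installed on the new node:

# On the new node, start the DataNode and the TaskTracker:
hadoop-daemon.sh start datanode
hadoop-daemon.sh start tasktracker
# Optionally spread existing blocks onto the new DataNode:
start-balancer.sh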
Course outline and content introduction: about 35 minutes per lesson, no fewer than 40 lectures.
Chapter 1 (11 lectures)
• Distributed mode versus the traditional stand-alone mode
• Hadoop background and how it works
• Analysis of the working principle of MapReduce
• Analysis of the principle of the second-generation MR, YARN
• Cloudera Manager 4.1.2 installation
• Cloudera Hadoop 4.1.2 installation
• Under CM, the…
No downtime is required for adding or deleting machines in the Hadoop cluster, and the entire service is not interrupted.
Before this operation, the Hadoop cluster is as follows:
HDFS machines are as follows:
The MR machines are as follows:
Add Machine
[hadoop@master ~]$ ssh-keygen -t rsa
Copy the public key to each machine, including the local machine, so that ssh localhost logs in without a password:
[hadoop@master ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@master
[hadoop@master ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@slave1
[hadoop@master ~]$ ssh-co…
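Once the keys are distributed, a quick check that password-free login works (hostnames as above):

ssh hadoop@master date
ssh hadoop@slave1 date
ssh localhost date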
Introduction to Hadoop
Hadoop is an open-source distributed computing platform under the Apache Software Foundation. With the Hadoop Distributed File System (HDFS) and MapReduce (an open-source implementation of Google's MapReduce), it provides users with a distributed infrastructure that is transparent…
As the next-generation Hadoop version, Apache Hadoop 2.2.0 breaks through the limit of at most 4,000 machines per cluster in Hadoop 1.x and effectively solves the frequently encountered OOM (out-of-memory) problem. Its innovative computing framework, YARN, has been called the Hadoop operating system: it is not only compatible with the original MapReduce computing…