Apache Hadoop 2.2.0, as the next-generation Hadoop version, breaks through the roughly 4,000-machine limit of the original Hadoop 1.x cluster and effectively addresses the frequently encountered OOM (out-of-memory) problem. Its innovative computing framework, YARN, has been called the Hadoop operating system: it is not only compatible with the original MapReduce computing model, but also supports other computing frameworks.
original path to the target path. hadoop fs -cat /user/hadoop/a.txt views the contents of the a.txt file. hadoop fs -rm /user/hadoop/a.txt deletes the a.txt file under the hadoop folder in the user folder. hadoop fs -rm -r /user/hadoop/a... removes a path recursively.
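For reference, here are those HDFS shell commands in runnable form; the a.txt path comes from the text above, while the directory used with -rm -r is illustrative because the original line is cut off:

  hadoop fs -cat /user/hadoop/a.txt     # print the contents of a.txt
  hadoop fs -rm /user/hadoop/a.txt      # delete the file a.txt
  hadoop fs -rm -r /user/hadoop/adir    # recursively delete a directory (name illustrative)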
file. ./hdfs/data -- stores data; ./hdfs/tmp -- stores temporary files. 2.6 Modifying the configuration files. The files that need to be modified are under hadoop2.9.1/etc/hadoop/; there are six main files to modify: hadoop-env.sh, core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, slaves. 2.6.1, vim hadoop-env.sh and fill in the Java installation path. 2.6.2, vim core-site.xml and insert the following inside the configuration tag...
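A sketch of steps 2.6.1 and 2.6.2; the JAVA_HOME path, NameNode address, and tmp directory below are illustrative values, not taken from the original text:

  # hadoop-env.sh -- point Hadoop at the JDK
  export JAVA_HOME=/usr/local/jdk

  <!-- core-site.xml: properties go inside the <configuration> tag -->
  <configuration>
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://master:9000</value>  <!-- NameNode address; illustrative -->
    </property>
    <property>
      <name>hadoop.tmp.dir</name>
      <value>/home/hadoop/hdfs/tmp</value>  <!-- matches the ./hdfs/tmp directory above; path illustrative -->
    </property>
  </configuration>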
Generally, one machine in the cluster is designated as the namenode, and another machine as the jobtracker. These machines are the masters. The remaining machines serve as both datanode and tasktracker. These machines are the slaves.
Official address: http://hadoop.apache.org/common/docs/r0.19.2/cn/cluster_setup.html
1. Prerequisites
Make sure that all required software is installed on each node of your cluster:
-- Modify the mapred-site.xml file
[gird@hotel01 conf]$ vi mapred-site.xml
Common configuration parameters in the mapred-site.xml file:
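This part of the tutorial uses the Hadoop 1.x conf/ layout (a jobtracker is mentioned above), so a minimal mapred-site.xml might look like the sketch below; the port is illustrative:

  <configuration>
    <property>
      <name>mapred.job.tracker</name>
      <value>hotel01.licz.com:9001</value>  <!-- jobtracker host:port; port illustrative -->
    </property>
  </configuration>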
Configuring the masters and slaves files
[gird@hotel01 conf]$ vi masters
hotel01.licz.com
[gird@hotel01 conf]$ vi slaves
hotel02.licz.com
hotel03.licz.com
7. Replicate Hadoop to each node (scp command)
-- Copy the Hadoop files configured on hotel01.licz.com to each node
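A sketch of the copy step with scp; the hostnames come from the masters/slaves files above, and the paths are illustrative:

  [gird@hotel01 ~]$ scp -r /home/gird/hadoop gird@hotel02.licz.com:/home/gird/
  [gird@hotel01 ~]$ scp -r /home/gird/hadoop gird@hotel03.licz.com:/home/gird/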
settings for nodes in VirtualBox. The command for viewing a node's IP address in Linux is ifconfig; the inet address it shows is the IP (note that CentOS installed on a virtual machine does not connect to the network automatically, and you need to enable the connection in the upper right corner before the IP address is visible). Configure the machine name: start by completing the preparation on the master node and shutting down Hadoop (/usr/local
In fact, you can configure the distributed framework's runtime environment fairly easily by following the official Hadoop documentation. It is still worth writing a little more here and paying attention to some details, because these details can otherwise cost you a long time of trial and error. Hadoop can run on a single machine, or you can configure a cluster to run on a single machine.
64-bit; JDK version: JDK 1.7; Hadoop version: Hadoop 2.7.2
Cluster Environment:

Role     Hostname   IP
Master   wlw        192.168.1.103
Slave    zcq-pc     192.168.1.105
Create a Hadoop user
It is important to note that the Hadoop ...
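A minimal sketch of this step, assuming a dedicated hadoop user and group (all names are illustrative):

  # create a dedicated user for running Hadoop
  sudo groupadd hadoop
  sudo useradd -m -g hadoop -s /bin/bash hadoop
  sudo passwd hadoop
  # optional: let the hadoop user run sudo (group is 'wheel' on CentOS)
  sudo usermod -aG sudo hadoop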
Hadoop consists of two parts:
The Distributed File System (HDFS)
The distributed computing framework MapReduce
The Distributed File System (HDFS) is mainly used for distributed storage of large-scale data, while MapReduce is built on top of the file system to perform distributed computation on the data stored in it.
The following describes the functions of the nodes in detail.
Namenode:
1. There is only one namenode in the cluster.
Run scala -version; normal output indicates that the installation succeeded.
3. Installing the Hadoop server
Host Name   IP Address       JDK        User
master      10.116.33.109    1.8.0_65   root
slave1      10.27.185.72     1.8.0_65   root
slave2      10.25.203.67     1.8.0_65   root
Download address for Hadoop: http://hadoop.apache.org/
Configure the hosts file
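Based on the table above, the /etc/hosts entries on every node would look like this sketch (short hostnames taken from the table):

  10.116.33.109  master
  10.27.185.72   slave1
  10.25.203.67   slave2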
Hadoop Modes
Pre-install Setup
Creating a user
SSH Setup
Installing Java
Install Hadoop
Install in Standalone Mode
Let's do a test
Install in Pseudo-Distributed Mode
Modify core-site.xml
Modify hdfs-site.xml
Modify mapred-site.xml
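For the pseudo-distributed install, minimal versions of these three files might look like the following sketch. This part of the text uses the older hadoop/conf layout (hadoop-0.20.2 is mentioned below), so the Hadoop 1.x property names are used; the localhost addresses and ports are illustrative:

  <!-- core-site.xml -->
  <configuration>
    <property>
      <name>fs.default.name</name>
      <value>hdfs://localhost:9000</value>  <!-- NameNode address; port illustrative -->
    </property>
  </configuration>

  <!-- hdfs-site.xml -->
  <configuration>
    <property>
      <name>dfs.replication</name>
      <value>1</value>  <!-- one replica is enough on a single node -->
    </property>
  </configuration>

  <!-- mapred-site.xml -->
  <configuration>
    <property>
      <name>mapred.job.tracker</name>
      <value>localhost:9001</value>  <!-- jobtracker address; port illustrative -->
    </property>
  </configuration>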
7) Modify the hadoop/conf/hadoop-env.sh file, where the JDK path is specified: export JAVA_HOME=/usr/local/jdk
8) Modify hadoop/conf/masters and slaves, filling in the virtual machine names, so that Hadoop knows which hosts are the master and the datanodes (see the sketch below).
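A sketch of those two files; dellypc-master is the master hostname mentioned in the next paragraph, while the slave names are illustrative:

  # hadoop/conf/masters
  dellypc-master

  # hadoop/conf/slaves  (hostnames illustrative)
  dellypc-slave1
  dellypc-slave2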
file each time you change the configuration file. You can see in the startup output that all nodes synchronize the configuration files from the configured location. For example, my master node hostname is dellypc-master, and all Hadoop configuration files are placed in /home/Delly/hadoop-0.20.2 ($user will be recognized as the current user name, so
Apache Ambari is a Web-based tool that supports the provisioning, management, and monitoring of Apache Hadoop clusters. Ambari currently supports most Hadoop components, including HDFS, MapReduce, Hive, Pig, HBase, ZooKeeper, Sqoop, and HCatalog, and provides centralized management for all of them. It is also one of the five top-level
Today I got the weather-data sample code from Hadoop: The Definitive Guide running on the Hadoop cluster, and I am recording the process here.
Before this, no matter how I searched Baidu and Google, I could not find a concrete, step-by-step description of how to run a MapReduce job on the cluster. After a painful stretch of blind trial and error, it finally worked; a good mood...
1. Preparing ...
[email protected]:~$ ssh slave2
Output:
[email protected]:~$ ssh slave1
Welcome to Ubuntu 16.04.1 LTS (GNU/Linux 4.4.0-31-generic x86_64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/advantage
Last login: Mon ... 03:30:36 from 192.168.19.1
[email protected]:~$
2.3 Hadoop 2.7 cluster deployment
1. On the master machine, in the
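The password-free logins shown above are usually set up with an SSH key pair; a minimal sketch, assuming a hadoop user (the slave1/slave2 hostnames come from the transcript):

  # on the master, generate a key pair once (empty passphrase for automation)
  ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
  # append the public key to each slave's authorized_keys
  ssh-copy-id hadoop@slave1
  ssh-copy-id hadoop@slave2
  # verify: this should log in without prompting for a password
  ssh hadoop@slave1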
number of servers to be built. At home, however, we can find such servers ourselves: gather several PCs and install the Linux system on them.
Of course, there is a simpler way: find a high-performance computer, install virtual machine software on it, and create several virtual machines that form a small internal LAN. On this network we can install the Linux software, the Java software, and Hadoop.