Hadoop learning notes - installation in fully distributed mode
Steps for installing Hadoop in fully distributed mode
Introduction to Hadoop's modes
Standalone mode: easy to install and requires almost no configuration, but it is only useful for debugging.
Pseudo-distributed mode: starts the five Hadoop daemons (NameNode, SecondaryNameNode, DataNode, JobTracker, TaskTracker) on a single machine.
Fully distributed mode: runs the daemons across a cluster of machines; this is the mode these notes cover.
Architecture
Hadoop has many elements. At the bottom is the Hadoop Distributed File System (HDFS), which stores files across all storage nodes in the Hadoop cluster. The layer above HDFS (for the purposes of this article) is the MapReduce engine, which consists of JobTrackers and TaskTrackers.
HDFS
For external clients, HDFS looks like a traditional hierarchical file system.
Overview: Hadoop is a software framework for the distributed processing of large amounts of data. It implements Google's MapReduce programming model and framework, splitting an application into many small units of work that can run on any node in the cluster. In MapReduce terminology, an application submitted for execution is called a job, and a unit of work split off from a job and run on a compute node is called a task. In addition, Hadoop provides a distributed file system (HDFS) that stores data across the compute nodes.
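As a concrete illustration, a job can be submitted with the example jar that ships with Hadoop (the jar path below matches the Hadoop 2.6.0 layout; the input and output HDFS paths are hypothetical):
bin/hadoop fs -put etc/hadoop /user/hadoop/input          # copy some local files into HDFS as job input
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /user/hadoop/input /user/hadoop/output
bin/hadoop fs -cat /user/hadoop/output/part-r-00000       # the job's single reducer writes its results here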
Configure HDFS. Configuring HDFS is not difficult: first edit the HDFS configuration files, then perform the format operation on the NameNode.
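A minimal sketch of that sequence, assuming a Hadoop 2.x layout and that the configuration files have already been edited:
bin/hdfs namenode -format      # format the NameNode; this erases any existing HDFS metadata
sbin/start-dfs.sh              # start the NameNode, SecondaryNameNode and DataNode daemons
bin/hdfs dfsadmin -report      # confirm that the DataNodes have registered with the NameNode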
Configure Cluster
Here, we assume that you have downloaded a version of hadoop and decompressed it.
The conf directory under the Hadoop installation directory is where Hadoop's configuration files live.
Introduction: HDFS is not good at storing small files, because each file occupies at least one block, and the metadata for each block takes up memory on the NameNode. A very large number of small files will therefore eat up a large amount of the NameNode's memory. Hadoop Archives handle this problem effectively: they pack multiple files into a single archive file, and the archived files remain accessible through HDFS.
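For example, a directory of small files can be packed into a single HAR file with the archive tool and read back through the har:// scheme (the paths here are hypothetical):
bin/hadoop archive -archiveName files.har -p /user/hadoop input /user/hadoop/archive    # pack /user/hadoop/input into files.har
bin/hadoop fs -ls har:///user/hadoop/archive/files.har                                  # list the contents of the archive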
Hadoop includes a distributed file system, HDFS (Hadoop Distributed File System). Hadoop is a software framework for the distributed processing of large amounts of data, and it processes data in a reliable, efficient, and scalable way. Hadoop is reliable because it assumes that compute and storage elements can fail, so it maintains multiple copies of the working data and can redistribute work around failed nodes.
Build a Hadoop client - that is, access Hadoop from a host outside the cluster.
1. Add the host mapping (the same mapping as on the namenode). Switch to root, edit /etc/hosts, and append the mapping as the last line:
[root@localhost ~]# su - root
[root@localhost ~]# vi /etc/hosts
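A hypothetical entry (replace the IP address and hostname with the namenode's real values):
192.168.1.100    master    # IP and hostname of the namenode, matching the mapping used on the namenode itself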
The contents of this section are reproduced from teacher Chao Wu's blog, whom I quite admire o(∩_∩)o~ The following describes the roles played by the NameNode and the DataNode. (1) NameNode: the NameNode manages the file system's directory structure and manages the DataNodes. It maintains two sets of data: one is the relationship between the file directory structure and the data blocks, and the other is the relationship between the data blocks and the DataNodes.
Hadoop grew out of Nutch: starting with nutch 0.8.0, the NDFS and MapReduce implementations inside Nutch were stripped out to create a new open source project, which became Hadoop. The fundamental architectural change in nutch 0.8.0 compared with earlier versions is that it is built entirely on top of Hadoop, in which Google's GFS and MapReduce algorithms are implemented.
Hadoop can be configured to run in different modes:
Standalone mode
Pseudo-distributed mode
Fully distributed mode
Here we are going to configure pseudo-distributed mode; in single-node pseudo-distributed operation each Hadoop daemon runs in its own Java process. 1. Edit the configuration files etc/hadoop/core-site.xml and etc/hadoop/hdfs-site.xml. 2. Set up passphrase-less SSH to localhost (see the sketch below).
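A minimal sketch of those two files and of the passphrase-less SSH setup, using the values from the official single-node guide (hdfs://localhost:9000 as the default file system and a replication factor of 1); adjust them to your own environment:
etc/hadoop/core-site.xml:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
etc/hadoop/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
Passphrase-less SSH to localhost:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
ssh localhost    # should now log in without asking for a password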
The installation in this article covers only hadoop-common, hadoop-hdfs, hadoop-mapreduce, and hadoop-yarn; it does not include HBase, Hive, or Pig.
http://blog.csdn.net/aquester/article/details/24621005
1. Planning
1.1. List of machines
NameNode
As a matter of fact, you can configure the distributed framework runtime environment simply by following the official Hadoop documentation. Still, it is worth writing a little more here and paying attention to some details that would otherwise take a long time to discover. Hadoop can run on a single machine, or a cluster can be simulated on a single machine. To run on a single machine you need almost no configuration, as noted above for standalone mode.
Hadoop core projects: HDFS (the Hadoop Distributed File System) and MapReduce (a parallel computing framework). HDFS has a master-slave architecture: the master node (there is only one NameNode) receives user requests, maintains the file system's directory structure, and manages the relationship between files and blocks and between blocks and the DataNodes.
Read files
The file reading mechanism works as follows (step numbers refer to the figure in the original article):
The client calls the open() method of the FileSystem object (for HDFS this is a DistributedFileSystem instance) to open the file (step 1 in the figure). The DistributedFileSystem then uses a remote procedure call to ask the NameNode for the locations of the first few blocks of the file (step 2). For each block, the NameNode returns the addresses of the DataNodes that hold a copy of it.
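The same read path is exercised from the command line by the fs shell; the file path below is hypothetical:
bin/hadoop fs -cat /user/hadoop/input/readme.txt    # open the file, fetch block locations from the NameNode, then stream the blocks from the DataNodes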
Second, from the HDFS perspective, hosts are divided into NameNode and DataNode (in a distributed file system, directory management is very important; the directory manager is the master, and the NameNode is that directory manager). Third, from the MapReduce perspective, hosts are divided into JobTracker and TaskTracker (a job is usually split into multiple tasks, and from this perspective it is not hard to understand their master-slave relationship: the JobTracker schedules the job, while the TaskTrackers run its tasks).
The dfsadmin commands can only be used by the HDFS administrator. The following are examples of some actions and their commands:
Place the cluster in safe mode: bin/hadoop dfsadmin -safemode enter
Display the DataNode list: bin/hadoop dfsadmin -report
Decommission DataNode datanodename: bin/hadoop dfsadmin -decommission datanodename
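Two related dfsadmin invocations that are often useful alongside the ones above (both are standard dfsadmin options):
bin/hadoop dfsadmin -safemode get      # report whether safe mode is currently on
bin/hadoop dfsadmin -safemode leave    # take the cluster out of safe mode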
Author: Two Cyan. Email: [Email protected]  Weibo: http://weibo.com/xtfggef
Installing a single-node environment was fine, but after finishing it I still felt it was not enough fun, so today I continue studying and move on to a fully distributed cluster installation. The software used is the same as for the previous single-node Hadoop installation, as follows:
Ubuntu 14.10 64-bit Server Edition
Hadoop 2.6.0
JDK 1.7.0_71
ssh
rsync
Prepare
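On Ubuntu, ssh and rsync can be installed from the standard repositories (a sketch of the preparation step; the JDK in the list above would typically be installed separately):
sudo apt-get update
sudo apt-get install -y ssh rsync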