Enter ssh hadoop02 to log in to the hadoop02 node.
Configuring JDK
Create three folders under /home here:
tools -- stores installation packages
softwares -- stores installed software
data -- stores data
Upload the downloaded Linux JDK package to /home/tools on hadoop01 via WinSCP.
Extract the JDK into /home/softwares.
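A minimal sketch of this extraction step, assuming a tarball download; the exact archive file name (jdk-8u111-linux-x64.tar.gz) is an assumption, not a value from the original:
$ tar -zxvf /home/tools/jdk-8u111-linux-x64.tar.gz -C /home/softwares/   # unpack the JDK from tools into softwares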
The JDK home directory is now visible at /home/softwares/jdk.x.x.x. Copy this directory path into /etc/profile and set JAVA_HOME in that file:
export JAVA_HOME=/home/softwares/jdk1.8.0_111
Save the changes and run source /etc/profile for the configuration to take effect.
Configuring the cluster/distributed environment
The cluster/distributed mode requires modifying five configuration files in /usr/local/hadoop/etc/hadoop (more settings are described in the official documentation; only the settings necessary for a normal startup are covered here): slaves, core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml.
1. File slaves: write the host names of all nodes that will run as DataNodes into this file, one per line.
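As a sketch of what the slaves file might contain, assuming two worker hosts; hadoop02 appears earlier in this article, while hadoop03 is purely hypothetical:
$ cat /usr/local/hadoop/etc/hadoop/slaves   # one DataNode host name per line
hadoop02
hadoop03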
$ sudo cp README.txt input
3. Run the WordCount program and save the output in the output folder.
# Each time the WordCount program is re-run, the output folder must be deleted first; otherwise an error will occur.
$ bin/hadoop jar share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.7.2-sources.jar org.apache.hadoop.examples.WordCount input output
4. View the word count results.
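A hedged way to look at the results, assuming the job ran in the local (standalone) mode implied by the cp command above, so that output is an ordinary local directory:
$ cat output/part-r-00000   # each line is a word followed by its count
(If the job wrote to HDFS instead, bin/hdfs dfs -cat output/* shows the same thing.)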
Purpose
This article describes how to install, configure, and manage a non-trivial Hadoop cluster that can scale from a small cluster of several nodes to a very large cluster with thousands of nodes.
If you want to install Hadoop on a single machine, you can find the details here.
Prerequisites
Ensure that all required software is installed on each node in your cluster. Get the Hadoop distribution.
the polling task can commit its output (commit).
getMapCompletionEvents: the reduce task calls this method to obtain the map output files and the map completion status updates (map completion events).
4. DatanodeProtocol (NN-DN)
Protocol introduction: the DN registers itself with the NN and sends its current DN and block information to the NN (block reports and block error reports); the NN replies with the actions the DN needs to take (delete blocks or copy blocks).
Main methods:
register: registers the DN with the NN.
sendHeartbeat: the DN reports its status to the NN.
Three Hadoop modes:
Local mode: local simulation, without using the distributed file system.
Pseudo-distributed mode: the five daemons all run on a single host.
Fully distributed mode: at least three nodes; JobTracker and NameNode are on the same host, SecondaryNameNode runs on its own host, and DataNode and TaskTracker run on the remaining host(s).
Test environment:
CentOS (kernel 2.6.32-358.el6.x86_64)
jdk-7u21-linux-x64.rpm
Hadoop Archive (HAR) | a file system layered on the Hadoop file system for archiving files; Hadoop archive files are primarily used to reduce NameNode memory usage.
KFS | URI scheme: kfs | Java implementation: fs.kfs.KosmosFileSystem | CloudStore (formerly known as the Kosmos file system) is a distributed file system similar to HDFS and Google's GFS, written in C++.
FTP | URI scheme: ftp | Java implementation: fs.ftp.FTPFileSystem
Here we can see the program's output, and it is correct, which proves that the MapReduce function works properly.
The above shows how to view file data through Hadoop's HDFS file system, which is the natural way. But what do the files stored on HDFS look like from the perspective of the local Linux file system? For example:
Because data in the HDFS file system is stored on the DataNodes, you can look at a DataNode's local data directory to see how the blocks are laid out on disk.
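A hedged illustration of what that looks like, assuming dfs.data.dir points at /usr/hadoop/filesystem/data as configured later in this article; the exact subdirectory layout and block file names vary by Hadoop version and are examples only:
$ find /usr/hadoop/filesystem/data -name 'blk_*' | head    # list a few raw block files kept by the DataNode
/usr/hadoop/filesystem/data/current/blk_1073741825
/usr/hadoop/filesystem/data/current/blk_1073741825_1001.meta
Each HDFS block is simply an ordinary file (plus a .meta checksum file) on the DataNode's local disk.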
export JAVA_HOME=/usr/local/java/jdk1.8.0_121
(2) core-site.xml
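A hedged sketch of the kind of settings core-site.xml typically needs in this setup; $HADOOP_HOME, the master host name CDH, and the port 9000 are assumptions, not values taken from the original:
$ cat > $HADOOP_HOME/etc/hadoop/core-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- NameNode address; host name and port are assumed values -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://CDH:9000</value>
  </property>
  <!-- base directory for Hadoop temporary files; path is an assumed value -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
  </property>
</configuration>
EOF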
(3) hdfs-site.xml
There are three copies of the data (replication factor 3).
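A minimal sketch of the corresponding hdfs-site.xml, assuming only the replication factor is set here ($HADOOP_HOME is an assumption, and a real file may carry more properties):
$ cat > $HADOOP_HOME/etc/hadoop/hdfs-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- keep three copies of every block, matching "three copies of the data" above -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
EOF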
(4) mapred-site.xml (this file does not exist by default and must be created by the user; it can be configured based on the shipped default/template settings)
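A hedged sketch of creating mapred-site.xml so that MapReduce runs on YARN; $HADOOP_HOME is an assumption, and the single property shown is the usual minimum rather than the author's exact file:
$ cat > $HADOOP_HOME/etc/hadoop/mapred-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- run MapReduce jobs on YARN rather than the classic framework -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF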
(5) yarn-env.sh
Add the JAVA_HOME configuration:
export JAVA_HOME=/usr/local/java/jdk1.8.0_121
(6) yarn-site.xml
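A hedged sketch of a matching yarn-site.xml, assuming the ResourceManager runs on the master host CDH and the standard shuffle service is enabled ($HADOOP_HOME and the host name are assumptions):
$ cat > $HADOOP_HOME/etc/hadoop/yarn-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- host that runs the ResourceManager; assumed value -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>CDH</value>
  </property>
  <!-- auxiliary shuffle service needed by MapReduce on YARN -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
EOF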
(7) slaves
CDH1
CDH2
CDH (the master) is used both as the NameNode and as a DataNode.
Make the same configuration on CDH1 and CDH2:
scp /home/...
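A hedged sketch of how that copy is often done, assuming the Hadoop installation and the edited configuration live under /home/hadoop and the same user and paths exist on CDH1 and CDH2 (all of these are assumptions, since the original command is truncated):
$ scp -r /home/hadoop CDH1:/home/    # copy the installation and its configuration to CDH1
$ scp -r /home/hadoop CDH2:/home/    # and to CDH2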
dfs.name.dir -- determines where on the local file system the DFS name node should store the name table. If this is a comma-delimited list of directories, then the name table is replicated in all of the directories, for redundancy.
dfs.data.dir = /usr/hadoop/filesystem/data -- determines where on the local file system a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices.
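A small sketch of how these two properties look inside hdfs-site.xml; the data directory value is the one given above, while the name directory path, the $HADOOP_HOME/conf location, and the fact that a real file carries more properties are assumptions:
$ cat > $HADOOP_HOME/conf/hdfs-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- where the NameNode stores the name table (fsimage/edits); path is an assumed value -->
  <property>
    <name>dfs.name.dir</name>
    <value>/usr/hadoop/filesystem/name</value>
  </property>
  <!-- where each DataNode stores its blocks; value taken from the description above -->
  <property>
    <name>dfs.data.dir</name>
    <value>/usr/hadoop/filesystem/data</value>
  </property>
</configuration>
EOF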
Using HDFS to store small files is not economical, because each file is stored as a block, and the metadata of each block is kept in the NameNode's memory. Therefore, a large number of small files will eat up a lot of NameNode memory. (Note: a small file occupies one block, but the block does not consume the full configured block size. For example, even if the block size is set to 128 MB, a 1 MB file stored in one block uses only about 1 MB of actual disk space, not 128 MB.)
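Hadoop archives (HAR files) are one way to pack many small files together so the NameNode only has to track the archive. A hedged sketch of the archive command; the paths /user/hadoop/logs and /user/hadoop are purely illustrative:
$ hadoop archive -archiveName logs.har -p /user/hadoop logs /user/hadoop   # pack /user/hadoop/logs into logs.har
$ hadoop fs -ls har:///user/hadoop/logs.har                                # browse the archive through the har:// scheme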
Note: when switching between versions 0.21.0 and 0.20.205.0 (in either direction), there is no way to use the built-in upgrade command. (Many of the operations in this article are best written as scripts; doing them by hand is far too tedious.)
Please indicate the source when reprinting. Thank you. Getting this to work was genuinely tiring.
Before testing
Three machines are used for the test:
in ~/.ssh/: id_rsa and id_rsa.pub. These two files appear as a pair, like a key and its lock.
Append id_rsa.pub to the authorized keys (at this point there is no authorized_keys file yet):
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
(3) Verify that SSH works
Enter ssh localhost. If logging in to the local machine succeeds, the installation was successful.
3. Turn off the firewall: $ sudo ufw disable
Note: this step is very important; if the firewall is not turned off, you will run into the problem of the DataNode not being found.
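A hedged sketch of extending the same passwordless setup to the other nodes, assuming the worker hosts are named CDH1 and CDH2 as in the slaves file earlier; if ssh-copy-id is not available, append the public key to each node's ~/.ssh/authorized_keys by hand:
$ ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa   # generate the key pair non-interactively (skip if already done)
$ ssh-copy-id CDH1                           # install the public key on CDH1
$ ssh-copy-id CDH2                           # and on CDH2
$ ssh CDH1                                   # should now log in without a password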
Use a yum repository to install a CDH Hadoop cluster
This document mainly records the process of using yum to install a CDH Hadoop cluster, including HDFS, YARN, Hive, and HBase. CDH 5.4 is used for the installation, so the process below applies to CDH 5.4.
0. Environment Description
System environment:
Operating system: CentOS 6.6
Hadoop version: CDH 5.4
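A hedged sketch of the kind of yum commands this installation involves, assuming the Cloudera CDH 5.4 repository has already been added under /etc/yum.repos.d/ and that roles are split between one master and several worker nodes (the package selection may differ for a real deployment):
$ sudo yum install -y hadoop-hdfs-namenode hadoop-yarn-resourcemanager hadoop-mapreduce-historyserver   # on the master node
$ sudo yum install -y hadoop-hdfs-datanode hadoop-yarn-nodemanager hadoop-client                        # on each worker node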