HDFS System Architecture: Diagram-Level Analysis
Hadoop Distributed File System (HDFS): a distributed file system.
* Distributed applications mostly follow a master/slave pattern. Master node: NameNode (one); slave nodes: DataNode (multiple).
* HDFS service components: NameNode, DataNode, SecondaryNameNode.
* HDFS storage: files stored on HDFS are split into blocks; the default block size in Hadoop 2.x is 128 MB.
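As a quick check, the effective default block size can be read through the HDFS Java API. A minimal sketch, assuming a reachable cluster; the hdfs://namenode:8020 address is a placeholder for your own NameNode:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder NameNode address; replace with your cluster's fs.defaultFS
        conf.set("fs.defaultFS", "hdfs://namenode:8020");
        FileSystem fs = FileSystem.get(conf);
        // Default block size used for new files under this path (128 MB in Hadoop 2.x)
        long blockSize = fs.getDefaultBlockSize(new Path("/"));
        System.out.println("Default block size: " + (blockSize / 1024 / 1024) + " MB");
        fs.close();
    }
}
```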
HDFS Service Functions
* NameNode: the master node. It stores file metadata (file name, directory structure, and file attributes such as creation time, number of replicas, and permissions), along with the block list of each file and the DataNodes hosting each block.
* DataNode: stores the file data as blocks in the local file system, along with each block's metadata (length, creation time, CRC32 checksum).
* SecondaryNameNode: an auxiliary daemon that monitors HDFS state and takes a snapshot of the HDFS metadata at intervals. It periodically merges the NameNode's image file and edit log into a new image file.
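The block list and block-to-DataNode mapping the NameNode maintains can be queried from client code with the standard FileSystem API. A minimal sketch; the file path /data/example.txt is hypothetical:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocationsDemo {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/data/example.txt"); // hypothetical file path
        FileStatus status = fs.getFileStatus(file);
        // The NameNode answers this query from its metadata: the file's
        // block list and the DataNodes that host each block
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.println("offset=" + block.getOffset()
                    + " length=" + block.getLength()
                    + " hosts=" + String.join(",", block.getHosts()));
        }
        fs.close();
    }
}
```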
*************************************************************
There are two basic components in HDFS:
it is made up of one NameNode server and multiple DataNode servers.
A) NameNode: the name node, which stores the metadata that maps files to data blocks, so that blocks can be located quickly.
HDFS storage (block operations): a file is logically divided into blocks, and each block is 128 MB by default. The blocks are written in sequence to different DataNode servers, and which blocks each DataNode holds is recorded on the NameNode. Reads likewise go to the NameNode first: the client locates the blocks that make up the file through the NameNode (see the read sketch at the end of this section). What the NameNode actually stores is this mapping relationship, that is, the metadata.
B) DataNode: the data node, used to store the real data.
What a DataNode actually stores are the data blocks themselves.
Note: when many servers form an HDFS cluster, each server starts the process for the role assigned to it, either NameNode or DataNode.
If one DataNode server fails to start, access to the data is not affected; and if the storage of all DataNode servers fills up, we can simply add servers to the cluster at any time, without any disruption. This is what makes it a distributed file storage system.
Among the many servers in the cluster:
one of the servers starts as the NameNode;
the other servers start as DataNodes.
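To make the read path concrete, here is a minimal sketch using the HDFS Java API: fs.open() first consults the NameNode for the block locations, then the returned stream pulls the block data from the DataNodes. The file path is again hypothetical:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsRead {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // open() asks the NameNode where the file's blocks live;
        // the stream then reads the block data from the DataNodes
        try (FSDataInputStream in = fs.open(new Path("/data/example.txt"));
             BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
        fs.close();
    }
}
```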
Functions of the YARN Service
ResourceManager: handles client requests; starts and monitors ApplicationMasters; monitors NodeManagers; and performs resource allocation and scheduling.
ApplicationMaster: splits the input data; requests resources for the application from the ResourceManager and assigns them to its internal tasks; and handles task monitoring and fault tolerance.
NodeManager: manages the resources on a single node, and processes commands from both the ResourceManager and the ApplicationMaster.
Container: an abstraction of the task-running environment. It encapsulates multi-dimensional resources such as CPU and memory, as well as environment variables, launch commands, and other information related to running a task.
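As a small illustration of the ResourceManager's bookkeeping role, the sketch below uses the standard YarnClient API to ask it for the running NodeManagers and the resources each one offers. It assumes the cluster configuration (yarn-site.xml) is on the classpath:

```java
import java.util.List;
import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.api.records.NodeState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class YarnNodesDemo {
    public static void main(String[] args) throws Exception {
        YarnClient yarn = YarnClient.createYarnClient();
        yarn.init(new YarnConfiguration());
        yarn.start();
        // The ResourceManager tracks every NodeManager; ask it for the
        // running nodes and each node's total and used resources
        List<NodeReport> nodes = yarn.getNodeReports(NodeState.RUNNING);
        for (NodeReport node : nodes) {
            System.out.println(node.getNodeId()
                    + " capability=" + node.getCapability()
                    + " used=" + node.getUsed());
        }
        yarn.stop();
    }
}
```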
How MapReduce Runs on YARN
1. The client submits a job to the ResourceManager.
2. The ResourceManager (via its ApplicationsManager) asks a NodeManager for a container and launches the MRAppMaster (the MapReduce ApplicationMaster) in it.
3. The MRAppMaster requests resources from the ResourceManager.
4. Once the resource request is granted, the MRAppMaster asks the corresponding NodeManagers to launch the task containers.
5. The map tasks and reduce tasks start.
6. The map and reduce tasks report their status and progress to the MRAppMaster.
7. When the program finishes, the MRAppMaster reports completion to the ResourceManager and unregisters.
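The whole flow above is driven by a single client call. A minimal runnable sketch that submits an identity (pass-through) MapReduce job; the input and output paths come from the command line:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class PassThroughJob {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "pass-through");
        job.setJarByClass(PassThroughJob.class);
        // The base Mapper/Reducer classes are identity implementations:
        // records pass through unchanged, enough to exercise the YARN flow
        job.setMapperClass(Mapper.class);
        job.setReducerClass(Reducer.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // waitForCompletion() submits the job to the ResourceManager (steps 1-2)
        // and then polls the MRAppMaster for progress until the job ends (step 7)
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```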