Hadoop is a distributed storage and computing platform for Big data
Architecture of HDFs: Master-Slave architecture
The primary node has only one namenode, and there can be many datanode from the node.
Namenode is responsible for:
(1) Receiving User action request
(2) Maintaining the directory structure of the file system
(3) Managing the relationship between the file and block, and the connection between block and Datanode
Datanode is responsible for:
(1) Storing files
(2) file is partitioned into blocks and stored on disk
(3) To ensure data security, the file will have multiple copies
Namenode and Datanode refer to different independent physical machines.
Analogy: Block Puzzle, Namenode is a manual, each block is datanode.
MapReduce Architecture: Master-Slave architecture
The primary node has only one jobtracker, and there can be many tasktracker from the node.
Jobtracker is responsible for:
(1) Receiving the calculation task submitted by the customer
(2) Assign the calculation task to Tasktracker execution
(3) Monitoring the implementation of Tasktracker
Tasktracer is responsible for:
(1) Perform calculation tasks for Jobtracer assignment
The physical cluster distribution of Hadoop:
Each of these nodes, whether primary or slave, is essentially a Java process.
Physical structure of a single node:
So the features of Hadoop:
(1) Distributed: Strong capacity, low cost, high efficiency
(2) Replica mechanism: High reliability
Hadoop Learning Notes (2) Hadoop framework parsing