I. HDFS: Distributed Storage
- NameNode(Name Node)
1. Maintains the HDFS file system; the NameNode is the master node of HDFS.
2. Receives client requests: uploading and downloading files, creating directories, etc.
3. Logs client operations (the edits file), recording the latest state of HDFS
1) The edits file records every operation performed on the HDFS file system since the last checkpoint, such as adding files, renaming files, deleting directories, etc.
2) Save directory: $HADOOP_HOME/tmp/dfs/name/current
You can use the hdfs oev -i command to dump the log (a binary file) as an XML file:
hdfs oev -i edits_inprogress_0000000000000005499 -o ~/temp/log.xml
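Once the edits log is dumped to XML, it can be inspected with any XML tooling. Below is a minimal Python sketch that counts the operations in such a dump; the embedded snippet is a hand-written sample in the general shape of oev output (RECORD/OPCODE elements), not a real dump, which carries many more fields per record.

```python
# Sketch: summarizing operations from an XML dump produced by `hdfs oev`.
# The sample below is illustrative only; real oev output has more fields.
import xml.etree.ElementTree as ET

sample = """<?xml version="1.0"?>
<EDITS>
  <RECORD><OPCODE>OP_MKDIR</OPCODE><DATA><PATH>/input</PATH></DATA></RECORD>
  <RECORD><OPCODE>OP_ADD</OPCODE><DATA><PATH>/input/a.txt</PATH></DATA></RECORD>
  <RECORD><OPCODE>OP_DELETE</OPCODE><DATA><PATH>/tmp/old</PATH></DATA></RECORD>
</EDITS>"""

root = ET.fromstring(sample)
# Collect the operation code of every RECORD element, in log order.
ops = [rec.findtext("OPCODE") for rec in root.iter("RECORD")]
print(ops)  # → ['OP_MKDIR', 'OP_ADD', 'OP_DELETE']
```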
4. Maintains file metadata, keeping metadata that is not frequently accessed on disk (the fsimage file)
1) fsimage is the metadata checkpoint of the HDFS file system on disk; it records the serialized information of all directories and files in HDFS as of the last checkpoint.
2) Save directory: $HADOOP_HOME/tmp/dfs/name/current (the same directory as the edits files)
3) You can use the hdfs oiv -i command to dump the fsimage (a binary file) as an XML file
- DataNode(Data Node)
1. Stores data in units of blocks
1) Block size in Hadoop 1.0: 64 MB
2) Block size in Hadoop 2.0: 128 MB
2. In fully distributed mode, there are at least two DataNodes
3. Data save directory: specified by the hadoop.tmp.dir parameter
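The block sizes above determine how a file is split across DataNodes. A quick worked example (the 300 MB file size is just an illustration): a file is stored as ceil(size / block_size) blocks, with the last block possibly only partially filled.

```python
# Sketch: how many HDFS blocks a file occupies under the two default block sizes.
import math

def num_blocks(file_size_bytes, block_size_bytes):
    """Blocks needed for a file; the last block may be partially filled."""
    return math.ceil(file_size_bytes / block_size_bytes)

MB = 1024 * 1024
size = 300 * MB  # example file size
print(num_blocks(size, 64 * MB))   # Hadoop 1.0 default → 5
print(num_blocks(size, 128 * MB))  # Hadoop 2.0 default → 3
```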
- Secondary NameNode(Secondary Name Node)
1. Main role: merging the edits log into the fsimage
2. Merge timing: when HDFS triggers a checkpoint
3. Log merge process:
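The merge can be pictured as replaying the logged operations on top of the previous checkpoint. The toy model below sketches that idea; the dictionary layout and operation names are illustrative, not Hadoop's real on-disk formats.

```python
# Toy model of the checkpoint merge performed by the Secondary NameNode:
# replay the edits log on top of the old fsimage to produce a new fsimage.
# Structures and op names here are illustrative, not Hadoop's real formats.

def merge(fsimage, edits):
    image = dict(fsimage)      # previous checkpoint: path -> metadata
    for op, path in edits:     # replay logged operations in order
        if op == "mkdir":
            image[path] = {"type": "dir"}
        elif op == "add":
            image[path] = {"type": "file"}
        elif op == "delete":
            image.pop(path, None)
    return image               # new fsimage; the edits log can now be reset

old_image = {"/": {"type": "dir"}}
edits_log = [("mkdir", "/input"),
             ("add", "/input/a.txt"),
             ("delete", "/input/a.txt")]
new_image = merge(old_image, edits_log)
print(sorted(new_image))  # → ['/', '/input']
```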
4. Problems with the NameNode in Hadoop 1.0
1) The NameNode is a single point of failure
Solution: Hadoop 2.0 uses ZooKeeper to implement NameNode HA (high availability)
2) The NameNode is under heavy pressure and its memory is limited, which hurts system scalability
Solution: Hadoop 2.0 uses NameNode Federation for horizontal scaling
II. YARN: Distributed Computing (MapReduce)
- ResourceManager(Resource Manager)
1. Receives client requests to execute jobs
2. Allocates resources
3. Assigns tasks
- NodeManager(Node Manager: runs MapReduce tasks)
Fetches data from the DataNode and executes tasks
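The ResourceManager/NodeManager split can be sketched as a tiny first-fit placement loop: the RM tracks free memory per node and hands each task's container to a node with capacity, and the chosen NM runs the task. All names and numbers below are illustrative; real YARN scheduling (queues, localities, schedulers) is far richer.

```python
# Toy sketch of ResourceManager-style container placement (first fit).
# Node names, memory figures, and task names are made up for illustration.

def assign(tasks, node_free_mb):
    placements = {}
    for task, need_mb in tasks:
        # Pick the first node with enough free memory.
        node = next((n for n, free in node_free_mb.items() if free >= need_mb),
                    None)
        if node is None:
            placements[task] = None          # no capacity: task waits
        else:
            node_free_mb[node] -= need_mb    # RM reserves the container
            placements[task] = node          # the NM on `node` runs the task
    return placements

nodes = {"nm1": 2048, "nm2": 1024}
tasks = [("map-0", 1024), ("map-1", 1024), ("reduce-0", 1024), ("map-2", 1024)]
placements = assign(tasks, nodes)
print(placements)
# → {'map-0': 'nm1', 'map-1': 'nm1', 'reduce-0': 'nm2', 'map-2': None}
```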
III. The Architecture of HBase
Big Data Note (ii) -- The Architecture of Apache Hadoop