- Hadoop was inspired by Google's papers and was originally designed to address the high cost and slow speed of processing large datasets in traditional databases.
- Hadoop's two core projects are HDFS (Hadoop Distributed File System) and MapReduce.
- HDFS is used to store data. Unlike a traditional relational database, it does not enforce strong data integrity; instead it stores large files using a streaming data access pattern. When the size of a dataset exceeds the storage capacity of a single physical machine, the dataset must be partitioned and stored across several separate computers. A file system that manages storage across a network of machines is called a distributed file system. HDFS applies the concept of a block: files are divided into block-sized chunks that are stored as independent units. A disk's default block size is 512 bytes, while an HDFS block defaults to 64 MB; the much larger block size minimizes addressing (seek) overhead. With blocks as the storage unit, a file can be larger than any single disk, and the design of the storage subsystem is simplified. Replicating the same block on multiple machines ensures that when one copy of a block is damaged, the copy on another machine can still be read.
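The block splitting described above can be sketched as a small calculation. This is a conceptual illustration only, not Hadoop's implementation; the function name and the 200 MB example file are made up for demonstration:

```python
# Conceptual sketch (not HDFS source code): splitting a file into
# fixed-size blocks, as HDFS does with its 64 MB default block size.

BLOCK_SIZE = 64 * 1024 * 1024  # HDFS default block size (64 MB)

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return (offset, length) pairs for each block of a file.

    The last block may be smaller than block_size; unlike a raw disk,
    HDFS does not pad a short final block to the full block size.
    """
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks

# A 200 MB file occupies four blocks: three full 64 MB blocks
# plus one 8 MB tail block.
blocks = split_into_blocks(200 * 1024 * 1024)
print(len(blocks))    # 4
print(blocks[-1][1])  # 8388608 (8 MB)
```

This also shows why a file larger than any single disk is storable: each (offset, length) unit can live on a different machine.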
- There are two types of nodes in an HDFS cluster: the NameNode and the DataNode. The NameNode plays the role of manager, maintaining the entire file system namespace; without the NameNode, the file system cannot be used. If the NameNode machine is damaged, all files are effectively lost, so contingency measures are needed to prevent this irreversible situation. The first is to back up the files that make up the persistent state of the file system metadata, persisting the NameNode's state to multiple file systems. The second is to run a secondary (auxiliary) NameNode.
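The NameNode/DataNode division of labor can be illustrated with a toy model. This is purely a sketch of the idea, not Hadoop code: the class and method names (`ToyNameNode`, `add_file`, `locate`) and the round-robin replica placement are assumptions made for the example; real HDFS uses rack-aware placement.

```python
# Toy model (not Hadoop code) of the NameNode/DataNode split:
# the NameNode holds only metadata (which blocks make up a file and
# where their replicas live); DataNodes hold the actual block bytes.

class ToyNameNode:
    def __init__(self, replication=3):
        self.replication = replication
        self.namespace = {}        # file path -> list of block ids
        self.block_locations = {}  # block id  -> list of DataNode names

    def add_file(self, path, block_ids, datanodes):
        """Record a file's blocks and assign replicas to DataNodes."""
        self.namespace[path] = list(block_ids)
        for i, block_id in enumerate(block_ids):
            # Place each replica on a different DataNode (simple
            # round-robin here; real HDFS is rack-aware).
            targets = [datanodes[(i + r) % len(datanodes)]
                       for r in range(self.replication)]
            self.block_locations[block_id] = targets

    def locate(self, path):
        """What a client asks the NameNode: where are the file's blocks?"""
        return {b: self.block_locations[b] for b in self.namespace[path]}

nn = ToyNameNode(replication=2)
nn.add_file("/logs/a.txt", ["blk_1", "blk_2"], ["dn1", "dn2", "dn3"])
print(nn.locate("/logs/a.txt"))
# {'blk_1': ['dn1', 'dn2'], 'blk_2': ['dn2', 'dn3']}
```

The model also makes the failure mode concrete: if `nn` is lost, the block bytes still sit on the DataNodes, but nothing maps file names to blocks anymore, which is why the metadata must be backed up or mirrored by a secondary NameNode.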
- To be continued.
Hadoop Learning Record (i) HDFS