Hadoop HDFS (2) HDFS Concepts

1. Blocks
A hard disk has blocks, the smallest unit of data that can be read or written, usually 512 bytes. A file system built on a single disk also has a notion of a block: it groups several disk blocks into one file-system block, typically a few KB in size. All of this is transparent to users of the file system, who only know that they wrote a file of a certain size to disk or read one back. Some maintenance commands, such as df and fsck, do operate at the block level, however.
HDFS also has blocks, but they are much larger: the default is 64 MB. As with a single-disk file system, files in HDFS are split into block-sized chunks that are stored independently. Unlike a single-disk file system, though, a file smaller than one block does not occupy a full block's worth of underlying storage.
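The block size is recorded per file and can be read back through the Java client API. Below is a minimal sketch; the file path is hypothetical, and configuration is assumed to come from the usual core-site.xml/hdfs-site.xml on the classpath:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockSize {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path file = new Path("/user/demo/data.log"); // hypothetical path
            // The block size is chosen when the file is created (the cluster
            // default unless overridden); 64 MB is the era's default.
            long blockSize = fs.getFileStatus(file).getBlockSize();
            System.out.println("block size = " + blockSize + " bytes");
        }
    }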
Why are HDFS blocks so large? To minimize the cost of seeks. It is not that addressing is faster when the block is large; rather, a large block makes the seek time a small proportion of the transfer time. HDFS is network-based: once a block's location has been found, its data is streamed to or from the machine that holds it. The larger the block, the longer the transfer, and the smaller the relative cost of the seek. If blocks were small, seeks would happen more often and would take up a larger share of the total time.
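This trade-off can be made concrete with a back-of-the-envelope calculation. The figures below (10 ms per seek, 100 MB/s sustained transfer) are illustrative assumptions, not numbers from the text:

    public class SeekOverhead {
        public static void main(String[] args) {
            double seekMs = 10.0;            // assumed time to locate a block
            double transferMBperSec = 100.0; // assumed sustained transfer rate
            for (long blockMB : new long[]{1, 4, 64, 128}) {
                double transferMs = blockMB / transferMBperSec * 1000.0;
                double overhead = seekMs / (seekMs + transferMs) * 100.0;
                System.out.printf("block %4d MB -> seek is %.1f%% of total time%n",
                        blockMB, overhead);
            }
        }
    }

With these assumptions, a 1 MB block spends 50% of its time seeking, while a 64 MB block spends about 1.5%.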
The block abstraction brings several benefits to HDFS. First, and most obviously, a file can be larger than any single disk in the network, because nothing requires all the blocks of a file to live on one machine; the blocks of a single file can even be spread across every machine in the cluster.

Second, the block abstraction makes the storage subsystem simpler. Simplicity is something every system should strive for, especially a distributed system, whose failure modes are many and varied. Because the block size is fixed, it is easy to calculate how many blocks a given disk can hold. The storage subsystem also does not have to concern itself with file metadata: blocks store only data, while metadata such as permissions is stored and managed separately on another machine.

Third, blocks fit well with replication for fault tolerance. To ensure that no data is lost when a storage node fails, data is backed up block by block; typically each block is replicated on two other machines, for three copies in total. If one copy of a block cannot be read, it can be read from another machine, transparently to HDFS users. If a block replica becomes unavailable, its content is copied from a surviving replica to another machine, restoring the replication factor to the configured value.

Like Linux, HDFS has an fsck command. Running % hadoop fsck / -files -blocks lists the blocks that make up each file in the file system. (In Hadoop 2.x this form is deprecated in favor of % hdfs fsck / -files -blocks.)
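The same per-file block information is available programmatically. Here is a minimal sketch using the Hadoop Java client; the file path is hypothetical:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ListBlocks {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            Path file = new Path("/user/demo/data.log"); // hypothetical path
            FileStatus status = fs.getFileStatus(file);
            // Ask the namenode which blocks make up the file and where the
            // replicas live; this is the metadata fsck reports on.
            BlockLocation[] blocks =
                    fs.getFileBlockLocations(status, 0, status.getLen());
            for (BlockLocation block : blocks) {
                System.out.printf("offset=%d length=%d hosts=%s%n",
                        block.getOffset(), block.getLength(),
                        String.join(",", block.getHosts()));
            }
        }
    }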
2. Namenodes and Datanodes
An HDFS cluster has two types of node operating in a master-worker pattern: one namenode (the master) and a number of datanodes (the workers).

The namenode manages the namespace of the entire file system. It maintains the file-system tree and the metadata for all the files and directories in the tree. This information is persisted in two files on the namenode's local disk: the namespace image and the edit log. The namenode also knows which datanodes hold the blocks of each file, but it does not persist this information, because it is reconstructed from the datanodes each time the system starts.

A client accesses the HDFS file system by talking to the namenode and the datanodes, but user code does not need to know they exist at all; the interface resembles an ordinary POSIX (Portable Operating System Interface) file system.

Datanodes are the workhorses of the file system. They store and retrieve blocks when told to by clients or by the namenode, and they periodically report the list of blocks they are storing back to the namenode.

Without the namenode, the file system cannot be used. If the namenode were wiped out, every file in the file system would be lost, because there would be no way to reassemble the files from the blocks scattered across the datanodes. The namenode must therefore be made resilient, and Hadoop provides two mechanisms for this.

The first mechanism is to back up the namenode's persistent state. Hadoop can be configured so that the namenode writes its persistent state to multiple places; these writes are synchronous and atomic. The usual setup is to write one copy to the local disk and another to a remote NFS mount.

The second mechanism is to run a secondary namenode. Despite the name, the secondary namenode does not act as a namenode at all. Its job is to periodically merge the namespace image with the edit log, to keep the edit log from growing without bound. It usually runs on a separate machine, because the merge requires plenty of CPU and memory. Since its state lags behind the namenode's, some data loss is likely if the namenode fails outright. The usual recovery procedure is to copy the namenode's metadata files from NFS to the secondary namenode and then run it as the new namenode.
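To see how little client code needs to know about namenodes and datanodes, here is a minimal sketch that prints a file to standard output through the Java FileSystem API; the path is hypothetical:

    import java.io.InputStream;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class CatFile {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // The client only names a path; behind the scenes the library
            // asks the namenode for metadata, then streams block data
            // directly from the datanodes.
            FileSystem fs = FileSystem.get(conf);
            try (InputStream in = fs.open(new Path("/user/demo/data.log"))) {
                IOUtils.copyBytes(in, System.out, 4096, false);
            }
        }
    }

Note that neither a namenode nor a datanode is mentioned anywhere in the code.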
3. HDFS Federation
The namenode keeps a reference to every file and block in the file system in memory, so on very large clusters memory becomes the limiting factor for growth. Hadoop 2.x therefore introduced HDFS Federation, which scales the file system by adding namenodes. Each namenode manages a portion of the namespace: for example, one namenode might manage all the files under /usr while another manages everything under /share. Such a portion (/usr or /share here) is called a namespace volume. Namespace volumes are independent of one another and do not communicate; if one namenode dies, the others neither know nor care, and the files they manage remain accessible. To access a federated HDFS cluster, a client uses client-side mount tables to map file paths to namenodes, configured through ViewFileSystem and viewfs:// URIs.
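As a sketch of such a client-side mount table, the following sets the mapping programmatically; these properties would normally live in a configuration file, and the namenode hostnames, ports, and mount points here are hypothetical:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class FederatedClient {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Client-side mount table: /usr and /share are served by two
            // different namenodes (hostnames are hypothetical).
            conf.set("fs.defaultFS", "viewfs:///");
            conf.set("fs.viewfs.mounttable.default.link./usr",
                     "hdfs://namenode1:8020/usr");
            conf.set("fs.viewfs.mounttable.default.link./share",
                     "hdfs://namenode2:8020/share");
            FileSystem fs = FileSystem.get(conf); // a ViewFileSystem instance
            // A path such as /usr/alice now resolves to namenode1, while
            // /share/data resolves to namenode2.
            System.out.println(fs.getUri()); // viewfs:///
        }
    }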
4. HDFS High Availability (HA)
The measures described above, backing up the namenode's persistent state and running a secondary namenode, only protect the data from being lost; they do not provide high availability. The namenode is still a single point of failure (SPOF). If the namenode dies, every client is locked out of the file system, unable to read, write, or list files, until a new namenode completes three tasks: (i) load the namespace image, (ii) replay the operations in the edit log, and (iii) receive block reports from all the datanodes and leave safe mode. On a large cluster this can take 30 minutes or more.

Hadoop 2.x introduces a pair of namenodes in an active-standby configuration: when disaster strikes, the standby takes over from the active namenode and becomes the new active. Supporting this requires some architectural changes:
- The two namenodes need highly available shared storage for the edit log. Early implementations required a highly available NFS filer; later versions offer more options, such as a ZooKeeper-based solution.
- Because the block-to-datanode mapping lives in memory rather than on disk, datanodes must send their block reports to both namenodes.
- The client must be configured to handle namenode failover automatically, transparently to users.
When the active namenode dies, the standby can take over very quickly, in a few tens of seconds, because it holds the latest block mappings in memory and has the latest edit log: everything is already in place. In practice the observed failover time is longer, around a minute, because the system must be conservative in deciding that the active namenode has really died. And what if both the active and the standby namenode die? No matter: a cold start still takes the same roughly 30 minutes, so the situation is no worse than having no active-standby pair at all. Active-standby is thus a pure improvement, with no downside.
Heartbeats are used to decide whether the active node is still alive. The administrator also has a tool for switching to the standby while the active node is still healthy, which is useful for routine maintenance. This is called graceful failover, because the two namenodes exchange roles in an orderly fashion: one becomes active, the other standby. In an ungraceful failover, the active node is presumed dead and the standby must take over. But the old active node may not actually be dead: a slow network or some other glitch may have cut it off, and it may come back and serve requests again later. Worse, it does not know that it was declared dead, which causes real trouble if it resumes service. A series of fencing measures therefore keep it from rejoining the system, such as killing its process or blocking its port. The measure of last resort is STONITH: "shoot the other node in the head", forcibly powering the old active machine down.
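Hadoop 2.x exposes graceful failover through the hdfs haadmin tool. As a sketch, assuming the two namenodes are registered under the ids nn1 and nn2 in the cluster's HA configuration (the ids are whatever that configuration defines), % hdfs haadmin -failover nn1 nn2 asks nn1 to step down to standby and promotes nn2 to active, applying fencing if the handover does not complete cleanly.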
All of this is transparent to the client. In the client's namenode configuration, a single logical name maps to the pair of namenode addresses, and the client library tries each address until the operation succeeds.
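Here is a sketch of what such a client configuration looks like, expressed programmatically. The nameservice id "mycluster", the namenode ids "nn1"/"nn2", and the hostnames are hypothetical, though the property names and the ConfiguredFailoverProxyProvider class are the standard Hadoop 2.x ones:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class HaClient {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // One logical nameservice backed by two namenodes.
            conf.set("fs.defaultFS", "hdfs://mycluster");
            conf.set("dfs.nameservices", "mycluster");
            conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2");
            conf.set("dfs.namenode.rpc-address.mycluster.nn1", "machine1:8020");
            conf.set("dfs.namenode.rpc-address.mycluster.nn2", "machine2:8020");
            // Proxy provider that retries against the other namenode on
            // failure, so failover is invisible to user code.
            conf.set("dfs.client.failover.proxy.provider.mycluster",
                     "org.apache.hadoop.hdfs.server.namenode.ha."
                     + "ConfiguredFailoverProxyProvider");
            FileSystem fs = FileSystem.get(conf);
            System.out.println(fs.getUri()); // hdfs://mycluster
        }
    }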