Hadoop Learning Note 7: Distributed File System HDFS -- Datanode Architecture


 

1. Overview

Datanode: provides the storage service for the actual file data.

Block: the most basic storage unit (a concept borrowed from operating systems such as Linux). A file's content is split into pieces of a fixed size, in order, starting from offset 0 of the file; each piece is numbered, and each numbered piece is called a block.

Unlike a Linux file system block, if a file smaller than the block size is uploaded, it occupies only the space of its actual size, not a full block.

 

 

2. The default block size in hdfs-default.xml

<property>
  <name>dfs.block.size</name>
  <value>67108864</value>
  <description>The default block size for new files.</description>
</property>

This shows that the default HDFS block size is 64 MB (67,108,864 bytes). A 256 MB file is therefore divided into 256/64 = 4 blocks. The namenode places these blocks on different datanodes, so all the blocks of one file are not necessarily stored on a single datanode.

In HDFS, if a file is smaller than a data block, it does not occupy an entire block's worth of storage space.
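To see how a file actually maps to blocks, you can run fsck on it; a minimal sketch (the path /demo/big.file is just a placeholder):

hadoop fsck /demo/big.file -files -blocks -locations
# Lists every block (blk_...) of the file together with the datanodes
# holding it, confirming that the blocks of one file can be spread
# across different datanodes.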

 

 

3. Finding where the datanode stores blocks

<property>
  <name>dfs.data.dir</name>
  <value>${hadoop.tmp.dir}/dfs/data</value>
  <description>Determines where on the local filesystem an DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.</description>
</property>

Go to the /usr/local/hadoop/tmp/dfs/data/current directory.
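Inside that directory the blocks are stored as plain files; roughly like this (the block ID and generation stamp are illustrative):

cd /usr/local/hadoop/tmp/dfs/data/current
ls -l
# blk_2925685650502377512            <- the raw block data
# blk_2925685650502377512_1099.meta  <- checksum data for that block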

 

Each blk_XXX file has a companion .meta file, which stores the checksums used to verify the block data.

The Linux command stat (e.g. stat /) can be used to view the block size and other information of a Linux file system.
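For comparison, a quick sketch of stat on a local Linux filesystem (output abbreviated):

stat /
#   Size: 4096   Blocks: 8   IO Block: 4096   directory
# "IO Block" is the local filesystem's block size (typically 4 KB),
# orders of magnitude smaller than an HDFS block.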

 

4. Verify the file size

 

1) Upload hadoop-xxx.tar.gz, which is 61,927,560 bytes in size.
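The commands behind this step look roughly like the following (the block ID is illustrative):

hadoop fs -put hadoop-xxx.tar.gz /
ls -l /usr/local/hadoop/tmp/dfs/data/current
# -rw-r--r-- ... 61927560 ... blk_-123456789   <- one block, exactly the
#                                                 original file size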

 

As you can see, the block file's size is also 61,927,560 bytes (the original file size): the file is smaller than 64 MB, so it occupies a single block of exactly its own size.

 

2) Then upload jdk-xxx.bin and look at the data directory again.

 

This time two new block files appear, and their sizes add up to the size of the original file (the jdk binary is larger than 64 MB, so it is split into two blocks).
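Schematically (the second block's size is hypothetical; only the 64 MB first block is fixed by definition):

ls -l /usr/local/hadoop/tmp/dfs/data/current
# ... 67108864 ... blk_...   <- first block: a full 64 MB
# ... 18273645 ... blk_...   <- second block: the remaining bytes
# first + second = size of the original jdk-xxx.bin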

 

Summary:

When a datanode stores a file, if the file is larger than 64 MB it is split into blocks of 64 MB each (the last block holds the remainder); if it is smaller than 64 MB there is only one block, and the disk space occupied is the actual size of the original file.

 

 

If you manually copy a file into the datanode's data directory, you cannot see it with hadoop fs -ls.

This bypasses the namenode, and it is the namenode that maintains the HDFS directory structure and the metadata recording where each file's blocks are stored.

Every storage block of every file is tracked by the namenode and occupies space in its memory; therefore, the more blocks there are, the greater the pressure on the namenode. For example, three 2 KB files have almost no impact on datanode storage, since each fits comfortably within a single block and occupies only its actual size on disk, but each of them still adds entries to the namenode's memory. With a massive number of small files, the pressure is enormous!
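A back-of-envelope sketch of that pressure, assuming the commonly cited estimate of roughly 150 bytes of namenode heap per file or block object (an approximation, not an exact constant):

# 10 million small files => ~10M file objects + ~10M block objects:
echo $(( 20000000 * 150 / 1024 / 1024 ))   # => 2861  (~2.8 GB of namenode heap)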

On the other hand, a block size that is too large is also bad: reading and writing a single block becomes slow, and retransmitting a block after an error is expensive.

Conversely, the smaller the blocks, the more of them there are, and the greater the memory pressure on the namenode.

Therefore, the block size should be chosen according to the actual workload; 64 MB and 128 MB are common choices.


To change it: copy the dfs.block.size property from hdfs-default.xml into hdfs-site.xml and modify its value (settings in hdfs-site.xml override the defaults).
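A minimal sketch of such an override in hdfs-site.xml (128 MB is chosen purely as an example):

<property>
  <name>dfs.block.size</name>
  <value>134217728</value>
  <!-- 134217728 bytes = 128 MB; affects newly created files only -->
</property>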

 

5. Replication: multiple copies (the default is three)

<property>
  <name>dfs.replication</name>
  <value>3</value>
  <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.</description>
</property>
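The replication factor can also be controlled per file with standard hadoop fs commands; a sketch (paths are placeholders):

# Upload a file with a non-default replication factor:
hadoop fs -D dfs.replication=2 -put local.file /demo/local.file
# Change the replication factor of an existing file (-w waits for completion):
hadoop fs -setrep -w 2 /demo/local.file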

 

6. View the HDFS directory structure in a browser

Browser address bar:

http://hadoop:50070 (the namenode's built-in web interface; "hadoop" here is the namenode's hostname)

 

Appendix:

Cluster: the basic unit for reading and writing data on disk in a Windows file system. If the cluster size is 8 KB, the file system reads and writes files in 8 KB units; therefore, a 4 KB file still occupies 8 KB on disk (a waste of space).

