Hadoop (1): Deep Analysis of HDFS Principles


Transferred from: http://www.cnblogs.com/tgzhu/p/5788634.html

While configuring an HBase cluster to mount HDFS on another mirror disk, I ran into a number of confusing points, so I went back and studied HDFS again, combining it with earlier notes. The three cornerstones of big data's underlying technology originated in three Google papers published between 2003 and 2006: GFS, MapReduce, and Bigtable. GFS and MapReduce directly supported the birth of the Apache Hadoop project, while Bigtable spawned the new NoSQL database field. Because of the high latency of the MapReduce processing framework, Google's Dremel prompted the rise of real-time query systems and triggered a second wave of big data technology: Cloudera open-sourced the big data query engine Impala, Hortonworks open-sourced Stinger, Facebook open-sourced Presto, and the UC Berkeley AMPLab developed the Spark computing framework. All of these are built on top of HDFS, and the most fundamental part of HDFS is its read and write operations.

Contents:

    • HDFS Terminology
    • HDFS Architecture
    • NameNode (NN)
    • Secondary NameNode
    • HDFS Write File
    • HDFS Read File
    • Block Persistence Structure

HDFS Terminology:

    • Block: in HDFS, every file is stored in chunks, and the blocks of a file are placed on different DataNodes. Each block is identified by a triple (block ID, numBytes, generationStamp), in which the block ID is unique. The allocation is decided by the NameNode, and the DataNode then creates the block file and the corresponding block meta file
    • Packet: the unit of data transfer between the DFSClient and the DataNode; when a block is sent, it is split into a sequence of packets (64 KB each by default), and each packet contains multiple chunks
    • Chunk: the Chinese name could also be translated as "block", but it is called chunk here to distinguish it from the HDFS block. During communication between the DFSClient and the DataNode, file data is transferred block by block, but within each block the data is sent as packets; each packet contains multiple chunks, and a checksum is computed for every chunk, producing the checksum bytes
    • Summary:
      1. A file is split into multiple blocks for persistent storage (the block size is determined by a configuration parameter)    Think: what effect does changing the block size have on previously persisted data?
      2. A block is split into multiple packets during data transfer
      3. A packet contains multiple chunks
    • Packet structure and definition: packets fall into two categories, actual data packets and heartbeat packets. A data packet is composed of a header and a data section; the header carries the packet's attribute information (listed in a table in the original post)
    • The data section is the real payload of the packet, consisting of a 4-byte checksum for each chunk followed by the chunk data itself, with each chunk at most 512 bytes
    • While a packet is being built, the byte stream is first written into a buffer: the chunk checksums are written starting at offset 25 (checksumStart), and the chunk data is written starting at offset 533 (dataStart), until the packet is complete
    • When the last packet of the last block of a file is written, the packet may not reach its maximum length, so an unwritten gap is left in the buffer between the checksum section and the chunk data section. Before this packet is sent, the writer checks whether there is a blank gap between the checksums and the chunk data; if so, it moves the chunk data forward so that chunk data 1 is adjacent to chunk checksum N, and only then sends the packet to the DataNode. A minimal checksum sketch follows this list
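To make the chunk/checksum relationship concrete, here is a minimal, self-contained Java sketch that splits a packet-sized byte array into 512-byte chunks and computes a 4-byte checksum for each one. It uses java.util.zip.CRC32 purely for illustration; the real DFSClient uses Hadoop's own DataChecksum implementation, and the 64 KB packet size and 512-byte chunk size below are assumptions taken from the defaults described above.

```java
import java.util.zip.CRC32;

public class ChunkChecksumSketch {
    static final int CHUNK_SIZE = 512;   // bytes of data covered by one checksum
    static final int CHECKSUM_SIZE = 4;  // CRC32 yields a 4-byte checksum

    // Compute one 4-byte checksum per 512-byte chunk of the packet payload.
    static byte[] checksums(byte[] packetData) {
        int numChunks = (packetData.length + CHUNK_SIZE - 1) / CHUNK_SIZE;
        byte[] sums = new byte[numChunks * CHECKSUM_SIZE];
        CRC32 crc = new CRC32();
        for (int i = 0; i < numChunks; i++) {
            int off = i * CHUNK_SIZE;
            int len = Math.min(CHUNK_SIZE, packetData.length - off);
            crc.reset();
            crc.update(packetData, off, len);
            long value = crc.getValue();
            // store the low 4 bytes of the CRC, big-endian
            for (int b = 0; b < CHECKSUM_SIZE; b++) {
                sums[i * CHECKSUM_SIZE + b] = (byte) (value >>> (24 - 8 * b));
            }
        }
        return sums;
    }

    public static void main(String[] args) {
        byte[] packet = new byte[64 * 1024];   // a 64 KB packet payload (assumed default)
        byte[] sums = checksums(packet);
        System.out.println("chunks = " + packet.length / CHUNK_SIZE
                + ", checksum bytes = " + sums.length);
    }
}
```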

HDFS Architecture:

  • The HDFS architecture involves quite a few components; a clearer diagram (shown in the original post) covers the main roles: Client, NameNode, SecondaryNameNode and DataNode
  • HDFS Client: the consumer of the system. It calls the HDFS API to operate on files, interacts with the NN to obtain file metadata, and reads and writes data with the DNs. Note: when writing data, file splitting is done by the Client (a minimal client sketch appears after this list)
  • NameNode: the master node (also called the metadata node) and the sole manager of the system. It manages the metadata (the namespace and block mapping information), configures the replica policy, and processes client requests
  • DataNode: the data storage node (also called the slave node). It stores the actual data, performs block reads and writes, and reports storage information to the NN
  • Secondary NameNode: an assistant that shares part of the NameNode's workload; it is a cold backup of the NameNode, merging fsimage and edits and then sending the result to the NameNode. Note: in Hadoop 2.x this role does not exist once HDFS HA is enabled (see the second post in this series)
  • Explanatory notes:
      1. Hot backup: B is a hot backup of A; if A goes down, B immediately takes over A's work
      2. Cold backup: B is a cold backup of A; if A goes down, B cannot immediately take over A's work, but B stores some of A's information, reducing the loss when A fails
    • HDFS architecture principles:
      1. Separation of metadata from data: the attributes of a file (its metadata) are kept separately from the data the file holds
      2. Master/slave architecture: an HDFS cluster consists of one NameNode and a number of DataNodes
      3. Write once, read many: a file in HDFS can have only one writer at a time. The file is created, data is written, and once the file is closed it can no longer be modified
      4. Moving computation is cheaper than moving data: the closer an operation runs to the data it processes, the better it performs. Because HDFS data is distributed across different machines, the best way to minimize network consumption and improve system throughput is to move the computation closer to the data rather than moving the data
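As a concrete illustration of the client role described above, the following is a minimal Java sketch that uses the public org.apache.hadoop.fs.FileSystem API to ask the NameNode for a file's metadata and block locations. The fs.defaultFS URI and the file path are placeholders, and error handling is kept to a bare minimum.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsMetadataSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020");   // placeholder NameNode address

        try (FileSystem fs = FileSystem.get(conf)) {
            Path file = new Path("/data/example.txt");      // placeholder path
            FileStatus status = fs.getFileStatus(file);     // metadata comes from the NameNode
            System.out.println("length = " + status.getLen()
                    + ", blockSize = " + status.getBlockSize()
                    + ", replication = " + status.getReplication());

            // Which DataNodes hold each block of the file
            for (BlockLocation loc : fs.getFileBlockLocations(status, 0, status.getLen())) {
                System.out.println("offset " + loc.getOffset()
                        + " -> " + String.join(",", loc.getHosts()));
            }
        }
    }
}
```

Note that the client only talks to the NameNode for metadata here; actual file data would be streamed directly from the DataNodes, as described in the read and write sections below.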

NameNode:

    • The NameNode is the management node of the entire file system and the most complex entity in HDFS. It maintains the two most important relationships in the HDFS file system:
      1. The file directory tree of the HDFS file system and the block index of each file, i.e. the list of data blocks belonging to each file
      2. The correspondence between data blocks and data nodes, i.e. which DataNodes each block is persisted on
  • The first relationship (the directory tree, file metadata and block index) is persisted to physical storage, implemented as the namespace image fsimage and the edit log edits. Note: fsimage does not record which DataNodes each block maps to
  • The second relationship is built after the NameNode starts: each DataNode scans its local disks and reports the blocks it stores to the NameNode; after receiving each DataNode's block report, the NameNode stores the block information, together with the DataNode that holds it, in memory. It is through these block reports that HDFS constructs the block-to-DataNodes mapping table
  • fsimage records the serialized state of all directories and files in the HDFS file system as of the last checkpoint;
  • edits is the metadata operation log (it records all HDFS operations between one fsimage save and the next)
  • When the NameNode starts, it loads the file system metadata from fsimage into memory, replays the records in edits to bring the in-memory metadata up to the latest state, saves a new version of fsimage from memory to local disk, and then deletes the old edit log. This process is called a checkpoint. How often do checkpoints run? (See the fourth post on parameter configuration, and the configuration sketch after the summary below.) Can a checkpoint be triggered manually? Verify whether the edit log has been removed after restarting the HDFS service
  • Similar to a checkpoint in a database, and to keep the edits log from growing too large, in Hadoop 1.x the SecondaryNameNode periodically merges fsimage and edits based on a time threshold (e.g. 24 hours) or an edits size threshold (e.g. 1 GB), then pushes the latest fsimage to the NameNode. In Hadoop 2.x this work is done by the standby NameNode
  • As you can see, if these two files are damaged or lost, the entire HDFS file system becomes unavailable. In the cluster set up in "HDP 2.4 installation (5): cluster and component installation", HDFS has only one NN by default; does that mean the NN is a single point of failure? (See the second post on HDFS HA)
  • In Hadoop 1.x, to ensure high availability of these two metadata files, the usual practice is to set dfs.namenode.name.dir to a comma-delimited list of directories, at the very least not all on a single disk, and preferably on a different machine, for example by mounting a shared file system
  • fsimage and edits are serialized files; to view or edit their contents, use the hdfs oiv and oev commands, as follows:
      • Command: hdfs oiv (offline image viewer) dumps the contents of an fsimage file into a specified file in a readable format, such as a text or XML file. It takes the following parameters:
        1. -i (required) --inputFile <arg>: the input fsimage file
        2. -o (required) --outputFile <arg>: the output file for the converted content; if the file already exists, it is overwritten
        3. -p (optional) --processor <arg>: the format to convert the fsimage file into (Ls | XML | FileDistribution). The default is Ls
        4. Example: hdfs oiv -i /data1/hadoop/dfs/name/current/fsimage_0000000000019372521 -o /home/hadoop/fsimage.txt
      • Command: hdfs oev (offline edits viewer). It operates only on files and therefore does not require a running Hadoop cluster.
        1. Example: hdfs oev -i edits_0000000000000042778-0000000000000042779 -o edits.xml
        2. Supported output formats are binary (the binary format Hadoop uses internally), xml (the default output format when -p is not given), and stats (prints statistics about the edits file)
  • Summary:
    1. The NameNode manages the DataNodes: it receives their registrations, heartbeats, block reports and other information, and piggybacks block replication, deletion and recovery commands on the heartbeat replies. At the same time, the NameNode serves clients' operations on the file system tree and their file reads and writes, providing the management backbone of the HDFS system
    2. The NameNode enters a special state called safe mode when it starts. A NameNode in safe mode does not replicate data blocks. It receives heartbeats and block reports from all DataNodes; a block report lists all the blocks held by a DataNode. Each block has a specified minimum number of replicas. When the NameNode detects that a block has reached this minimum, the block is considered safely replicated. Once a configurable percentage of blocks has been confirmed safe (plus an additional 30-second wait), the NameNode exits safe mode. It then determines which blocks have not yet reached the specified number of replicas and replicates them to other DataNodes.
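Regarding the checkpoint interval question raised above, here is a minimal sketch of the settings that typically control checkpointing in Hadoop 2.x. The property names and values are assumptions based on common Hadoop 2.x defaults and should be checked against your distribution; they would normally be set in hdfs-site.xml rather than in code.

```java
import org.apache.hadoop.conf.Configuration;

public class CheckpointConfigSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Assumed Hadoop 2.x property names; normally set in hdfs-site.xml.
        conf.setLong("dfs.namenode.checkpoint.period", 3600);     // seconds between checkpoints
        conf.setLong("dfs.namenode.checkpoint.txns", 1_000_000);  // uncheckpointed edits that force a checkpoint
        System.out.println("period = " + conf.getLong("dfs.namenode.checkpoint.period", -1)
                + "s, txns = " + conf.getLong("dfs.namenode.checkpoint.txns", -1));
    }
}
```

As for triggering a checkpoint manually, one common approach is to enter safe mode with hdfs dfsadmin -safemode enter, run hdfs dfsadmin -saveNamespace, and then leave safe mode; whether this is appropriate depends on your cluster setup.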

Secondary NameNode (in an HA cluster its checkpointing role is taken over by the standby NameNode):

    • Periodically merges fsimage and the edits log, keeping the edits log file within a size limit
    • The NameNode responds to the Secondary NameNode's request: it pushes the edit log to the Secondary NameNode and begins writing to a new edit log
    • The Secondary NameNode receives the fsimage file and the edit log from the NameNode
    • The Secondary NameNode loads fsimage into memory, applies the edit log, and generates a new fsimage file
    • The Secondary NameNode pushes the new fsimage to the NameNode
    • The NameNode replaces the old fsimage with the new one and records the time of the checkpoint in the fstime file

HDFS Write File:

    • For the write-file flow, see the referenced blog post (http://www.cnblogs.com/laov/p/3434917.html). The default block size in 2.x is 128 MB (see the fourth post on parameter configuration)
    1. The client splits FileA into 64 MB blocks, producing two blocks, block1 and block2;
    2. The client sends a write request to the NameNode, shown as the blue dashed line ① in the figure;
    3. The NameNode logs the block information and returns the available DataNodes (by what rules does the NameNode choose DataNodes? See the third post on Hadoop rack awareness), shown as the pink dashed line ②:
      • block1: host2, host1, host3
      • block2: host7, host8, host4
    4. The client sends block1 to the DataNodes. The sending is streamed, and the streaming write proceeds as follows:
      1. Split the 64 MB block1 into 64 KB packets
      2. Send the first packet to host2
      3. After host2 receives the first packet, it forwards it to host1, while the client sends the second packet to host2
      4. host1 receives the first packet and forwards it to host3, while receiving the second packet from host2
      5. And so on, as shown by the red lines, until block1 has been fully sent
      6. host2, host1 and host3 notify the NameNode, and host2 notifies the client, that the block has been received, as shown by the pink solid lines
      7. After receiving the message from host2, the client tells the NameNode that it has finished writing the block. Only then is the write really complete, shown by the thick yellow solid line
      8. After block1 has been sent, block2 is sent to host7, host8 and host4, as shown by the blue solid lines
    • Description
      1. When the client writes data to an HDFS file, the data is first written to a local temporary file. Suppose the file's replication factor is 3. When the local temporary file accumulates one block's worth of data, the client obtains from the NameNode a list of DataNodes that will hold the replicas. The client then begins transferring data to the first DataNode, which receives the data in small portions (4 KB), writes each portion to its local repository, and at the same time forwards it to the second DataNode in the list. The second DataNode does the same, receiving small portions, writing them locally, and passing them on to the third DataNode, which finally receives the data and stores it locally. DataNodes can therefore receive data from the previous node and forward it to the next node in a pipelined fashion, so the data is replicated from one DataNode to the next in a pipeline. A minimal client-side write sketch appears after the summary below
      2. A sequence diagram of the write path appears in the original post (not reproduced here)

    • Summary:
      1. In the write process, with default HDFS settings, a 1 TB file requires 3 TB of storage and 3 TB of network traffic
      2. During reads and writes, the NameNode and the DataNodes stay in contact through heartbeats, ensuring that each DataNode is alive. If a DataNode is found to be dead, the data it held is replicated onto other nodes, and reads go to other nodes instead
      3. Losing one node does not matter, because other nodes hold backups; even losing an entire rack does not matter, because other racks also hold backups
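To show what the write path looks like from the client side, here is a minimal Java sketch that creates a file in HDFS through the FileSystem API. The NameNode URI, path and payload are placeholders; all of the packet splitting and pipelining described above happens inside the returned output stream, not in user code.

```java
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020");      // placeholder NameNode address

        try (FileSystem fs = FileSystem.get(conf);
             // create() asks the NameNode for a lease and DataNode targets;
             // the stream then splits writes into packets and pushes them down the pipeline
             FSDataOutputStream out = fs.create(new Path("/data/fileA"), true)) {
            out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
            out.hflush();                                      // make the data visible to readers
        }                                                      // close() completes the file on the NameNode
    }
}
```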

HDFS Read File:

    • The read-file flow is as follows:
    • The client opens the file it wants to read by calling the open() method of the FileSystem object, which for HDFS is an instance of DistributedFileSystem;
    • DistributedFileSystem calls the NameNode over RPC to determine the locations of the file's first blocks. For each block, multiple locations are returned according to the replication factor, sorted by Hadoop cluster topology so that the ones nearest to the client come first (see the third post)
    • The first two steps return an FSDataInputStream object, which wraps a DFSInputStream object. DFSInputStream manages the DataNode and NameNode I/O, and the client calls the read() method on this input stream
    • DFSInputStream, which stores the DataNode addresses for the file's first blocks, connects to the nearest DataNode and transfers data from the DataNode to the client by repeatedly calling read() on the stream
    • When the end of a block is reached, DFSInputStream closes the connection to that DataNode and then finds the best DataNode for the next block. This is transparent to the client, which simply sees a continuous stream
    • Once the client has finished reading, it calls close() on the FSDataInputStream to close the file. A minimal client-side read sketch follows
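For completeness, here is the matching client-side read sketch using the same FileSystem API; again the URI and path are placeholders, and the block-to-block switching described above is handled inside the stream.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020");   // placeholder NameNode address

        try (FileSystem fs = FileSystem.get(conf);
             FSDataInputStream in = fs.open(new Path("/data/fileA"))) {   // placeholder path
            byte[] buf = new byte[4096];
            int n;
            // read() pulls data from the nearest DataNode, block by block
            while ((n = in.read(buf)) > 0) {
                System.out.write(buf, 0, n);
            }
        }
    }
}
```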

Block Persistence Structure:

    • The on-disk structure into which a DataNode finally persists a block is shown in the figure in the original post
    • Each block file (such as blk_1084013198) corresponds to a meta file (such as blk_1084013198_10273532.meta). The block file holds the chunk data in binary form (each chunk is 512 bytes), and the meta file holds the checksum for each chunk in serialized form. A small directory-scanning sketch follows
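The following is a small Java sketch that scans a DataNode data directory and pairs each block file with its meta file. The directory path is a placeholder, and the exact subdirectory layout under the data directory (block pool and finalized subfolders) varies by Hadoop version, so treat this as an exploration aid rather than a fixed layout.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class BlockLayoutSketch {
    public static void main(String[] args) throws IOException {
        // Placeholder: a block directory on a DataNode data disk
        Path blockDir = Paths.get("/data1/hadoop/dfs/data/current");

        try (Stream<Path> files = Files.walk(blockDir)) {
            files.filter(p -> p.getFileName().toString().startsWith("blk_"))
                 .filter(p -> !p.toString().endsWith(".meta"))
                 .forEach(block -> {
                     // The meta file name is blk_<id>_<generationStamp>.meta,
                     // so look for a sibling that starts with the block file name.
                     try (Stream<Path> siblings = Files.list(block.getParent())) {
                         siblings.filter(s -> s.getFileName().toString()
                                         .startsWith(block.getFileName().toString() + "_")
                                         && s.toString().endsWith(".meta"))
                                 .forEach(meta -> System.out.println(
                                         block.getFileName() + "  <->  " + meta.getFileName()));
                     } catch (IOException e) {
                         e.printStackTrace();
                     }
                 });
        }
    }
}
```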
