Learn more about Hadoop

-----------------------20080827-------------------


Insight into Hadoop


http://www.blogjava.net/killme2008/archive/2008/06/05/206043.html


I. Premises and Design Goals


1. Hardware failure is the norm rather than the exception. An HDFS instance may consist of hundreds or thousands of servers, any component of which can fail at any time, so error detection and fast, automatic recovery is a core architectural goal of HDFS.


2. Applications that run on HDFS differ from general-purpose applications: they mostly perform streaming reads for batch processing. For them, high throughput of data access matters more than low latency.


3. HDFS targets large data sets: a typical stored file ranges from gigabytes to terabytes in size, and a single HDFS instance should be able to support tens of millions of files.


4. HDFS applications need a write-once-read-many access model for files. A file, once created, written, and closed, does not need to be changed. This assumption simplifies data consistency and makes high-throughput data access possible. A typical MapReduce job or a web crawler fits this model well.


5. Moving computation is cheaper than moving data. A computation requested by an application runs most efficiently when it executes near the data it operates on, especially once the data reaches massive scale. Moving the computation close to the data is clearly better than moving the data to the application, and HDFS provides interfaces for applications to do so.


6. Portability across heterogeneous hardware and software platforms.





II. Namenode and Datanode


HDFS adopts a master/slave architecture. An HDFS cluster consists of a single Namenode and a number of Datanodes. The Namenode is a central server responsible for managing the file system namespace and client access to files. A Datanode typically runs one per node in the cluster and manages the storage attached to that node. Internally, a file is split into one or more blocks, and these blocks are stored on a set of Datanodes. The Namenode executes namespace operations, such as opening, closing, and renaming files and directories, and determines the mapping of blocks to specific Datanodes. Datanodes create, delete, and replicate blocks under instruction from the Namenode. Both are designed to run on ordinary, inexpensive machines running Linux. HDFS is developed in Java, so it can be deployed on a wide range of machines. A typical deployment runs the Namenode alone on a dedicated machine, while every other machine in the cluster runs one Datanode instance. The architecture does not preclude running multiple Datanodes on a single machine, but this is rare in practice.





Having a single Namenode greatly simplifies the architecture of the system. The Namenode keeps and manages all HDFS metadata, but user data never flows through it: file data is read and written directly on the Datanodes.
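As an illustration of this division of labor, here is a minimal sketch using the standard Java FileSystem API (the Namenode address and file path are hypothetical): the client obtains metadata through the Namenode, but the file bytes themselves stream directly from the Datanodes.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsReadSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Hypothetical Namenode address; normally set in the cluster config.
            conf.set("fs.default.name", "hdfs://namenode:9000");
            FileSystem fs = FileSystem.get(conf);      // metadata via the Namenode
            FSDataInputStream in = fs.open(new Path("/users/sameerp/data/part_0"));
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) > 0) {
                System.out.write(buf, 0, n);           // bytes come from Datanodes
            }
            in.close();
            fs.close();
        }
    }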





III. The File System Namespace

HDFS supports a traditional hierarchical file organization, similar to most existing file systems: users can create directories and create, delete, move, and rename files within them. HDFS does not support user quotas or access permissions, nor hard or soft links, although the current architecture does not preclude implementing these features. The Namenode maintains the file system namespace; any change to the namespace or to file attributes is recorded by the Namenode. An application can set the number of copies HDFS keeps of a file; this number is called the replication factor of the file and is also stored by the Namenode.
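As a minimal sketch, the namespace operations and the per-file replication factor described above are exposed through the Java FileSystem API (the paths here are illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class NamespaceSketch {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            fs.mkdirs(new Path("/foodir"));                    // create a directory
            fs.rename(new Path("/foodir/a.txt"),
                      new Path("/foodir/b.txt"));              // rename a file
            fs.setReplication(new Path("/foodir/b.txt"),
                              (short) 3);                      // per-file replication factor
            fs.delete(new Path("/foodir/b.txt"), false);       // delete, non-recursive
            fs.close();
        }
    }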





IV. Data Replication


HDFS is designed to store very large files reliably across machines in a large cluster. It stores each file as a sequence of blocks; all blocks of a file are the same size except the last one. The blocks of a file are replicated for fault tolerance. The block size and the replication factor are configurable per file. The replication factor can be specified when a file is created and changed later. Files in HDFS are write-once, and there is strictly one writer at any time. The Namenode manages block replication: it periodically receives a heartbeat and a Blockreport from each Datanode in the cluster. Receipt of a heartbeat indicates that the Datanode is working properly; a Blockreport contains a list of all the blocks on that Datanode.


Block replication, as recorded by the Namenode (filename, replication factor, block IDs):

/users/sameerp/data/part_0, r:2, {1,3}, ...
/users/sameerp/data/part_1, r:3, {2,4,5}, ...


1. Replica placement. The placement of replicas is key to HDFS reliability and performance. HDFS uses a strategy called rack-awareness to improve data reliability, availability, and utilization of network bandwidth. The short-term goal of this strategy is to validate it in production environments, observe its behavior, and build a basis for testing and research toward more advanced policies. A large HDFS instance typically runs on a cluster of machines spread across many racks; two machines in different racks must communicate through switches, so the bandwidth between two nodes in the same rack is usually greater than the bandwidth between two machines in different racks.


Through a process called rack awareness, the Namenode determines the rack ID each Datanode belongs to. A simple but non-optimal policy is to store each replica on a separate rack. This prevents data loss when an entire rack fails and allows reads to be served from multiple racks. It also spreads replicas evenly across the cluster, which balances load in the event of component failure. However, it increases the cost of writes, because every write must transfer blocks to multiple racks.


In the common case the replication factor is 3. HDFS's placement policy is then to put one replica on a node in the local rack, another replica on a different node in the same rack, and the last replica on a node in a different rack. Rack failures are far rarer than node failures, so this policy does not hurt data reliability or availability. With one third of the replicas on one node, two thirds on one rack, and the rest distributed across the remaining racks, this policy improves write performance.
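That policy can be paraphrased as the sketch below; the types and helper functions are hypothetical stand-ins, not the actual Namenode internals:

    import java.util.Arrays;
    import java.util.List;

    // Hypothetical paraphrase of the default placement for replication factor 3.
    // otherNodeOnRack and nodeOnDifferentRack are illustrative helpers, not real API.
    List<String> choosePlacement(String writerNode, String writerRack) {
        String first  = writerNode;                       // a node in the local rack
        String second = otherNodeOnRack(writerRack);      // another node, same rack
        String third  = nodeOnDifferentRack(writerRack);  // a node in a different rack
        return Arrays.asList(first, second, third);
    }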


2. Replica selection. To reduce global bandwidth consumption and read latency, HDFS tries to satisfy a read from the replica closest to the reader. If a replica exists on the same rack as the reader, that replica is read. If an HDFS cluster spans multiple data centers, the reader first tries a replica in its local data center.


3. SafeMode


On startup, the Namenode enters a special state called SafeMode, in which it does not replicate data blocks. The Namenode receives heartbeats and Blockreports from the Datanodes; a Blockreport lists all the data blocks a Datanode holds. Each block has a specified minimum number of replicas, and a block is considered safely replicated once the Namenode has confirmed that minimum number. When a configurable percentage of blocks has been confirmed safe, the Namenode exits SafeMode. It then determines which blocks are still below their specified replication factor and replicates them to other Datanodes.
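From the command line, safe mode can be observed and controlled with dfsadmin subcommands (these exist in the stock distribution):

    bin/hadoop dfsadmin -safemode get    # report whether safe mode is on
    bin/hadoop dfsadmin -safemode wait   # block until the Namenode exits safe mode
    bin/hadoop dfsadmin -safemode leave  # force the Namenode out of safe mode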





V. Persistence of File System Metadata


The Namenode stores the HDFS metadata. Every operation that modifies file system metadata is recorded by the Namenode in a transaction log called the Editlog. For example, creating a file in HDFS causes the Namenode to insert a record into the Editlog; changing a file's replication factor inserts another record. The Namenode stores the Editlog in its local OS file system. The entire file system namespace, including the mapping of blocks to files and the file attributes, is stored in a file called the Fsimage, which also lives in the Namenode's local file system.


The Namenode keeps an image of the entire file system namespace and the file Blockmap in memory. This key metadata is designed to be compact, so a Namenode with 4 GB of RAM is sufficient to support a huge number of files and directories. When the Namenode starts, it reads the Editlog and Fsimage from disk, applies all the transactions in the Editlog to the in-memory Fsimage, flushes the new version of the Fsimage from memory to disk, and then truncates the old Editlog, since its transactions are now reflected in the Fsimage. This process is called a checkpoint. In the current implementation a checkpoint only occurs when the Namenode starts; periodic checkpointing will be implemented in the near future.
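The checkpoint sequence can be summarized in the following sketch; the method and type names are hypothetical, since the real logic is internal to the Namenode:

    // Hypothetical outline of the startup checkpoint (not the real Namenode code).
    void checkpointOnStartup() {
        FsImageState image = loadFsimageFromDisk();    // last persisted namespace
        for (EditLogRecord tx : readEditLog()) {
            image.apply(tx);                           // replay logged transactions
        }
        flushFsimageToDisk(image);                     // persist the merged state
        truncateEditLog();                             // old transactions now redundant
    }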


A Datanode knows nothing about files; it stores each block of HDFS data in a separate file in its local file system. The Datanode does not create all files in the same directory; instead, it uses a heuristic to determine the optimal number of files per directory and creates subdirectories as appropriate. Creating all files in one directory would not be optimal, because the local file system may not efficiently support huge numbers of files in a single directory. When a Datanode starts, it scans its local file system, generates a list of all the HDFS blocks found in its local files, and sends that report, the Blockreport, to the Namenode.





VI. Communication Protocols


All HDFS communication protocols are layered on top of TCP/IP. A client connects to the Namenode through a configurable port and talks to it with the ClientProtocol; Datanodes talk to the Namenode with the DatanodeProtocol. A remote procedure call (RPC) abstraction wraps both the ClientProtocol and the DatanodeProtocol. By design, the Namenode never initiates an RPC; it only responds to RPC requests issued by clients and Datanodes.





VII. Robustness

The main goal of HDFS is to store data reliably even in the presence of failures. The three common types of failure are Namenode failures, Datanode failures, and network partitions.


1. Data Disk Failure, Heartbeats, and Re-Replication


Each Datanode sends a heartbeat to the Namenode periodically. A network partition can cause a subset of Datanodes to lose connectivity with the Namenode. The Namenode detects this by the absence of heartbeats, marks those Datanodes as dead, and stops forwarding new IO requests to them. Any data registered on a dead Datanode is no longer available. The death of Datanodes may cause the replication factor of some blocks to fall below their specified value; the Namenode constantly tracks which blocks need to be replicated and initiates replication whenever necessary. Re-replication may be needed when a Datanode fails, a replica becomes corrupted, a disk on a Datanode fails, or the replication factor of a file is increased.


2. Cluster Rebalancing


HDFS's design is compatible with data rebalancing schemes: if the free space on a Datanode falls below a certain threshold, a scheme might automatically move data from that Datanode to less-loaded Datanodes. If demand for a particular file suddenly spikes, a scheme might create additional replicas of the file and spread them across the cluster to satisfy the applications. These rebalancing schemes are not yet implemented.


3. Data Integrity


A block fetched from a Datanode may arrive corrupted, due to faults in the storage device, network errors, or buggy software. The HDFS client software implements checksum verification of HDFS file contents. When a client creates an HDFS file, it computes a checksum for each block of the file and stores the checksums in a separate hidden file in the same HDFS namespace. When the client retrieves file contents, it verifies that the data received from each Datanode matches the checksum stored in the corresponding checksum file; if not, the client can opt to fetch that block from another Datanode that holds a replica.
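The idea can be illustrated with a small sketch using Java's built-in CRC32 (the real client checksums fixed-size chunks of each block and stores the results in the hidden checksum file; the granularity here is simplified to whole blocks):

    import java.util.zip.CRC32;

    // Minimal sketch: checksum a block on write, verify on read.
    public class BlockChecksum {
        static long checksumOf(byte[] block) {
            CRC32 crc = new CRC32();
            crc.update(block, 0, block.length);
            return crc.getValue();
        }

        // On mismatch the client would fetch the block from another Datanode.
        static boolean verify(byte[] block, long storedChecksum) {
            return checksumOf(block) == storedChecksum;
        }
    }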


4. Metadata Disk Failure


The Fsimage and the Editlog are the central data structures of HDFS. If these files are corrupted, the whole HDFS instance becomes non-functional. For this reason, the Namenode can be configured to maintain multiple copies of the Fsimage and Editlog; any update to either is synchronously applied to all copies. This synchronization may reduce the rate of namespace transactions the Namenode can process per second, but the cost is acceptable because HDFS applications are data-intensive rather than metadata-intensive. When the Namenode restarts, it selects the most recent consistent Fsimage and Editlog to use.


The Namenode is a single point of failure for an HDFS cluster. If the Namenode machine fails, manual intervention is required; automatically restarting the Namenode service on another machine is not yet supported.


5. Snapshots


Snapshots support storing a copy of the data as of a particular instant in time, so that a corrupted HDFS instance can be rolled back to a previously known good point. HDFS does not currently support snapshots.





VIII. Data Organization


1. Data Blocks


Applications compatible with HDFS handle large data sets. They write data once but read it one or more times, and need those reads to run at streaming speed; HDFS supports these write-once-read-many semantics for files. A typical block size is 64 MB, so an HDFS file is chopped into 64 MB chunks, each of which is, where possible, stored on a different Datanode.
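As a worked example under the 64 MB default, a 200 MB file is split into four blocks: 64 + 64 + 64 + 8 MB. In general the block count is a ceiling division:

    // Number of blocks for a file: ceiling division by the block size.
    long blockSize = 64L * 1024 * 1024;                       // 64 MB default
    long fileSize  = 200L * 1024 * 1024;                      // example: a 200 MB file
    long numBlocks = (fileSize + blockSize - 1) / blockSize;  // = 4 (64+64+64+8 MB)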


2. Staging


A client request to create a file does not reach the Namenode immediately. Instead, the HDFS client caches the file data in a local temporary file, and application writes are transparently redirected to it. When the temporary file accumulates more data than one block (64 MB by default), the client contacts the Namenode. The Namenode inserts the file name into the file system hierarchy, allocates a data block for it, and replies to the client with the identity of the target Datanode and the target block. The client then flushes the block of data from the local temporary file to the specified Datanode. When the file is closed, the remaining unflushed data in the temporary file is transferred to the Datanode, and the client tells the Namenode the file is closed. At that point the Namenode commits the file-creation operation to persistent storage. If the Namenode dies before the file is closed, the file is lost.


This approach is the result of careful consideration of the kinds of applications that run on HDFS. Without client-side caching, network speed and network congestion would significantly reduce throughput.


3. Pipelined Replication


When a client writes data to an HDFS file, the data first goes to a local temporary file, as described above. Suppose the file's replication factor is 3. When the local file fills a block, the client obtains from the Namenode a list of Datanodes that will hold the replicas. The client then starts transferring data to the first Datanode, which receives it in small portions (4 KB), writes each portion to its local store, and simultaneously forwards that portion to the second Datanode. The second Datanode does the same, storing each small portion locally while passing it to the third Datanode, which simply receives and stores the data. Data thus flows through the chain of Datanodes as it arrives; this is pipelined replication.
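The forwarding loop at each Datanode can be sketched as follows; the streams and the method shown are illustrative, not the actual Datanode protocol implementation:

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;

    // Illustrative forwarding loop at one Datanode in the pipeline: store each
    // 4 KB portion locally and pass it to the next Datanode at the same time.
    void relay(InputStream fromUpstream, OutputStream localStore,
               OutputStream toDownstream) throws IOException {
        byte[] portion = new byte[4096];
        int n;
        while ((n = fromUpstream.read(portion)) > 0) {
            localStore.write(portion, 0, n);          // persist to local storage
            if (toDownstream != null) {               // last node has no downstream
                toDownstream.write(portion, 0, n);    // forward down the pipeline
            }
        }
    }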





IX. Accessibility


HDFS provides applications with a variety of access methods: DFSShell lets users interact with HDFS data from the command line; a Java API is available for programs; a C-language wrapper around that API is also provided; and the files can be browsed over HTTP. Access via the WebDAV protocol is under development. See the reference documentation for details.





X. Space Reclamation


1. File Deletion and Undeletion


When a user or application deletes a file, it is not immediately removed from HDFS. Instead, HDFS renames the file and moves it into the /trash directory. As long as it remains in /trash, the file can be restored quickly. The length of time a file stays in /trash is configurable; after that time expires, the Namenode deletes the file from the namespace. The deletion of the file causes the blocks associated with it to be freed. Note that there can be an appreciable delay between a user deleting a file and the corresponding increase in free space in HDFS.


While a deleted file remains in /trash, a user who wants it back can browse the /trash directory and retrieve the file. /trash holds only the most recent copy of each deleted file. The /trash directory is just like any other directory, with one special feature: HDFS applies a policy that automatically deletes files from it. The current default policy is to delete files that have been in /trash for more than 6 hours; in the future this policy will be configurable through a well-defined interface.
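Under this scheme, restoring a file is an ordinary move out of /trash. A hedged example using the shell commands the document introduces below; the exact layout of paths under /trash is an assumption and may differ between versions:

    bin/hadoop dfs -rm /foodir/myfile.txt                            # moves the file into /trash
    bin/hadoop dfs -mv /trash/foodir/myfile.txt /foodir/myfile.txt   # restore it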

2. Decrease of Replication Factor


When the replication factor of a file is reduced, the Namenode selects excess replicas to delete and passes this information to the Datanodes on the next heartbeat. Each Datanode removes the corresponding blocks, freeing space in the cluster. As with deletion, there may be a time lag between the completion of the setReplication call and the appearance of free space in the cluster.
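From the shell, the same reduction can be requested with the -setrep subcommand; the -w flag waits until the new replication factor takes effect (the path is illustrative):

    bin/hadoop dfs -setrep -w 2 /foodir/myfile.txt   # lower the replication factor to 2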





FS Shell





HDFS allows user data to be organized in the form of files and directories. It provides a command-line interface called the FS Shell that lets a user interact with the data in HDFS. The syntax of this command set is similar to other shells (e.g. bash, csh) that users are already familiar with. Here are some sample action/command pairs:


Action: Create a directory named /foodir
Command: bin/hadoop dfs -mkdir /foodir

Action: View the contents of a file named /foodir/myfile.txt
Command: bin/hadoop dfs -cat /foodir/myfile.txt





The FS Shell is targeted at applications that need a scripting language to interact with the stored data.





Dfsadmin





The Dfsadmin command set is used for administering an HDFS cluster. These commands are used only by an HDFS administrator. Here are some sample action/command pairs:


Action: Put the cluster in SafeMode
Command: bin/hadoop dfsadmin -safemode enter

Action: Generate a list of Datanodes
Command: bin/hadoop dfsadmin -report

Action: Decommission Datanode datanodename
Command: bin/hadoop dfsadmin -decommission datanodename









Browser Interface





A typical HDFS installation configures a web server to expose the HDFS namespace through a configurable TCP port. This allows a user to navigate the HDFS namespace and view the contents of its files using a web browser.








-------------------------20080828---------------------------


http://www.cppblog.com/javenstudio/archive/2008/02/22/43076.html





This explains the workflow of Hadoop, that is, how the components coordinate their work.
