The Hadoop System's Distributed Storage and Parallel Computing Architecture


Figure 1-14 shows the Hadoop system's distributed storage and parallel computing architecture. From a hardware perspective, a Hadoop system is a distributed storage and parallel computing system running on a cluster of ordinary commodity servers. The cluster has a master node that controls and manages the operation of the entire cluster and coordinates the slave nodes as they carry out data storage and computing tasks. Every slave node serves simultaneously as a data storage node and a data computing node; the main purpose of this design is to achieve as much localized computation as possible in a big-data environment, which improves the processing performance of the system. To detect slave-node failures promptly, the master node periodically checks each slave node using a heartbeat mechanism; if a slave node fails to respond to the heartbeat, the system considers that node dead.
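The heartbeat-based failure detection described above can be modeled in a few lines. This is an illustrative sketch only, not Hadoop's actual implementation; the class and method names (`MasterNode`, `receive_heartbeat`, `failed_slaves`) and the 30-second timeout are hypothetical choices for the example:

```python
import time

class MasterNode:
    """Toy model of the master's heartbeat-based failure detection.
    (Illustrative sketch; names and timeout are hypothetical, not Hadoop APIs.)"""

    def __init__(self, timeout_secs=30.0):
        self.timeout_secs = timeout_secs
        self.last_heartbeat = {}  # slave id -> timestamp of last heartbeat

    def receive_heartbeat(self, slave_id, now=None):
        """Called whenever a slave node reports in."""
        self.last_heartbeat[slave_id] = now if now is not None else time.time()

    def failed_slaves(self, now=None):
        """Slaves whose last heartbeat is older than the timeout are
        considered dead; the master would then reassign their work."""
        now = now if now is not None else time.time()
        return [s for s, t in self.last_heartbeat.items()
                if now - t > self.timeout_secs]

master = MasterNode(timeout_secs=30.0)
master.receive_heartbeat("slave-1", now=100.0)
master.receive_heartbeat("slave-2", now=125.0)
print(master.failed_slaves(now=140.0))  # slave-1 missed its heartbeat window
```

The key property is that failure is inferred passively: the master never needs a slave's cooperation to declare it dead, only the absence of heartbeats within the timeout window.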

From a software perspective, a Hadoop system consists of two parts: distributed storage and parallel computing. For distributed storage, Hadoop builds a logically unified distributed file system on top of the local file systems of the slave nodes, providing large-scale, scalable distributed data storage. This is HDFS (Hadoop Distributed File System). In HDFS, the master node that controls and manages the entire file system is called the NameNode, and each slave node that actually stores data is called a DataNode.
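The NameNode/DataNode split can be sketched as follows: the NameNode holds only metadata (which blocks make up a file, and which DataNodes hold each block), while the data itself lives on the DataNodes. This is a simplified conceptual model, assuming hypothetical names (`NameNode`, `report_block`, `locate`); real HDFS is far more involved:

```python
class NameNode:
    """Toy model of HDFS metadata. The NameNode maps file names to block
    IDs and block IDs to the DataNodes holding replicas; it never stores
    file data itself. (Illustrative sketch, not the real HDFS API.)"""

    def __init__(self):
        self.file_to_blocks = {}   # file name -> ordered list of block IDs
        self.block_locations = {}  # block ID -> set of DataNode IDs

    def add_file(self, name, blocks):
        """Record which blocks make up a file."""
        self.file_to_blocks[name] = list(blocks)

    def report_block(self, datanode, block_id):
        """DataNodes report the blocks they hold; the NameNode only
        tracks locations."""
        self.block_locations.setdefault(block_id, set()).add(datanode)

    def locate(self, name):
        """A client asks where each block of a file lives, then reads the
        actual bytes directly from those DataNodes."""
        return [(b, sorted(self.block_locations.get(b, set())))
                for b in self.file_to_blocks.get(name, [])]

nn = NameNode()
nn.add_file("/logs/day1", ["blk_1", "blk_2"])
for dn in ("dn-a", "dn-b", "dn-c"):   # blk_1 replicated on three DataNodes
    nn.report_block(dn, "blk_1")
nn.report_block("dn-b", "blk_2")
print(nn.locate("/logs/day1"))
```

Because clients fetch block locations from the NameNode but stream data directly from DataNodes, the master never becomes a bandwidth bottleneck for reads and writes.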

Further, to parallelize the processing of the large-scale data stored in HDFS, Hadoop provides a parallel computing framework called MapReduce. The framework manages and schedules the nodes of the entire cluster to execute parallel programs, and tries to let each slave node process the data stored locally on that node. The master node that manages and schedules the cluster's computation is called the JobTracker, and each slave node that performs the actual computation is called a TaskTracker. The JobTracker may run on the same physical master server as the NameNode, which manages data storage; when the cluster is large and the load is heavy, the two can be deployed on separate servers. The storage node (DataNode) and the compute node (TaskTracker), however, are paired on the same physical slave server.
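The data flow the framework manages, map tasks on local input chunks, a shuffle that groups intermediate pairs by key, then reduce tasks, can be illustrated with a minimal word-count model. This is a sketch of the MapReduce programming model only, not Hadoop's Java API; `run_job` stands in for the JobTracker's role and the function names are hypothetical:

```python
from collections import defaultdict

def map_phase(chunk):
    """Map task: emits (key, value) pairs. In Hadoop it would run on the
    TaskTracker whose DataNode holds this chunk locally."""
    for word in chunk.split():
        yield (word, 1)

def reduce_phase(word, counts):
    """Reduce task: aggregates all intermediate values for one key."""
    return (word, sum(counts))

def run_job(chunks):
    """Toy stand-in for the JobTracker: run one map task per chunk,
    shuffle intermediate pairs by key, then run the reducers.
    (Illustrative model of the data flow, not Hadoop's scheduler.)"""
    shuffle = defaultdict(list)
    for chunk in chunks:                    # one map task per input split
        for key, value in map_phase(chunk):
            shuffle[key].append(value)      # group pairs by key (shuffle)
    return dict(reduce_phase(k, v) for k, v in shuffle.items())

# Two input splits, as if stored on two different DataNodes.
chunks = ["big data big compute", "data moves to compute"]
print(run_job(chunks))
```

In a real cluster the map tasks run in parallel on the TaskTrackers and the shuffle moves data across the network, but the programmer writes only the two phase functions; the framework handles scheduling, locality, and fault recovery.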

Other subsystems in Hadoop, such as HBase and Hive, are built on top of the HDFS distributed file system and the MapReduce parallel computing framework.
