The Hadoop System's Distributed Storage and Parallel Computing Architecture


Figure 1-14 shows the Hadoop system's distributed storage and parallel computing architecture. From a hardware perspective, a Hadoop system is a distributed storage and parallel computing system running on a cluster of ordinary commodity servers. The cluster has a master node that controls and manages the operation of the entire cluster and coordinates the slave nodes as they carry out data storage and computing tasks. Every slave node serves simultaneously as a data storage node and a data computing node; the main purpose of this design is to achieve as much localized computation as possible in a big-data environment, which improves the processing performance of the system. To detect slave-node failures promptly, the master node periodically checks each slave node using a heartbeat mechanism; if a slave node fails to respond to the heartbeat, the system considers that node dead.
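The heartbeat-based failure detection described above can be modeled in a few lines. This is an illustrative sketch only, not Hadoop's actual implementation; the class and method names (`MasterNode`, `receive_heartbeat`, `failed_slaves`) and the 30-second timeout are hypothetical choices for the example:

```python
import time

class MasterNode:
    """Toy model of the master's heartbeat-based failure detection.
    (Illustrative sketch; names and timeout are hypothetical, not Hadoop APIs.)"""

    def __init__(self, timeout_secs=30.0):
        self.timeout_secs = timeout_secs
        self.last_heartbeat = {}  # slave id -> timestamp of last heartbeat

    def receive_heartbeat(self, slave_id, now=None):
        """Called whenever a slave node reports in."""
        self.last_heartbeat[slave_id] = now if now is not None else time.time()

    def failed_slaves(self, now=None):
        """Slaves whose last heartbeat is older than the timeout are
        considered dead; the master would then reassign their work."""
        now = now if now is not None else time.time()
        return [s for s, t in self.last_heartbeat.items()
                if now - t > self.timeout_secs]

master = MasterNode(timeout_secs=30.0)
master.receive_heartbeat("slave-1", now=100.0)
master.receive_heartbeat("slave-2", now=125.0)
print(master.failed_slaves(now=140.0))  # slave-1 missed its heartbeat window
```

The key property is that failure is inferred passively: the master never needs a slave's cooperation to declare it dead, only the absence of heartbeats within the timeout window.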

From a software perspective, a Hadoop system consists of two parts: distributed storage and parallel computing. For distributed storage, Hadoop builds a logically unified distributed file system on top of the local file systems of the slave nodes, providing large-scale, scalable distributed data storage. This is HDFS (Hadoop Distributed File System). In HDFS, the master node that controls and manages the entire file system is called the NameNode, and each slave node that actually stores data is called a DataNode.
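The NameNode/DataNode split can be sketched as follows: the NameNode holds only metadata (which blocks make up a file, and which DataNodes hold each block), while the data itself lives on the DataNodes. This is a simplified conceptual model, assuming hypothetical names (`NameNode`, `report_block`, `locate`); real HDFS is far more involved:

```python
class NameNode:
    """Toy model of HDFS metadata. The NameNode maps file names to block
    IDs and block IDs to the DataNodes holding replicas; it never stores
    file data itself. (Illustrative sketch, not the real HDFS API.)"""

    def __init__(self):
        self.file_to_blocks = {}   # file name -> ordered list of block IDs
        self.block_locations = {}  # block ID -> set of DataNode IDs

    def add_file(self, name, blocks):
        """Record which blocks make up a file."""
        self.file_to_blocks[name] = list(blocks)

    def report_block(self, datanode, block_id):
        """DataNodes report the blocks they hold; the NameNode only
        tracks locations."""
        self.block_locations.setdefault(block_id, set()).add(datanode)

    def locate(self, name):
        """A client asks where each block of a file lives, then reads the
        actual bytes directly from those DataNodes."""
        return [(b, sorted(self.block_locations.get(b, set())))
                for b in self.file_to_blocks.get(name, [])]

nn = NameNode()
nn.add_file("/logs/day1", ["blk_1", "blk_2"])
for dn in ("dn-a", "dn-b", "dn-c"):   # blk_1 replicated on three DataNodes
    nn.report_block(dn, "blk_1")
nn.report_block("dn-b", "blk_2")
print(nn.locate("/logs/day1"))
```

Because clients fetch block locations from the NameNode but stream data directly from DataNodes, the master never becomes a bandwidth bottleneck for reads and writes.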

Further, to parallelize the processing of the large-scale data stored in HDFS, Hadoop provides a parallel computing framework called MapReduce. The framework manages and schedules the nodes of the entire cluster to execute parallel programs, and tries to let each slave node process the data stored locally on that node. The master node that manages and schedules the cluster's computation is called the JobTracker, and each slave node that performs the actual computation is called a TaskTracker. The JobTracker may run on the same physical master server as the NameNode, which manages data storage; when the cluster is large and the load is heavy, the two can be deployed on separate servers. The storage node (DataNode) and the compute node (TaskTracker), however, are paired on the same physical slave server.
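The data flow the framework manages, map tasks on local input chunks, a shuffle that groups intermediate pairs by key, then reduce tasks, can be illustrated with a minimal word-count model. This is a sketch of the MapReduce programming model only, not Hadoop's Java API; `run_job` stands in for the JobTracker's role and the function names are hypothetical:

```python
from collections import defaultdict

def map_phase(chunk):
    """Map task: emits (key, value) pairs. In Hadoop it would run on the
    TaskTracker whose DataNode holds this chunk locally."""
    for word in chunk.split():
        yield (word, 1)

def reduce_phase(word, counts):
    """Reduce task: aggregates all intermediate values for one key."""
    return (word, sum(counts))

def run_job(chunks):
    """Toy stand-in for the JobTracker: run one map task per chunk,
    shuffle intermediate pairs by key, then run the reducers.
    (Illustrative model of the data flow, not Hadoop's scheduler.)"""
    shuffle = defaultdict(list)
    for chunk in chunks:                    # one map task per input split
        for key, value in map_phase(chunk):
            shuffle[key].append(value)      # group pairs by key (shuffle)
    return dict(reduce_phase(k, v) for k, v in shuffle.items())

# Two input splits, as if stored on two different DataNodes.
chunks = ["big data big compute", "data moves to compute"]
print(run_job(chunks))
```

In a real cluster the map tasks run in parallel on the TaskTrackers and the shuffle moves data across the network, but the programmer writes only the two phase functions; the framework handles scheduling, locality, and fault recovery.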

Other subsystems in Hadoop, such as HBase and Hive, are built on top of the HDFS distributed file system and the MapReduce parallel computing framework.
