Taobao Hadoop cluster machine hardware configuration

Source: Internet
Author: User

Many companies at home and abroad use Hadoop. The world's largest Hadoop cluster is at Yahoo, with about 25,000 nodes, used mainly to support its advertising system and web search. In China, the main Hadoop users are Baidu, Taobao, Tencent, Huawei, and China Mobile, and Taobao's Hadoop cluster is among the larger ones (if not the largest).

Taobao's Hadoop cluster now has more than 1,700 nodes and serves the departments of the entire Alibaba Group. Its data comes from backups of each department's online databases (Oracle, MySQL), system logs, and crawler data. The total volume is more than 17 PB, with net daily growth of about 20 TB. More than 40,000 MapReduce tasks run on the cluster every day (sometimes more than 60,000), most of which are daily statistical tasks such as Data Cube, Quantum Statistics, recommendation systems, leaderboards, and more. These tasks generally begin around 1:00 AM and complete within 3 to 4 hours. Each day the cluster reads about 2 PB of data and writes about 1 PB.
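As a rough back-of-the-envelope check, the figures above can be turned into per-node averages. This is only a sketch: it treats the quoted "more than" values as exact, assumes decimal units (1 PB = 1000 TB), and spreads data and tasks evenly across nodes, none of which the source states.

```python
# Back-of-the-envelope averages from the figures quoted above.
# Assumptions (not from the source): decimal units (1 PB = 1000 TB),
# "more than" figures treated as exact, load spread evenly over nodes.
TB_PER_PB = 1000

NODES = 1_700
TOTAL_DATA_TB = 17 * TB_PER_PB
DAILY_GROWTH_TB = 20
DAILY_TASKS = 40_000


def cluster_averages():
    """Return simple averages derived from the quoted cluster figures."""
    return {
        "storage_per_node_tb": TOTAL_DATA_TB / NODES,
        "daily_growth_pct": 100 * DAILY_GROWTH_TB / TOTAL_DATA_TB,
        "tasks_per_node_per_day": DAILY_TASKS / NODES,
    }


if __name__ == "__main__":
    for name, value in cluster_averages().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions each node holds roughly 10 TB on average, which is an order-of-magnitude figure rather than a measured one.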

(Image source: Taobao)

Hadoop has two types of nodes: Master nodes and Slave nodes.

Master nodes include the JobTracker, NameNode, SecondaryNameNode, and Standby.

Hardware configuration: 16 CPUs × 4 cores, 96 GB memory.

Slave nodes mainly run TaskTracker and DataNode.

Hardware configuration varies somewhat: from 8 CPUs × 4 cores to 16 CPUs × 4 cores, with 16 GB to 24 GB memory.

(Note: a slave node usually runs both TaskTracker and DataNode, in order to improve data locality.)

Each slave node is divided into 12 to 24 slots. The entire cluster has about 34,916 slots, of which 19,643 are Map slots and 15,273 are Reduce slots.
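The slot figures above can be sanity-checked with a few lines of arithmetic. The sketch below confirms that the Map and Reduce slot counts add up to the quoted total and computes an average number of slots per node. The node count of 1,700 comes from the "more than 1700 nodes" statement earlier and assumes every node is a slave, so the average is only approximate.

```python
# Sanity check of the slot figures quoted above.
MAP_SLOTS = 19_643
REDUCE_SLOTS = 15_273
QUOTED_TOTAL = 34_916
NODES = 1_700  # assumption: "more than 1700 nodes", all counted as slaves


def slot_summary():
    """Summarize the quoted slot counts and derive simple averages."""
    total = MAP_SLOTS + REDUCE_SLOTS
    return {
        "total_slots": total,
        "matches_quoted_total": total == QUOTED_TOTAL,
        "avg_slots_per_node": total / NODES,
        "map_share": MAP_SLOTS / total,
    }


if __name__ == "__main__":
    for name, value in slot_summary().items():
        print(f"{name}: {value}")
```

The resulting average of roughly 20 slots per node falls inside the stated range of 12 to 24 slots per slave node.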

All jobs are divided into groups by department or team, for a total of 38 groups. The cluster's resources are likewise partitioned by group: each group has a defined maximum number of concurrent tasks and a limit on the Map slots and Reduce slots it may use. Each job can use only the slots belonging to its own group.
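To illustrate the idea of per-group slot limits, here is a minimal sketch of an admission check. It is not Taobao's scheduler and not any Hadoop scheduler API: the class `GroupQuota`, its fields, and the example group name and numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GroupQuota:
    """Hypothetical per-group slot limits (illustrative names and fields)."""
    max_map_slots: int
    max_reduce_slots: int
    used_map_slots: int = 0
    used_reduce_slots: int = 0

    def try_acquire(self, kind: str, count: int = 1) -> bool:
        """Reserve `count` slots of `kind` ('map' or 'reduce') if the limit allows."""
        if kind not in ("map", "reduce"):
            raise ValueError(f"unknown slot kind: {kind!r}")
        limit = getattr(self, f"max_{kind}_slots")
        used = getattr(self, f"used_{kind}_slots")
        if used + count > limit:
            return False  # group is at its cap; the task has to wait
        setattr(self, f"used_{kind}_slots", used + count)
        return True

    def release(self, kind: str, count: int = 1) -> None:
        """Return `count` slots of `kind` to the group's pool."""
        if kind not in ("map", "reduce"):
            raise ValueError(f"unknown slot kind: {kind!r}")
        used = getattr(self, f"used_{kind}_slots")
        setattr(self, f"used_{kind}_slots", max(0, used - count))


if __name__ == "__main__":
    # Made-up group with a tiny quota, purely for demonstration.
    quotas = {"example_group": GroupQuota(max_map_slots=2, max_reduce_slots=1)}
    group = quotas["example_group"]
    print(group.try_acquire("map"), group.try_acquire("map"), group.try_acquire("map"))
```

Keeping one quota object per group means a busy group can exhaust only its own slots, which matches the isolation described above.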
