Taobao http://www.aliyun.com/zixun/aggregation/14119.html"> Hadoop cluster machine hardware configuration
Hadoop companies at home and abroad more, the world's largest Hadoop cluster in Yahoo, there are about 25,000 nodes, mainly used to support the advertising system and web search. Domestic Hadoop mainly Baidu, Taobao, Tencent, Huawei, China Mobile, Taobao Hadoop cluster which belongs to the larger (if not the largest).
Taobao Hadoop cluster now more than 1700 nodes, serving the entire Alibaba Group for various departments, data from various departments of the online database (Oracle, MySQL) backup, system log and crawler data, the total number of more than 17 PB, net daily growth of about 20T. There are more than 40,000 MapReduce tasks running daily in a Hadoop cluster (sometimes more than 60,000), most of which are daily statistical tasks such as data cube, quantum statistics, recommendation systems, leaderboards, and more. These tasks generally begin around 1:00 AM and are completed within 3-4 hours. Reading data every day in about 2PB, write data in about 1PB.
this picture is from Taobao
Hadoop includes two types of nodes Master and Slave nodes,
Master nodes include Jobtracker, Namenode, SecondName, Standby,
Hardware configuration: 16CPU * 4 core, 96G memory.
Slave nodes are mainly TaskTracker and DataNode,
Hardware configuration there are some differences: 8CPU * 4 nuclear -16CPU * 4 nuclear, 16G-24G memory
(Note: usually a slave node is also TaskTracker and DataNode, the purpose is to improve the data locality).
Each slave node is divided into 12 to 24 slots. The entire cluster is about 34,916 slots, with 19,643 Map slots and 15,273 Reduce slots
All assignments are split into groups, divided by department or group, for a total of 38 groups. The entire cluster resources are also divided by each Group, defining the maximum number of concurrent tasks for each Group, Map slots and the use of Reduce slots. Each job can only use its own group of slots resources.