(1) Plan the initial scale: cluster sizing depends heavily on the data center's infrastructure and configuration. Since the performance of a new environment is hard to predict, start with a small cluster of 5 or 6 servers, deploy Hadoop, and run the standard Hadoop benchmarks to understand the characteristics of the environment. Then incrementally add resources such as servers and storage as needed (see the benchmark sketch below).
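The following is a minimal sketch of how such a baseline benchmark run might be scripted: it invokes the standard TestDFSIO write and read benchmarks and captures their output for later comparison. The jar path and the file counts/sizes are assumptions to adjust for the Hadoop distribution in use.

```python
# Minimal sketch: run the standard TestDFSIO write/read benchmarks on a small
# cluster and keep the console output for later comparison as the cluster grows.
import subprocess

# Assumed location of the jobclient test jar; adjust for your distribution.
JOBCLIENT_JAR = "/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar"

def run_testdfsio(mode, files=10, file_size_mb=1000):
    """Run TestDFSIO in -write or -read mode and return its console output."""
    cmd = [
        "hadoop", "jar", JOBCLIENT_JAR, "TestDFSIO",
        mode, "-nrFiles", str(files), "-fileSize", str(file_size_mb),
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout + result.stderr  # TestDFSIO prints its summary to stderr

if __name__ == "__main__":
    for mode in ("-write", "-read"):
        print(run_testdfsio(mode))
```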
(2) Select servers: the CPU should be no less than two quad-core processors, with HT (hyper-threading) enabled. Configure at least 4 GB of memory per compute core, and reserve about 6% of memory so the virtualization layer runs efficiently. Hadoop performance is sensitive to I/O, so configure each server with several local disks rather than a small number of large-capacity drives; considering task-scheduling overhead, it is not recommended to configure more than 2 local disks per compute core. A 10G network adapter is recommended for performance. Consider dual power supplies for the master node servers (running NameNode and JobTracker) to improve reliability. (A worked sizing example follows below.)
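Below is a worked sizing example for these rules of thumb: at least 4 GB of RAM per compute core, roughly 6% of memory reserved for virtualization overhead, and no more than two local disks per core. The 16-core server used as input is purely illustrative.

```python
# Worked sizing example for the rules of thumb above. Values returned are
# planning figures, not vendor recommendations.
def size_server(cores, ram_gb_per_core=4, hypervisor_reserve=0.06):
    usable_ram = cores * ram_gb_per_core                # RAM available to Hadoop VMs
    total_ram = usable_ram / (1 - hypervisor_reserve)   # raw RAM to install, 6% reserved
    return {
        "cores": cores,
        "ram_gb_total": round(total_ram, 1),
        "ram_gb_for_hadoop": usable_ram,
        "local_disks_min": 2,            # several disks rather than one large drive
        "local_disks_max": 2 * cores,    # no more than two disks per compute core
    }

print(size_server(16))
# -> {'cores': 16, 'ram_gb_total': 68.1, 'ram_gb_for_hadoop': 64, ...}
```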
(3) Virtualization configuration: do not configure local storage as RAID; instead create a datastore for each physical disk. For reliability and network transmission efficiency, isolate the management network from the Hadoop cluster network, as shown in Figure 4 (a layout sketch also follows below):
Figure 4: Virtualized network configuration
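The sketch below only illustrates the intended layout as plain data, not an API call: one datastore per physical disk instead of a RAID volume, and separate port groups so management traffic is isolated from Hadoop cluster traffic. Datastore names, disk identifiers, and vSwitch names are hypothetical.

```python
# Illustrative host layout plan (hypothetical names and disk IDs):
# one datastore per local physical disk -> independent I/O paths for HDFS,
# and separate networks for management and Hadoop cluster traffic.
host_layout = {
    "datastores": {
        "datastore-disk0": "naa.600508b1001c000000000000000000a0",
        "datastore-disk1": "naa.600508b1001c000000000000000000a1",
        "datastore-disk2": "naa.600508b1001c000000000000000000a2",
    },
    "networks": {
        "Management Network": {"vswitch": "vSwitch0", "use": "vCenter/BDE management"},
        "Hadoop Network":     {"vswitch": "vSwitch1", "use": "HDFS and MapReduce traffic"},
    },
}

for name, disk in host_layout["datastores"].items():
    print(f"{name} -> {disk}")
```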
(4) System setup: BDE automatically configures the virtual disks and operating-system parameters based on experimental experience, hiding the details of performance tuning. Performance-sensitive users are advised to replace the default template with CentOS 6.x, because THP (Transparent Huge Pages) in the CentOS 6.x kernel and EPT (Extended Page Tables, on Intel processors) work together to improve virtualization performance. (A sketch for checking the THP setting follows below.)
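As a quick check inside the guest, the sketch below reports whether THP is enabled by reading the usual sysfs locations. The paths are the standard mainline and RHEL/CentOS 6 locations; they may differ on other distributions.

```python
# Minimal sketch: report the transparent huge pages (THP) setting on a Linux guest.
from pathlib import Path

THP_PATHS = [
    Path("/sys/kernel/mm/transparent_hugepage/enabled"),         # mainline kernels
    Path("/sys/kernel/mm/redhat_transparent_hugepage/enabled"),  # RHEL/CentOS 6
]

def thp_status():
    for path in THP_PATHS:
        if path.exists():
            # The active setting is shown in brackets, e.g. "[always] madvise never"
            return path.read_text().strip()
    return "THP interface not found (kernel may not support it)"

print("THP:", thp_status())
```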
(5) Hadoop configuration: BDE automatically generates and configures the Hadoop configuration files (mainly mapred-site.xml, core-site.xml, and hdfs-site.xml), including block size (blockSize), session management, and logging. However, some parameters related to MapReduce tasks, including mapred.reduce.parallel.copies, io.sort.mb, io.sort.factor, io.sort.record.percent, and tasktracker.http.threads, need to be set according to the workload (see the sketch below).
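The sketch below emits a mapred-site.xml fragment containing these workload-dependent properties. The values are placeholders to illustrate the file format only; they must be tuned per workload, not copied as-is.

```python
# Hedged sketch: generate a mapred-site.xml fragment with the tuning parameters
# named above. Values are illustrative placeholders, NOT recommendations.
import xml.etree.ElementTree as ET

TUNING = {
    "mapred.reduce.parallel.copies": "10",
    "io.sort.mb": "200",
    "io.sort.factor": "64",
    "io.sort.record.percent": "0.15",
    "tasktracker.http.threads": "60",
}

def build_mapred_site(props):
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")

print(build_mapred_site(TUNING))
```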
(6) Expansion recommendation: if CPU utilization in the cluster is frequently above 80%, it is recommended to add nodes. In addition, the storage capacity of a single node should not exceed 24 TB; otherwise, once that node fails, re-replicating its data blocks is likely to cause network congestion. Expansion can be planned based on the benchmark results and resource-usage experience gained on the small cluster. (A sketch of these thresholds follows below.)
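The following sketch encodes these two expansion triggers as simple checks: sustained CPU utilization above 80%, and any node whose raw capacity exceeds 24 TB. The sample utilization numbers and node capacities are made up for illustration.

```python
# Simple sketch of the expansion rules above; sample data is illustrative only.
CPU_THRESHOLD = 0.80          # sustained utilization that suggests adding nodes
NODE_CAPACITY_LIMIT_TB = 24   # above this, re-replication after a failure is slow

def needs_more_nodes(cpu_samples):
    """cpu_samples: recent per-interval cluster CPU utilization values (0.0 - 1.0)."""
    return sum(cpu_samples) / len(cpu_samples) > CPU_THRESHOLD

def oversized_nodes(node_capacity_tb):
    """node_capacity_tb: mapping of node name -> raw storage capacity in TB."""
    return [n for n, tb in node_capacity_tb.items() if tb > NODE_CAPACITY_LIMIT_TB]

print(needs_more_nodes([0.85, 0.90, 0.82, 0.88]))          # True -> add nodes
print(oversized_nodes({"dn1": 12, "dn2": 36, "dn3": 20}))  # ['dn2']
```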