Standard hardware configuration for a Hadoop cluster
When selecting hardware, we usually need to weigh application performance against cost, striking a balance between meeting actual requirements and staying economically feasible. The following uses a Hadoop cluster as an example to describe the factors to consider.
1. I/O-bound workloads
Hadoop involves the following I/O-intensive operations (a tuning sketch follows the list):
- Indexing
- Grouping
- Data import and export
- Data movement and transformation
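In MRv1, most of this I/O pressure shows up in the map-side sort and the shuffle. The fragment below is a minimal, illustrative sketch for an I/O-heavy cluster: the property names are standard Hadoop 1.x (MRv1) settings, but the values are assumptions that need to be tuned against the actual hardware.

<!-- mapred-site.xml (illustrative values only) -->
<!-- Larger in-memory sort buffer, so map output spills to disk less often -->
<property>
  <name>io.sort.mb</name>
  <value>256</value>
</property>
<!-- Merge more spill files per pass when sorting map output -->
<property>
  <name>io.sort.factor</name>
  <value>64</value>
</property>
<!-- Compress intermediate map output to reduce shuffle disk and network I/O -->
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>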
2. CPU-bound workloads
In Hadoop, running jobs also places a sustained load on the CPU. Typical CPU-bound workloads include the following (a slot-sizing sketch follows the list):
- Clustering/classification
- Complex text mining
- Natural language processing
- Feature extraction
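For CPU-bound jobs, the main MRv1 knobs are the per-TaskTracker task slots and the per-task JVM heap, which should be sized against the core and RAM counts in the hardware specifications below. The fragment is only a sketch: the property names are Hadoop 1.x settings, while the values assume a hypothetical node with two hex-core CPUs.

<!-- mapred-site.xml (values assume ~12 physical cores per node) -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>8</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value>
</property>
<!-- Heap per task JVM; total slots x heap must still fit in physical RAM -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value>
</property>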
Now let's look at the standard Hadoop hardware configurations themselves. Enterprise-level applications and private cloud deployments may all involve such configuration requirements. The following lists the standard configuration for DataNode/TaskTracker (MRv1) nodes in a Hadoop cluster (a disk-layout sketch follows the list):
- 12-24 hard disks of 1-4 TB each (JBOD)
- 2 quad-/hex-/octo-core CPUs, running at least 2-2.5 GHz
- 64-512 GB of RAM
- Bonded Gigabit Ethernet (the greater the storage density, the higher the network throughput required)
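On a DataNode/TaskTracker, each JBOD disk is normally mounted separately and listed individually, so that HDFS block storage and MapReduce intermediate output are spread across all spindles. The fragment below is a minimal sketch assuming hypothetical mount points /data/1, /data/2, and so on; dfs.data.dir and mapred.local.dir are the Hadoop 1.x property names.

<!-- hdfs-site.xml: one entry per physical data disk (mount points are assumptions) -->
<property>
  <name>dfs.data.dir</name>
  <value>/data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn,/data/4/dfs/dn</value>
  <!-- extend with one entry per remaining disk on a 12- or 24-disk node -->
</property>
<!-- mapred-site.xml: intermediate map output spread across the same disks -->
<property>
  <name>mapred.local.dir</name>
  <value>/data/1/mapred/local,/data/2/mapred/local,/data/3/mapred/local,/data/4/mapred/local</value>
</property>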
Standard configuration for NameNode/JobTracker (MRv1) nodes in a Hadoop cluster (a metadata-redundancy sketch follows the list):
- 3-6 hard disks of 1 TB each (JBOD)
- 2 quad-/hex-/octo-core CPUs, running at least 2-2.5 GHz
- 64-512 GB of RAM
- Bonded Gigabit Ethernet (the greater the storage density, the higher the network throughput required)
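The NameNode needs far less raw capacity than a DataNode; its disks are better used for redundancy of the filesystem metadata than for bulk storage. The fragment below sketches the common practice of writing the namespace image to more than one local disk plus a remote NFS mount; dfs.name.dir is the Hadoop 1.x property name, and the paths are assumptions.

<!-- hdfs-site.xml on the NameNode: redundant copies of the namespace metadata -->
<property>
  <name>dfs.name.dir</name>
  <!-- two local disks plus an NFS mount (paths are hypothetical) -->
  <value>/data/1/dfs/nn,/data/2/dfs/nn,/mnt/nfs/dfs/nn</value>
</property>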
In practice, the hardware configuration is divided according to the business, so the parameters will differ somewhat. As the configurations above show, the disk capacity required for data storage is relatively large, which is why actual business needs must be taken into account: for a smaller workload, the disks can be smaller and the overall configuration somewhat weaker.
The preceding parameters are for reference only.