Virtualization has brought new flexibility to Hadoop. From the perspective of IT production management, the benefits are as follows:
• Deploying Hadoop in a shared datacenter alongside other applications that consume different types of resources can improve overall resource utilization;
• Flexible virtual machine operations let users dynamically create and expand their own Hadoop clusters based on datacenter resources, or shrink existing clusters and release resources to support other applications when needed;
• Integrating the HA (High Availability) and FT (Fault Tolerance) features of the virtualization architecture avoids the single points of failure found in traditional Hadoop clusters and, coupled with the data reliability of Hadoop itself, provides a dependable foundation for big data applications in the enterprise.
For these reasons, vSphere Big Data Extensions (BDE) gives users effective support for flexibly deploying and managing Hadoop clusters in virtualized environments. Beyond these advantages, does virtualization hurt Hadoop's runtime performance? To answer this, we compared and tuned the performance of Hadoop clusters of the same scale in a virtualized deployment and a physical deployment. The experiments show that a virtualized Hadoop cluster can support production environments well.
Performance comparison between virtualized and physical environments
Figure 1 shows the deployment used for the performance tuning test: only one virtual machine is deployed on each physical server, and the TaskTracker and DataNode run together on the same node. Because each virtual node can use all of the server's resources, this layout makes it straightforward to compare and analyze virtualized Hadoop against Hadoop deployed in a traditional physical environment. As shown in Figure 2, the performance of virtualized Hadoop is nearly on par with that of the physical environment.
Figure 1: Deployment used for the performance comparison
Figure 2: Apache Hadoop 1.2 performance comparison between physical and virtualized deployments
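A comparison like the one in Figure 2 is commonly expressed as the ratio of virtualized elapsed time to physical elapsed time for each benchmark, where a value close to 1.0 means the two deployments perform about the same. The following is a minimal sketch of that calculation only. The function names, the benchmark names, and the timings are hypothetical placeholders for illustration, not measurements or tooling from this experiment.

```python
def virtual_to_physical_ratio(physical_seconds, virtual_seconds):
    """Return virtualized elapsed time divided by physical elapsed time.

    A result near 1.0 means the two deployments perform about the same;
    a result above 1.0 means the virtualized run took longer.
    """
    if physical_seconds <= 0:
        raise ValueError("physical_seconds must be positive")
    return virtual_seconds / physical_seconds


def compare(results):
    """Map each benchmark name to its virtual/physical elapsed-time ratio.

    `results` maps a benchmark name to a (physical_seconds, virtual_seconds) pair.
    """
    return {
        name: virtual_to_physical_ratio(physical, virtual)
        for name, (physical, virtual) in results.items()
    }


# Placeholder timings for illustration only; NOT data from this article.
example_results = {
    "benchmark_a": (100.0, 104.0),
    "benchmark_b": (200.0, 198.0),
}

ratios = compare(example_results)
for name in sorted(ratios):
    print(f"{name}: virtual/physical = {ratios[name]:.2f}")
```

Normalizing to the physical run keeps benchmarks with very different run times on a single scale, which is one reasonable way to read a chart where the two deployments look almost flat against each other.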