How to Use Vagrant to Set Up a Virtual Hadoop Cluster


Address: http://blog.cloudera.com/blog/2013/04/how-to-use-vagrant-to-set-up-a-virtual-hadoop-cluster/

Vagrant is a very useful tool for programmatically creating and managing multiple virtual machines (VMs) on a single physical machine. It supports VirtualBox natively and provides plug-ins for VMware Fusion and Amazon EC2 virtual machine clusters.

Vagrant provides an easy-to-use, Ruby-based internal DSL that lets users define one or more virtual machines together with their configuration parameters. For automated deployment, Vagrant supports several provisioning mechanisms: Puppet, Chef, or shell scripts can be used to automatically install software and apply configuration on every virtual machine defined in the Vagrant configuration file.


So you can define a sophisticated virtual infrastructure made up of multiple VMs running on your own system. Cool, isn't it?


A typical use case for Vagrant is building a working or development environment in a simple and consistent way. At Eligotech (the author's employer), developers are building a product that aims to make Apache Hadoop and CDH (Cloudera's open-source distribution) simple for users to work with. They often need to install a Hadoop environment on their machines for testing, and they have found Vagrant to be a very convenient tool for this.


You can try an example Vagrant configuration file yourself. First download and install Vagrant (see http://docs.vagrantup.com/v2/installation/index.html) and VirtualBox. Once both are installed, copy the following text, save it as a file named Vagrantfile, and put it in a directory such as vagranthadoop. The configuration assumes that your machine has at least 32 GB of memory; if it has less, edit the memory sizes in the file to fit.

 

# -*- mode: ruby -*-
# vi: set ft=ruby :

$master_script = <<SCRIPT
#!/bin/bash
cat > /etc/hosts <<EOF
127.0.0.1       localhost
# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
10.211.55.100   vm-cluster-node1
10.211.55.101   vm-cluster-node2
10.211.55.102   vm-cluster-node3
10.211.55.103   vm-cluster-node4
10.211.55.104   vm-cluster-node5
10.211.55.105   vm-cluster-client
EOF
apt-get install curl -y
REPOCM=${REPOCM:-cm4}
CM_REPO_HOST=${CM_REPO_HOST:-archive.cloudera.com}
CM_MAJOR_VERSION=$(echo $REPOCM | sed -e 's/cm\\([0-9]\\).*/\\1/')
CM_VERSION=$(echo $REPOCM | sed -e 's/cm\\([0-9][0-9]*\\)/\\1/')
OS_CODENAME=$(lsb_release -sc)
OS_DISTID=$(lsb_release -si | tr '[A-Z]' '[a-z]')
if [ $CM_MAJOR_VERSION -ge 4 ]; then
  cat > /etc/apt/sources.list.d/cloudera-$REPOCM.list <<EOF
deb [arch=amd64] http://$CM_REPO_HOST/cm$CM_MAJOR_VERSION/$OS_DISTID/$OS_CODENAME/amd64/cm $OS_CODENAME-$REPOCM contrib
deb-src http://$CM_REPO_HOST/cm$CM_MAJOR_VERSION/$OS_DISTID/$OS_CODENAME/amd64/cm $OS_CODENAME-$REPOCM contrib
EOF
  curl -s http://$CM_REPO_HOST/cm$CM_MAJOR_VERSION/$OS_DISTID/$OS_CODENAME/amd64/cm/archive.key > key
  apt-key add key
  rm key
fi
apt-get update
export DEBIAN_FRONTEND=noninteractive
apt-get -q -y --force-yes install oracle-j2sdk1.6 cloudera-manager-server-db cloudera-manager-server cloudera-manager-daemons
service cloudera-scm-server-db initdb
service cloudera-scm-server-db start
service cloudera-scm-server start
SCRIPT

$slave_script = <<SCRIPT
cat > /etc/hosts <<EOF
127.0.0.1       localhost
# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
10.211.55.100   vm-cluster-node1
10.211.55.101   vm-cluster-node2
10.211.55.102   vm-cluster-node3
10.211.55.103   vm-cluster-node4
10.211.55.104   vm-cluster-node5
10.211.55.105   vm-cluster-client
EOF
SCRIPT

$client_script = <<SCRIPT
cat > /etc/hosts <<EOF
127.0.0.1       localhost
# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
10.211.55.100   vm-cluster-node1
10.211.55.101   vm-cluster-node2
10.211.55.102   vm-cluster-node3
10.211.55.103   vm-cluster-node4
10.211.55.104   vm-cluster-node5
10.211.55.105   vm-cluster-client
EOF
SCRIPT

Vagrant.configure("2") do |config|
  config.vm.define :master do |master|
    master.vm.box = "precise64"
    master.vm.provider "vmware_fusion" do |v|
      v.vmx["memsize"]  = "4096"
    end
    master.vm.provider :virtualbox do |v|
      v.name = "vm-cluster-node1"
      v.customize ["modifyvm", :id, "--memory", "4096"]
    end
    master.vm.network :private_network, ip: "10.211.55.100"
    master.vm.hostname = "vm-cluster-node1"
    master.vm.provision :shell, :inline => $master_script
  end

  config.vm.define :slave1 do |slave1|
    slave1.vm.box = "precise64"
    slave1.vm.provider "vmware_fusion" do |v|
      v.vmx["memsize"]  = "5120"
    end
    slave1.vm.provider :virtualbox do |v|
      v.name = "vm-cluster-node2"
      v.customize ["modifyvm", :id, "--memory", "5120"]
    end
    slave1.vm.network :private_network, ip: "10.211.55.101"
    slave1.vm.hostname = "vm-cluster-node2"
    slave1.vm.provision :shell, :inline => $slave_script
  end

  config.vm.define :slave2 do |slave2|
    slave2.vm.box = "precise64"
    slave2.vm.provider "vmware_fusion" do |v|
      v.vmx["memsize"]  = "5120"
    end
    slave2.vm.provider :virtualbox do |v|
      v.name = "vm-cluster-node3"
      v.customize ["modifyvm", :id, "--memory", "5120"]
    end
    slave2.vm.network :private_network, ip: "10.211.55.102"
    slave2.vm.hostname = "vm-cluster-node3"
    slave2.vm.provision :shell, :inline => $slave_script
  end

  config.vm.define :slave3 do |slave3|
    slave3.vm.box = "precise64"
    slave3.vm.provider "vmware_fusion" do |v|
      v.vmx["memsize"]  = "5120"
    end
    slave3.vm.provider :virtualbox do |v|
      v.name = "vm-cluster-node4"
      v.customize ["modifyvm", :id, "--memory", "5120"]
    end
    slave3.vm.network :private_network, ip: "10.211.55.103"
    slave3.vm.hostname = "vm-cluster-node4"
    slave3.vm.provision :shell, :inline => $slave_script
  end

  config.vm.define :slave4 do |slave4|
    slave4.vm.box = "precise64"
    slave4.vm.provider "vmware_fusion" do |v|
      v.vmx["memsize"]  = "5120"
    end
    slave4.vm.provider :virtualbox do |v|
      v.name = "vm-cluster-node5"
      v.customize ["modifyvm", :id, "--memory", "5120"]
    end
    slave4.vm.network :private_network, ip: "10.211.55.104"
    slave4.vm.hostname = "vm-cluster-node5"
    slave4.vm.provision :shell, :inline => $slave_script
  end

  config.vm.define :client do |client|
    client.vm.box = "precise64"
    client.vm.provider "vmware_fusion" do |v|
      v.vmx["memsize"]  = "4096"
    end
    client.vm.provider :virtualbox do |v|
      v.name = "vm-cluster-client"
      v.customize ["modifyvm", :id, "--memory", "4096"]
    end
    client.vm.network :private_network, ip: "10.211.55.105"
    client.vm.hostname = "vm-cluster-client"
    client.vm.provision :shell, :inline => $client_script
  end
end

The configuration file defines six VMs. The role of each VM, in terms of CDH roles, is as follows:

 

 

  • vm-cluster-node1: This is the master. Besides running the CM master, it should run the NameNode, Secondary NameNode, and JobTracker.
  • vm-cluster-node2: This is a slave. It should run a DataNode and a TaskTracker.
  • vm-cluster-node3: This is a slave. It should run a DataNode and a TaskTracker.
  • vm-cluster-node4: This is a slave. It should run a DataNode and a TaskTracker.
  • vm-cluster-node5: This is a slave. It should run a DataNode and a TaskTracker.
  • vm-cluster-client: This machine acts as the gateway for the cluster.

You can learn more about the Vagrantfile syntax at http://docs.vagrantup.com/v2/vagrantfile/index.html. In particular, notice that the memory size is defined differently depending on the provider, VirtualBox or VMware Fusion: each provider block uses its own setting.
See how easy it is to customize the environment to your needs!
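
The six machine definitions differ only in name, address, and memory size, so the same layout could also be driven from a single table. The following is a minimal sketch of that idea, not the author's file: NODES, hosts_entries, and total_memory_mb are illustrative names, the values are copied from the configuration above, and the provisioning steps are left out for brevity.

```ruby
# Sketch: one table of nodes drives both the /etc/hosts entries and the VM
# definitions. NODES, hosts_entries and total_memory_mb are illustrative
# names and are not part of Vagrant.
NODES = [
  { name: :master, hostname: "vm-cluster-node1",  ip: "10.211.55.100", memory_mb: 4096 },
  { name: :slave1, hostname: "vm-cluster-node2",  ip: "10.211.55.101", memory_mb: 5120 },
  { name: :slave2, hostname: "vm-cluster-node3",  ip: "10.211.55.102", memory_mb: 5120 },
  { name: :slave3, hostname: "vm-cluster-node4",  ip: "10.211.55.103", memory_mb: 5120 },
  { name: :slave4, hostname: "vm-cluster-node5",  ip: "10.211.55.104", memory_mb: 5120 },
  { name: :client, hostname: "vm-cluster-client", ip: "10.211.55.105", memory_mb: 4096 }
].freeze

# One "ip   hostname" line per node, in the layout used for /etc/hosts.
def hosts_entries(nodes)
  nodes.map { |n| format("%-15s %s", n[:ip], n[:hostname]) }.join("\n")
end

# Total memory, in MB, that the host has to hand out to the guests.
def total_memory_mb(nodes)
  nodes.sum { |n| n[:memory_mb] }
end

# The guard lets this file also run under plain `ruby` for a quick check;
# when Vagrant loads it as a Vagrantfile, the Vagrant constant is defined.
if defined?(Vagrant)
  Vagrant.configure("2") do |config|
    NODES.each do |node|
      config.vm.define node[:name] do |machine|
        machine.vm.box = "precise64"
        machine.vm.hostname = node[:hostname]
        machine.vm.network :private_network, ip: node[:ip]

        # Each provider has its own way of setting the memory size.
        machine.vm.provider "vmware_fusion" do |v|
          v.vmx["memsize"] = node[:memory_mb].to_s
        end
        machine.vm.provider :virtualbox do |v|
          v.name = node[:hostname]
          v.customize ["modifyvm", :id, "--memory", node[:memory_mb].to_s]
        end
      end
    end
  end
end
```

With the values above, total_memory_mb(NODES) comes to 28672 MB (28 GB), which is in line with the 32 GB assumption. On a smaller host, lowering the memory_mb values or removing some slave entries is one way to make the cluster fit.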

 


The Vagrant configuration file above does one more important thing: it automatically installs Cloudera Manager on the master node (vm-cluster-node1).
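
To make the repository setup in $master_script easier to follow, here is a small Ruby sketch that mirrors how the script derives the Cloudera Manager major version from REPOCM and builds the APT "deb" line. The function names are illustrative, and the values "ubuntu" and "precise" are assumptions about what lsb_release reports inside the precise64 box.

```ruby
# Mirrors the shell logic in $master_script; function names are illustrative.

# "cm4" -> "4", "cm4.1.2" -> "4"
# (same effect as: sed -e 's/cm\([0-9]\).*/\1/')
def cm_major_version(repocm)
  repocm.sub(/cm([0-9]).*/, '\1')
end

# Builds the "deb" line that the script writes to
# /etc/apt/sources.list.d/cloudera-<repocm>.list
def cm_repo_line(repocm:, host:, dist_id:, codename:)
  major = cm_major_version(repocm)
  "deb [arch=amd64] http://#{host}/cm#{major}/#{dist_id}/#{codename}/amd64/cm " \
    "#{codename}-#{repocm} contrib"
end

# Script defaults (cm4, archive.cloudera.com) plus the assumed OS values.
puts cm_repo_line(repocm: "cm4", host: "archive.cloudera.com",
                  dist_id: "ubuntu", codename: "precise")
```

Because the script reads REPOCM and CM_REPO_HOST with shell defaults (${REPOCM:-cm4}), setting those variables before the script runs is the intended way to point it at a different Cloudera Manager release or mirror.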


To create the virtual machine cluster, change to the directory that contains the configuration file (for example, vagranthadoop) and run the following in a shell:

 

> vagrant up --provider=virtualbox

 


After a while (how long depends on the performance of your machine), the vagrant command completes, which means that all virtual machines have been created and provisioned and are running.


At this point, you can configure your cluster through the Cloudera Manager (CM) management UI at http://vm-cluster-node1:7180 (or http://10.211.55.100:7180 if your host machine cannot resolve that name).
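
The Cloudera Manager server can take a little while to start listening after provisioning ends. As a rough sketch, a few lines of Ruby using only the standard socket library can poll the port before you open the browser. The helper names port_open? and wait_for_port are hypothetical, and an open port only shows that something is listening, not that the UI has finished starting.

```ruby
require "socket"

# Returns true if a TCP connection to host:port succeeds within `timeout`
# seconds, false if it is refused, times out, or the name does not resolve.
def port_open?(host, port, timeout: 2)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  false
end

# Polls the port up to `attempts` times, sleeping `delay` seconds between
# tries. Returns true as soon as the port answers, false otherwise.
def wait_for_port(host, port, attempts: 30, delay: 10)
  attempts.times do |i|
    return true if port_open?(host, port)
    sleep(delay) if i < attempts - 1
  end
  false
end
```

For example, wait_for_port("10.211.55.100", 7180) checks the master's private address from the configuration above for up to about five minutes with the default settings.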

Have fun!

 
