Original article: http://blog.cloudera.com/blog/2013/04/how-to-use-vagrant-to-set-up-a-virtual-hadoop-cluster/
Vagrant is a very useful tool for programmatically managing multiple virtual machines (VMs) on a single physical machine. It supports VirtualBox natively and provides plugins for VMware Fusion and Amazon EC2.
Vagrant provides an easy-to-use, Ruby-based internal DSL that lets users define one or more virtual machines together with their configuration parameters. For automated provisioning, Vagrant supports several mechanisms (Puppet, Chef, or shell scripts) that install and configure software on every VM defined in the Vagrant configuration file.
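"Internal DSL" here means the configuration file is ordinary Ruby that reads like a declarative description, thanks to Ruby blocks. The toy below imitates that style so the idea is concrete; ToyMachine, ToyConfig and toy_configure are hypothetical names invented for illustration, and this is not how Vagrant itself is implemented.

```ruby
# Toy imitation of a block-based Ruby internal DSL in the style of a Vagrantfile.
# ToyMachine, ToyConfig and toy_configure are HYPOTHETICAL; Vagrant's real
# implementation is far more elaborate.

# Holds the settings of one virtual machine.
class ToyMachine
  attr_reader :name
  attr_accessor :box, :hostname, :memory

  def initialize(name)
    @name = name
  end
end

# Collects machine definitions, loosely like config.vm.define in Vagrant.
class ToyConfig
  attr_reader :machines

  def initialize
    @machines = {}
  end

  # Creates a machine, lets the block fill in its settings, then stores it.
  def define(name)
    machine = ToyMachine.new(name)
    yield machine
    @machines[name] = machine
  end
end

# Loosely mirrors Vagrant.configure("2") do |config| ... end
def toy_configure
  config = ToyConfig.new
  yield config
  config
end

config = toy_configure do |c|
  c.define :master do |m|
    m.box = "precise64"
    m.hostname = "vm-cluster-node1"
    m.memory = 4096
  end

  c.define :slave1 do |s|
    s.box = "precise64"
    s.hostname = "vm-cluster-node2"
    s.memory = 5120
  end
end

config.machines.each_value do |m|
  puts "#{m.name}: #{m.hostname}, #{m.memory} MB, box #{m.box}"
end
```

Because each `define` call simply yields an object to a block, the "configuration language" is nothing more than method calls and assignments, which is why a Vagrantfile can also use loops, variables and helper methods.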
The nice part is that Vagrant lets you define complex virtual environments made up of multiple VMs.
A typical use case for Vagrant is setting up a working or development environment in a simple and consistent way. At Eligotech (the author's company), developers are building a product that lets users work easily with Apache Hadoop and CDH (Cloudera's open-source distribution of Hadoop). The developers often need to install a Hadoop environment on their machines for testing, and they have found Vagrant very convenient for this.
You can try out an example Vagrant configuration file yourself. First download and install Vagrant (see http://docs.vagrantup.com/v2/installation/index.html) and VirtualBox. Once both are installed, copy the following text, save it as a file named Vagrantfile, and put it in a directory such as vagranthadoop. The configuration assumes that your machine has at least 32 GB of memory; if it does not, edit the memory settings in the file accordingly.
# -*- mode: ruby -*-
# vi: set ft=ruby :

$master_script = <<SCRIPT
#!/bin/bash

cat > /etc/hosts <<EOF
127.0.0.1 localhost

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

10.211.55.100 vm-cluster-node1
10.211.55.101 vm-cluster-node2
10.211.55.102 vm-cluster-node3
10.211.55.103 vm-cluster-node4
10.211.55.104 vm-cluster-node5
10.211.55.105 vm-cluster-client
EOF

apt-get install curl -y
REPOCM=${REPOCM:-cm4}
CM_REPO_HOST=${CM_REPO_HOST:-archive.cloudera.com}
CM_MAJOR_VERSION=$(echo $REPOCM | sed -e 's/cm\\([0-9]\\).*/\\1/')
CM_VERSION=$(echo $REPOCM | sed -e 's/cm\\([0-9][0-9]*\\)/\\1/')
OS_CODENAME=$(lsb_release -sc)
OS_DISTID=$(lsb_release -si | tr '[A-Z]' '[a-z]')
if [ $CM_MAJOR_VERSION -ge 4 ]; then
  cat > /etc/apt/sources.list.d/cloudera-$REPOCM.list <<EOF
deb [arch=amd64] http://$CM_REPO_HOST/cm$CM_MAJOR_VERSION/$OS_DISTID/$OS_CODENAME/amd64/cm $OS_CODENAME-$REPOCM contrib
deb-src http://$CM_REPO_HOST/cm$CM_MAJOR_VERSION/$OS_DISTID/$OS_CODENAME/amd64/cm $OS_CODENAME-$REPOCM contrib
EOF
  curl -s http://$CM_REPO_HOST/cm$CM_MAJOR_VERSION/$OS_DISTID/$OS_CODENAME/amd64/cm/archive.key > key
  apt-key add key
  rm key
fi
apt-get update
export DEBIAN_FRONTEND=noninteractive
apt-get -q -y --force-yes install oracle-j2sdk1.6 cloudera-manager-server-db cloudera-manager-server cloudera-manager-daemons
service cloudera-scm-server-db initdb
service cloudera-scm-server-db start
service cloudera-scm-server start
SCRIPT

$slave_script = <<SCRIPT
cat > /etc/hosts <<EOF
127.0.0.1 localhost

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

10.211.55.100 vm-cluster-node1
10.211.55.101 vm-cluster-node2
10.211.55.102 vm-cluster-node3
10.211.55.103 vm-cluster-node4
10.211.55.104 vm-cluster-node5
10.211.55.105 vm-cluster-client
EOF
SCRIPT

$client_script = <<SCRIPT
cat > /etc/hosts <<EOF
127.0.0.1 localhost

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

10.211.55.100 vm-cluster-node1
10.211.55.101 vm-cluster-node2
10.211.55.102 vm-cluster-node3
10.211.55.103 vm-cluster-node4
10.211.55.104 vm-cluster-node5
10.211.55.105 vm-cluster-client
EOF
SCRIPT

Vagrant.configure("2") do |config|
  config.vm.define :master do |master|
    master.vm.box = "precise64"
    master.vm.provider "vmware_fusion" do |v|
      v.vmx["memsize"] = "4096"
    end
    master.vm.provider :virtualbox do |v|
      v.name = "vm-cluster-node1"
      v.customize ["modifyvm", :id, "--memory", "4096"]
    end
    master.vm.network :private_network, ip: "10.211.55.100"
    master.vm.hostname = "vm-cluster-node1"
    master.vm.provision :shell, :inline => $master_script
  end

  config.vm.define :slave1 do |slave1|
    slave1.vm.box = "precise64"
    slave1.vm.provider "vmware_fusion" do |v|
      v.vmx["memsize"] = "5120"
    end
    slave1.vm.provider :virtualbox do |v|
      v.name = "vm-cluster-node2"
      v.customize ["modifyvm", :id, "--memory", "5120"]
    end
    slave1.vm.network :private_network, ip: "10.211.55.101"
    slave1.vm.hostname = "vm-cluster-node2"
    slave1.vm.provision :shell, :inline => $slave_script
  end

  config.vm.define :slave2 do |slave2|
    slave2.vm.box = "precise64"
    slave2.vm.provider "vmware_fusion" do |v|
      v.vmx["memsize"] = "5120"
    end
    slave2.vm.provider :virtualbox do |v|
      v.name = "vm-cluster-node3"
      v.customize ["modifyvm", :id, "--memory", "5120"]
    end
    slave2.vm.network :private_network, ip: "10.211.55.102"
    slave2.vm.hostname = "vm-cluster-node3"
    slave2.vm.provision :shell, :inline => $slave_script
  end

  config.vm.define :slave3 do |slave3|
    slave3.vm.box = "precise64"
    slave3.vm.provider "vmware_fusion" do |v|
      v.vmx["memsize"] = "5120"
    end
    slave3.vm.provider :virtualbox do |v|
      v.name = "vm-cluster-node4"
      v.customize ["modifyvm", :id, "--memory", "5120"]
    end
    slave3.vm.network :private_network, ip: "10.211.55.103"
    slave3.vm.hostname = "vm-cluster-node4"
    slave3.vm.provision :shell, :inline => $slave_script
  end

  config.vm.define :slave4 do |slave4|
    slave4.vm.box = "precise64"
    slave4.vm.provider "vmware_fusion" do |v|
      v.vmx["memsize"] = "5120"
    end
    slave4.vm.provider :virtualbox do |v|
      v.name = "vm-cluster-node5"
      v.customize ["modifyvm", :id, "--memory", "5120"]
    end
    slave4.vm.network :private_network, ip: "10.211.55.104"
    slave4.vm.hostname = "vm-cluster-node5"
    slave4.vm.provision :shell, :inline => $slave_script
  end

  config.vm.define :client do |client|
    client.vm.box = "precise64"
    client.vm.provider "vmware_fusion" do |v|
      v.vmx["memsize"] = "4096"
    end
    client.vm.provider :virtualbox do |v|
      v.name = "vm-cluster-client"
      v.customize ["modifyvm", :id, "--memory", "4096"]
    end
    client.vm.network :private_network, ip: "10.211.55.105"
    client.vm.hostname = "vm-cluster-client"
    client.vm.provision :shell, :inline => $client_script
  end
end
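Because the six VM definitions in this Vagrantfile differ only in name, IP address, memory size and provisioning script, the same layout could also be driven from a table. The sketch below copies its values from the Vagrantfile; NODES, hosts_file and provision_script are illustrative names, and $cm_install_script is a hypothetical global assumed to hold the Cloudera Manager install commands from $master_script (everything after its /etc/hosts block). The Vagrant.configure call is wrapped in `if defined?(Vagrant)` so the helpers can also be loaded and checked with plain `ruby`.

```ruby
# -*- mode: ruby -*-
# vi: set ft=ruby :
# Table-driven sketch of the same six-VM layout. NODES, hosts_file and
# provision_script are illustrative names; $cm_install_script is a
# HYPOTHETICAL global assumed to hold the Cloudera Manager install commands.

NODES = [
  { name: :master, hostname: "vm-cluster-node1",  ip: "10.211.55.100", memory: 4096 },
  { name: :slave1, hostname: "vm-cluster-node2",  ip: "10.211.55.101", memory: 5120 },
  { name: :slave2, hostname: "vm-cluster-node3",  ip: "10.211.55.102", memory: 5120 },
  { name: :slave3, hostname: "vm-cluster-node4",  ip: "10.211.55.103", memory: 5120 },
  { name: :slave4, hostname: "vm-cluster-node5",  ip: "10.211.55.104", memory: 5120 },
  { name: :client, hostname: "vm-cluster-client", ip: "10.211.55.105", memory: 4096 }
].freeze

# Builds the /etc/hosts content that every VM receives.
def hosts_file(nodes)
  header = [
    "127.0.0.1 localhost",
    "# The following lines are desirable for IPv6 capable hosts",
    "::1 ip6-localhost ip6-loopback",
    "fe00::0 ip6-localnet",
    "ff00::0 ip6-mcastprefix",
    "ff02::1 ip6-allnodes",
    "ff02::2 ip6-allrouters"
  ]
  (header + nodes.map { |n| "#{n[:ip]} #{n[:hostname]}" }).join("\n") + "\n"
end

# Shell script for one node: write /etc/hosts, then (master only) install CM.
def provision_script(node, nodes)
  script = "cat > /etc/hosts <<EOF\n#{hosts_file(nodes)}EOF\n"
  script += $cm_install_script.to_s if node[:name] == :master
  script
end

# Only evaluated when Vagrant loads this file; plain `ruby` skips it.
if defined?(Vagrant)
  Vagrant.configure("2") do |config|
    NODES.each do |node|
      config.vm.define node[:name] do |machine|
        machine.vm.box = "precise64"
        machine.vm.provider "vmware_fusion" do |v|
          v.vmx["memsize"] = node[:memory].to_s
        end
        machine.vm.provider :virtualbox do |v|
          v.name = node[:hostname]
          v.customize ["modifyvm", :id, "--memory", node[:memory].to_s]
        end
        machine.vm.network :private_network, ip: node[:ip]
        machine.vm.hostname = node[:hostname]
        machine.vm.provision :shell, inline: provision_script(node, NODES)
      end
    end
  end
end
```

A table-driven file keeps the six /etc/hosts copies from drifting apart; the trade-off is that it no longer reads as a literal, node-by-node description of the cluster.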
The configuration file defines six VMs and assigns each one a role (using CDH role names):
- vm-cluster-node1: the master. Besides running the Cloudera Manager (CM) server, it should run the NameNode, Secondary NameNode, and JobTracker.
- vm-cluster-node2: a slave; it should run a DataNode and a TaskTracker.
- vm-cluster-node3: a slave; it should run a DataNode and a TaskTracker.
- vm-cluster-node4: a slave; it should run a DataNode and a TaskTracker.
- vm-cluster-node5: a slave; it should run a DataNode and a TaskTracker.
- vm-cluster-client: this machine acts as the gateway for the cluster.
For details of the Vagrantfile syntax, see http://docs.vagrantup.com/v2/vagrantfile/index.html. In particular, note that the memory size is set differently depending on the provider (VirtualBox or VMware Fusion), so the same file works with either provider.
Customizing the environment you need really is that easy!
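Adding up the memory settings shows where the 32 GB recommendation comes from: the six VMs take 28 GB in total, which presumably leaves headroom for the host OS. For a smaller machine, scaled_memory below (a hypothetical helper, not part of the original file) shrinks every VM proportionally; the resulting values would then be copied by hand into both the `memsize` and `--memory` settings.

```ruby
# Memory (MB) assigned to each VM by the Vagrantfile above.
MEMORY_MB = {
  "vm-cluster-node1"  => 4096,
  "vm-cluster-node2"  => 5120,
  "vm-cluster-node3"  => 5120,
  "vm-cluster-node4"  => 5120,
  "vm-cluster-node5"  => 5120,
  "vm-cluster-client" => 4096
}.freeze

# HYPOTHETICAL helper: shrink every VM proportionally so the total fits
# budget_mb, rounding each value down to a multiple of 256 MB.
# VMs are never scaled up.
def scaled_memory(memory, budget_mb)
  factor = [budget_mb.to_f / memory.values.sum, 1.0].min
  memory.transform_values { |mb| (mb * factor / 256).floor * 256 }
end

total_mb = MEMORY_MB.values.sum
puts "Total guest memory: #{total_mb} MB (#{total_mb / 1024} GB)"
# Prints: Total guest memory: 28672 MB (28 GB)

scaled_memory(MEMORY_MB, 14_336).each do |vm, mb|
  puts "#{vm}: #{mb} MB"
end
```

With a 14 GB budget the factor is exactly 0.5, giving 2048 MB for the master and client and 2560 MB for each slave; whether that is enough for the Hadoop daemons you plan to run is something to check for your own workload.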
The Vagrantfile above also does another important thing: it automatically installs Cloudera Manager on the master node (vm-cluster-node1). On every VM, it also writes an /etc/hosts file so that the nodes can resolve one another by name.
To create the VM cluster, change to the directory containing the configuration file (for example, vagranthadoop) and run the following in a shell:
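The master script picks the Cloudera Manager apt repository from two variables, REPOCM (default `cm4`) and CM_REPO_HOST (default `archive.cloudera.com`). The Ruby sketch below re-expresses that string handling so it is easy to see which sources.list lines result. It assumes the precise64 box, i.e. Ubuntu 12.04 ("precise"), which is what `lsb_release` would report there; cm_major_version, cm_apt_lines and cm_key_url are illustrative names, and the code only builds strings without contacting any repository.

```ruby
# Re-expresses the string handling in $master_script that chooses the
# Cloudera Manager apt repository. It only builds strings; nothing is fetched.
# distid/codename defaults ASSUME the precise64 box (Ubuntu 12.04 "precise").

# Same result as: echo $REPOCM | sed -e 's/cm\([0-9]\).*/\1/'
def cm_major_version(repocm)
  repocm.sub(/cm([0-9]).*/, '\1')
end

# Lines written to /etc/apt/sources.list.d/cloudera-$REPOCM.list.
# Mirrors `if [ $CM_MAJOR_VERSION -ge 4 ]`: older versions get no lines.
def cm_apt_lines(repocm: "cm4", host: "archive.cloudera.com",
                 distid: "ubuntu", codename: "precise")
  major = cm_major_version(repocm)
  return [] if major.to_i < 4

  base = "http://#{host}/cm#{major}/#{distid}/#{codename}/amd64/cm"
  suite = "#{codename}-#{repocm} contrib"
  ["deb [arch=amd64] #{base} #{suite}", "deb-src #{base} #{suite}"]
end

# URL of the signing key that the script downloads and passes to apt-key.
def cm_key_url(repocm: "cm4", host: "archive.cloudera.com",
               distid: "ubuntu", codename: "precise")
  "http://#{host}/cm#{cm_major_version(repocm)}/#{distid}/#{codename}/amd64/cm/archive.key"
end

puts cm_apt_lines
puts cm_key_url
```

Setting REPOCM before provisioning (for example to a more specific `cm4.x` value) would change only the suite name, while the major version, and therefore the URL path, stays `cm4`.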
> vagrant up --provider=virtualbox
After a while (how long depends on your machine's performance), vagrant up finishes, meaning that all the VMs have been created and provisioned and are running.
At this point, you can configure your cluster through the CM web UI at http://vm-cluster-node1:7180.
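The CM server may still be starting up when vagrant up returns, and the host machine may not resolve vm-cluster-node1 at all, because the provisioning scripts write /etc/hosts only inside the VMs. A minimal Ruby sketch that polls the master's private IP (10.211.55.100, from the Vagrantfile) until port 7180 accepts TCP connections; port_open? and wait_for_port are illustrative names, and the 600-second deadline and 10-second interval are arbitrary assumptions.

```ruby
require "socket"

# True if something accepts a TCP connection on host:port within timeout seconds.
def port_open?(host, port, timeout: 2)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError
  false # refused, timed out, unreachable, or unresolvable
end

# Polls port_open? until it succeeds or deadline_s seconds have passed.
def wait_for_port(host, port, deadline_s: 600, interval_s: 10)
  give_up_at = Process.clock_gettime(Process::CLOCK_MONOTONIC) + deadline_s
  loop do
    return true if port_open?(host, port)
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= give_up_at

    sleep interval_s
  end
end

# Usage (HYPOTHETICAL values taken from the Vagrantfile):
#   wait_for_port("10.211.55.100", 7180)
```

An open port only means the web server is listening; the UI at http://10.211.55.100:7180 may still need a moment before the login page loads.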
Have fun!