Brief descriptions of the related systems:
HBase – a distributed key/value database
ZooKeeper – a coordination service for distributed applications
Hive – a SQL query engine
Flume – a distributed log-collection system
First, a description of the environment:
S1: hadoop-master
NameNode, JobTracker;
SecondaryNameNode;
DataNode, TaskTracker
S2: hadoop-node-1
DataNode, TaskTracker
S3: hadoop-node-2
DataNode, TaskTracker
NameNode – manages the entire HDFS namespace
SecondaryNameNode – performs periodic checkpoints of the NameNode's metadata (often described as a redundant copy of the NameNode, though it is not a hot standby)
JobTracker – job management service for parallel computing
DataNode – storage node service for HDFS
TaskTracker – task execution service for parallel computing
Second, prerequisite system configuration:
1. Add hosts records (on all machines)
hwl@hadoop-master:~$ cat /etc/hosts
192.168.242.128 hadoop-master
192.168.242.128 hadoop-secondary
192.168.242.129 hadoop-node-1
192.168.242.130 hadoop-node-2
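To confirm the records resolve as expected, a quick optional sanity check from any of the machines:
hwl@hadoop-master:~$ getent hosts hadoop-node-1
hwl@hadoop-master:~$ ping -c 1 hadoop-node-2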
2. Modify the host names
hwl@hadoop-master:~$ cat /etc/hostname
hadoop-master
hwl@hadoop-node-1:~$ cat /etc/hostname
hadoop-node-1
hwl@hadoop-node-2:~$ cat /etc/hostname
hadoop-node-2
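Note that a change to /etc/hostname normally takes effect at the next boot; to apply it immediately, the hostname command can be run on each machine (shown here for the master):
hwl@hadoop-master:~$ sudo hostname hadoop-master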
3. Configure passwordless SSH keys between all machines (details omitted; a sketch follows)
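For reference, a minimal sketch of one common way to set this up (run as the hwl user on every machine; the user and host names come from the environment above, adjust as needed):
ssh-keygen -t rsa     # accept the defaults
ssh-copy-id hwl@hadoop-master
ssh-copy-id hwl@hadoop-node-1
ssh-copy-id hwl@hadoop-node-2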
Third, the Hadoop environment configuration:
1. Select the installation package
For a more convenient and standardized deployment of the Hadoop cluster, we use the Cloudera integration packages. Cloudera has done a lot of integration and optimization work on the Hadoop-related systems, which avoids many bugs caused by mismatched component versions. Documentation:
https://ccp.cloudera.com/display/DOC/Documentation//
2. Install the Java environment
Because the Hadoop project is developed primarily in Java, JVM support is required. Add an apt source providing a matching Java version, and install it on all servers:
apt-get install python-software-properties
vim /etc/apt/sources.list.d/sun-java-community-team-sun-java6-maverick.list
deb http://ppa.launchpad.net/sun-java-community-team/sun-java6/ubuntu maverick main
deb-src http://ppa.launchpad.net/sun-java-community-team/sun-java6/ubuntu maverick main
Install sun-java6-jdk:
add-apt-repository ppa:sun-java-community-team/sun-java6
apt-get update
apt-get install sun-java6-jdk
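The installation can then be verified (a quick check):
hwl@hadoop-master:~$ java -version
The output should report a 1.6.x Sun JVM.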
3. Add the Cloudera installation source for Hadoop
vim /etc/apt/sources.list.d/cloudera.list
deb http://archive.cloudera.com/debian maverick-cdh3u3 contrib
deb-src http://archive.cloudera.com/debian maverick-cdh3u3 contrib
apt-get install curl
curl -s http://archive.cloudera.com/debian/archive.key | sudo apt-key add -
apt-get update
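Before installing, it is worth confirming that apt now sees the Cloudera packages (a quick check; the version column should show a cdh3u3 build):
hwl@hadoop-master:~$ apt-cache policy hadoop-0.20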
4. Install the Hadoop-related packages
On hadoop-master:
apt-get install hadoop-0.20-namenode
apt-get install hadoop-0.20-datanode
apt-get install hadoop-0.20-secondarynamenode
apt-get install hadoop-0.20-jobtracker
On hadoop-node-1 and hadoop-node-2:
apt-get install hadoop-0.20-datanode
apt-get install hadoop-0.20-tasktracker
5. Create the Hadoop configuration directory
cp -r /etc/hadoop-0.20/conf.empty /etc/hadoop-0.20/conf.my_cluster
6. Activate the new configuration
update-alternatives --install /etc/hadoop-0.20/conf hadoop-0.20-conf /etc/hadoop-0.20/conf.my_cluster 50   (50 is the priority)
Query the current configuration:
update-alternatives --display hadoop-0.20-conf
7. Configure the Hadoop-related files
7.1 Configure the Java environment variable on all servers:
hwl@hadoop-master:~$ cat /etc/hadoop/conf/hadoop-env.sh
# Set Hadoop-specific environment variables here.
export JAVA_HOME="/usr/lib/jvm/java-6-sun"
7.2 Configure the master and slave names on all servers:
hwl@hadoop-master:~$ cat /etc/hadoop/conf/masters
hadoop-master
hwl@hadoop-master:~$ cat /etc/hadoop/conf/slaves
hadoop-node-1
hadoop-node-2
7.3 Create the HDFS directories
mkdir -p /data/storage
mkdir -p /data/hdfs
chmod 700 /data/hdfs
chown -R hdfs:hadoop /data/hdfs
chmod 777 /data/storage
chmod o+t /data/storage
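The sticky bit set by chmod o+t keeps users from deleting each other's files in the world-writable directory. The result can be verified with:
hwl@hadoop-master:~$ ls -ld /data/storage /data/hdfs
The mode of /data/storage should read drwxrwxrwt.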
7.4 Configure core-site.xml on all servers:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/storage</value>
    <description>A directory for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hadoop-master:8020</value>
  </property>
</configuration>
hadoop.tmp.dir specifies the directory under which all files uploaded to Hadoop are stored, so make sure the directory is large enough.
fs.default.name specifies the address and port of the NameNode.
7.5 Configure hdfs-site.xml on all servers:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>${hadoop.tmp.dir}/dfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/data/hdfs</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>
  <property>
    <name>fs.checkpoint.period</name>
    <value>3600</value> <!-- assumed: the original value was not preserved; 3600 s is the Hadoop default -->
  </property>
  <property>
    <name>fs.checkpoint.dir</name>
    <value>${hadoop.tmp.dir}/dfs/namesecondary</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop-secondary:50090</value>
  </property>
</configuration>
dfs.data.dir specifies where the DataNode stores its data.
dfs.replication specifies how many replicas of each block are kept, providing redundant backups; the value must not exceed the number of DataNodes, or errors will occur.
dfs.datanode.max.xcievers sets the upper bound on the number of files an HDFS DataNode serves concurrently.
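Once the daemons are up (step 9), the effective replication and block layout of any file can be inspected with fsck, for example against the test file created in step 11 (a sketch):
hwl@hadoop-master:~$ sudo -u hdfs hadoop fsck /hwl/hello.txt -files -blocks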
7.6 Configure mapred-site.xml on all servers:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hdfs://hadoop-master:8021</value>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>/mapred/system</value>
  </property>
  <property>
    <name>mapreduce.jobtracker.staging.root.dir</name>
    <value>/user</value>
  </property>
</configuration>
mapred.job.tracker specifies the address and port of the JobTracker.
mapred.system.dir specifies the directory in HDFS where the MapReduce framework keeps its system files.
8. Format the HDFS distributed file system
hwl@hadoop-master:~$ sudo -u hdfs hadoop namenode -format
[sudo] password for hwl:
14/05/11 19:18:31 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = hadoop-master/192.168.242.128
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.2-cdh3u3
STARTUP_MSG:   build = file:///data/1/tmp/nightly_2012-03-20_13-13-48_3/hadoop-0.20-0.20.2+923.197-1~maverick -r 318bc781117fa276ae81a3d111f5eeba0020634f; compiled by 'root' on Tue 13:45:02 PDT 2012
************************************************************/
14/05/11 19:18:31 INFO util.GSet: VM type       = 32-bit
14/05/11 19:18:31 INFO util.GSet: 2% max memory = 19.33375 MB
14/05/11 19:18:31 INFO util.GSet: capacity      = 2^22 = 4194304 entries
14/05/11 19:18:31 INFO util.GSet: recommended=4194304, actual=4194304
14/05/11 19:18:32 INFO security.UserGroupInformation: JAAS Configuration already set up for Hadoop, not re-installing.
14/05/11 19:18:32 INFO namenode.FSNamesystem: fsOwner=hdfs (auth:SIMPLE)
14/05/11 19:18:32 INFO namenode.FSNamesystem: supergroup=supergroup
14/05/11 19:18:32 INFO namenode.FSNamesystem: isPermissionEnabled=true
14/05/11 19:18:32 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=1000
14/05/11 19:18:32 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
14/05/11 19:18:32 INFO common.Storage: Image file of size saved in 0 seconds.
14/05/11 19:18:32 INFO common.Storage: Storage directory /data/storage/dfs/name has been successfully formatted.
14/05/11 19:18:32 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop-master/192.168.242.128
************************************************************/
9. Start the related processes
9.1 On the master:
hwl@hadoop-master:~$ sudo /etc/init.d/hadoop-0.20-datanode start
Starting Hadoop datanode daemon: datanode running as process 1218. Stop it first.
hadoop-0.20-datanode.
hwl@hadoop-master:~$ sudo /etc/init.d/hadoop-0.20-namenode start
Starting Hadoop namenode daemon: starting namenode, logging to /usr/lib/hadoop-0.20/logs/hadoop-hadoop-namenode-hadoop-master.out
hadoop-0.20-namenode.
hwl@hadoop-master:~$ sudo /etc/init.d/hadoop-0.20-jobtracker start   (this succeeded only on the second attempt; the first attempt's log showed a shutdown)
Starting Hadoop jobtracker daemon: starting jobtracker, logging to /usr/lib/hadoop-0.20/logs/hadoop-hadoop-jobtracker-hadoop-master.out
hadoop-0.20-jobtracker.
hwl@hadoop-master:~$ sudo /etc/init.d/hadoop-0.20-secondarynamenode start
Starting Hadoop secondarynamenode daemon: secondarynamenode running as process 1586. Stop it first.
hadoop-0.20-secondarynamenode.
hwl@hadoop-master:~$ sudo netstat -tnpl
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address            Foreign Address   State    PID/Program name
tcp        0      0 0.0.0.0:22               0.0.0.0:*         LISTEN   838/sshd
tcp6       0      0 :::38197                 :::*              LISTEN   1589/java
tcp6       0      0 :::50070                 :::*              LISTEN   2070/java
tcp6       0      0 :::22                    :::*              LISTEN   838/sshd
tcp6       0      0 :::50010                 :::*              LISTEN   1274/java
tcp6       0      0 :::50075                 :::*              LISTEN   1274/java
tcp6       0      0 :::50020                 :::*              LISTEN   1274/java
tcp6       0      0 :::50090                 :::*              LISTEN   1589/java
tcp6       0      0 :::45579                 :::*              LISTEN   2070/java
tcp6       0      0 :::36590                 :::*              LISTEN   1274/java
tcp6       0      0 192.168.242.128:8020     :::*              LISTEN   2070/java
hwl@hadoop-master:~$ sudo jps
2070 NameNode
3117 Jps
1589 SecondaryNameNode
1274 DataNode
3061 JobTracker
9.2 On the nodes:
hwl@hadoop-node-1:~$ sudo /etc/init.d/hadoop-0.20-datanode start
Starting Hadoop datanode daemon: datanode running as process 1400. Stop it first.
hadoop-0.20-datanode.
hwl@hadoop-node-1:~$ sudo /etc/init.d/hadoop-0.20-tasktracker start
Starting Hadoop tasktracker daemon: starting tasktracker, logging to /usr/lib/hadoop-0.20/logs/hadoop-hadoop-tasktracker-hadoop-node-1.out
hadoop-0.20-tasktracker.
hwl@hadoop-node-1:~$ sudo jps
1926 TaskTracker
1968 Jps
1428 DataNode
hwl@hadoop-node-2:~$ sudo /etc/init.d/hadoop-0.20-datanode start
Starting Hadoop datanode daemon: datanode running as process 1156. Stop it first.
hadoop-0.20-datanode.
hwl@hadoop-node-2:~$ sudo /etc/init.d/hadoop-0.20-tasktracker start
Starting Hadoop tasktracker daemon: starting tasktracker, logging to /usr/lib/hadoop-0.20/logs/hadoop-hadoop-tasktracker-hadoop-node-2.out
hadoop-0.20-tasktracker.
hwl@hadoop-node-2:~$ sudo jps
1864 TaskTracker
1189 DataNode
1905 Jps
10. Create the mapred.system.dir directory in HDFS
hwl@hadoop-master:~$ sudo -u hdfs hadoop fs -mkdir /mapred/system
14/05/11 19:30:54 INFO security.UserGroupInformation: JAAS Configuration already set up for Hadoop, not re-installing.
hwl@hadoop-master:~$ sudo -u hdfs hadoop fs -chown mapred:hadoop /mapred/system
14/05/11 19:31:11 INFO security.UserGroupInformation: JAAS Configuration already set up for Hadoop, not re-installing.
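With the system directory in place, the MapReduce layer can be smoke-tested by submitting the bundled pi estimator from the Hadoop examples jar (a sketch; the jar's exact file name varies with the installed package version, so adjust the path):
hwl@hadoop-master:~$ hadoop jar /usr/lib/hadoop-0.20/hadoop-examples.jar pi 2 100
A successful run prints an estimate of pi and confirms that the JobTracker and TaskTrackers can execute jobs.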
11. Test basic operations on HDFS
hwl@hadoop-master:~$ echo "hello" > hello.txt
hwl@hadoop-master:~$ sudo -u hdfs hadoop fs -mkdir /hwl
14/05/11 19:31:52 INFO security.UserGroupInformation: JAAS Configuration already set up for Hadoop, not re-installing.
hwl@hadoop-master:~$ sudo -u hdfs hadoop fs -copyFromLocal hello.txt /hwl
14/05/11 19:32:03 INFO security.UserGroupInformation: JAAS Configuration already set up for Hadoop, not re-installing.
hwl@hadoop-master:~$ sudo -u hdfs hadoop fs -ls /hwl
14/05/11 19:32:17 INFO security.UserGroupInformation: JAAS Configuration already set up for Hadoop, not re-installing.
Found 1 items
-rw-r--r--   2 hdfs supergroup        2014-05-11 19:32 /hwl/hello.txt
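To confirm the round trip, the file can be read back (a quick check):
hwl@hadoop-master:~$ sudo -u hdfs hadoop fs -cat /hwl/hello.txt
hello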
12. View cluster status
12.1 Web view
http://192.168.242.128:50070/
http://192.168.242.128:50030/
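If no browser is handy, the same pages can be probed from the shell (a quick check; 50070 is the NameNode web UI and 50030 the JobTracker web UI). A 200 response indicates the UI is up:
hwl@hadoop-master:~$ curl -s -o /dev/null -w "%{http_code}\n" http://192.168.242.128:50070/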
12.2 Command-line view
hwl@hadoop-master:~$ sudo -u hdfs hadoop dfsadmin -report
14/05/11 19:45:11 INFO security.UserGroupInformation: JAAS Configuration already set up for Hadoop, not re-installing.
Configured Capacity: 252069396480 (234.76 GB)
Present Capacity: 234272096256 (218.18 GB)
DFS Remaining: 234271989760 (218.18 GB)
DFS Used: 106496 (104 KB)
DFS Used%: 0%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 3 (3 total, 0 dead)

Name: 192.168.242.128:50010
Decommission Status: Normal
Configured Capacity: 84023132160 (78.25 GB)
DFS Used: 40960 (40 KB)
Non DFS Used: 5935935488 (5.53 GB)
DFS Remaining: 78087155712 (72.72 GB)
DFS Used%: 0%
DFS Remaining%: 92.94%
Last contact: Sun May 11 19:45:11 PDT 2014

Name: 192.168.242.129:50010
Decommission Status: Normal
Configured Capacity: 84023132160 (78.25 GB)
DFS Used: 28672 (28 KB)
Non DFS Used: 5931614208 (5.52 GB)
DFS Remaining: 78091489280 (72.73 GB)
DFS Used%: 0%
DFS Remaining%: 92.94%
Last contact: Sun May 11 19:45:08 PDT 2014

Name: 192.168.242.130:50010
Decommission Status: Normal
Configured Capacity: 84023132160 (78.25 GB)
DFS Used: 36864 (36 KB)
Non DFS Used: 5929750528 (5.52 GB)
DFS Remaining: 78093344768 (72.73 GB)
DFS Used%: 0%
DFS Remaining%: 92.94%
Last contact: Sun May 11 19:45:08 PDT 2014