Hadoop+hive+mysql Installation Documentation
Software versions
| Software | Version |
| Redhat Enterprise Server | 5.5 (64-bit) |
| Hadoop | 1.0.0 |
| Hive | 0.8.1 |
| MySQL | 5.1.52 |
| JDK | 1.6 |
Overall architecture
There are 7 machines in total: 4 data nodes, with the name node, the JobTracker, and the secondary name node each on a separate machine. The machines are divided as follows:
| Machine IP | Host name | Use | Note |
| 123.456.789.30 | Master.hadoop | Name node | Master node |
| 123.456.789.31 | Slave1.hadoop | Data node 1 | |
| 123.456.789.32 | Slave2.hadoop | Data node 2 | |
| 123.456.789.33 | Slave3.hadoop | Data node 3 | |
| 123.456.789.34 | Slave4.hadoop | Data node 4 | |
| 123.456.789.35 | Job.hadoop | JobTracker | |
| 123.456.789.36 | Sec.hadoop | Secondary name node | |
Required installation packages
hadoop-1.0.0-bin.tar.gz
mysql-5.1.52.tar.gz
jdk-6u31-linux-x64.bin
hive-0.8.1.tar.gz
Preparation (as the root user)
Hosts file configuration
Configure the /etc/hosts file on all machines (this step must be done on every machine):
vi /etc/hosts
Add the following lines:
123.456.789.30 Master.hadoop
123.456.789.31 Slave1.hadoop
123.456.789.32 Slave2.hadoop
123.456.789.33 Slave3.hadoop
123.456.789.34 Slave4.hadoop
123.456.789.35 Job.hadoop
123.456.789.36 Sec.hadoop
Modify the hostname of each host
After a Linux installation the default hostname is localhost. Modify the /etc/sysconfig/network configuration file:
vi /etc/sysconfig/network
For example, on the 123.456.789.30 machine set the hostname to Master.hadoop;
on the 123.456.789.31 machine set it to Slave1.hadoop, and so on.
Special note: this must be done on every machine.
Create the hduser user and the hadoop group
groupadd hadoop
useradd -g hadoop hduser
passwd hduser
Uploading files
Create a tools folder under /home/hduser and, as the hduser user, FTP all installation packages into this folder.
Passwordless SSH authentication
To avoid password prompts during Hadoop operations, passwordless SSH is required from the master node and the JobTracker node to every machine.
On the master node, execute as the hduser user:
ssh-keygen -t rsa -P ""
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
ssh localhost
ssh Master.hadoop
Copy the key to the other hosts:
ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@Slave1.hadoop
ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@Slave2.hadoop
ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@Slave3.hadoop
ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@Slave4.hadoop
ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@Job.hadoop
ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@Sec.hadoop
Special note: when executing ssh-copy-id, you need to enter yes and the target host's password.
Install Hadoop (as the hduser user)
Install the JDK on the host and unzip the other software
a. Switch to /home/hduser/tools and unpack the JDK:
./jdk-6u31-linux-x64.bin
Special note: if a "permission denied" error is reported, grant execute permission first:
chmod 777 *
b. Unzip Hadoop and rename it:
tar -xvf hadoop-1.0.0-bin.tar.gz
mv hadoop-1.0.0 hadoop
c. Unzip MySQL:
tar -xvf mysql-5.1.52.tar.gz
d. Unzip Hive and rename it:
tar -xvf hive-0.8.1.tar.gz
mv hive-0.8.1 hive
Modifying the configuration files
#vi /etc/profile   (add the environment variables)
export JAVA_HOME=/home/hduser/jdk1.6.0_31/
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export HADOOP_HOME=/home/hduser/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin
export HIVE_HOME=/home/hduser/hive
export PATH=$HIVE_HOME/bin:$PATH
Special note: add these on all machines.
Execute #source /etc/profile to make the environment variables take effect immediately.
Modify hadoop-env.sh
vi $HADOOP_HOME/etc/hadoop/hadoop-env.sh
Make the following modifications:
# The java implementation to use. Required.
export JAVA_HOME=/home/hduser/jdk1.6.0_31
export HADOOP_PID_DIR=/home/hduser/pids
Create the required folder
Create a tmp folder under /home/hduser/hadoop:
mkdir /home/hduser/hadoop/tmp
Uploading the configuration files
Upload all the files in the hadoop_conf folder to the /home/hduser/hadoop/etc/hadoop folder, overwriting the originals.
Modify hdfs-site.xml
Add:
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
Modify core-site.xml
Set the hadoop.tmp.dir property to: /home/hduser/hadoop/tmp
Set the dfs.http.address property to: sec.hadoop:50070 (points to the secondary name node)
Set the dfs.hosts.exclude property to: /home/hduser/hadoop/etc/hadoop/excludes (used for decommissioning nodes), as sketched below
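As a reference, a minimal sketch of these entries in Hadoop's XML property format (names and values copied from the steps above; any other properties already present in the file are left as they are):
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hduser/hadoop/tmp</value>
</property>
<property>
<name>dfs.http.address</name>
<value>sec.hadoop:50070</value>
</property>
<property>
<name>dfs.hosts.exclude</name>
<value>/home/hduser/hadoop/etc/hadoop/excludes</value>
</property>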
Modify mapred-site.xml
Set the mapred.job.tracker property to: job.hadoop:54311 (points to the JobTracker), as sketched below
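A minimal sketch of the corresponding entry (value copied from the step above):
<property>
<name>mapred.job.tracker</name>
<value>job.hadoop:54311</value>
</property>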
Modify the masters file
Change its content to: Sec.hadoop
Modify the slaves file
Change its content to:
Slave1.hadoop
Slave2.hadoop
Slave3.hadoop
Slave4.hadoop
Transferring the JDK and Hadoop to the other hosts
scp -r jdk1.6.0_31 hduser@Slave1.hadoop:/home/hduser
scp -r jdk1.6.0_31 hduser@Slave2.hadoop:/home/hduser
scp -r jdk1.6.0_31 hduser@Slave3.hadoop:/home/hduser
scp -r jdk1.6.0_31 hduser@Slave4.hadoop:/home/hduser
scp -r jdk1.6.0_31 hduser@Job.hadoop:/home/hduser
scp -r jdk1.6.0_31 hduser@Sec.hadoop:/home/hduser
scp -r hadoop hduser@Slave1.hadoop:/home/hduser
scp -r hadoop hduser@Slave2.hadoop:/home/hduser
scp -r hadoop hduser@Slave3.hadoop:/home/hduser
scp -r hadoop hduser@Slave4.hadoop:/home/hduser
scp -r hadoop hduser@Job.hadoop:/home/hduser
scp -r hadoop hduser@Sec.hadoop:/home/hduser
Start Hadoop
Because this is a test environment, start-all is not used directly; the components are started in the following order:
a. Log in to the master node as hduser, enter the hadoop/sbin directory, and start the name node:
./hadoop-daemon.sh start namenode
b. Start the data nodes (this starts the DataNode on each data node):
./start-dfs.sh
c. Log on to Job.hadoop and start the JobTracker (this also starts the TaskTracker on each node):
./start-mapred.sh
Shutdown order (executed under the sbin directory)
Stop the DataNodes and the NameNode (on the master node):
./stop-dfs.sh
Log in to the job host and stop the TaskTrackers and the JobTracker:
./stop-mapred.sh
Insufficient-permission errors may be reported at startup; in that case grant execute permission to all the files under hadoop/sbin on each host.
Verifying the startup
Use jps to check whether the expected processes are running on each host. With the current plan, the results should be as follows:
| Node name | Running processes | Note |
| Master node | NameNode | |
| Data nodes | DataNode, TaskTracker | |
| Job node | JobTracker | |
| Secondary name node | SecondaryNameNode | |
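For illustration, jps on the master node might print something like the following (the process IDs here are arbitrary examples):
12345 NameNode
23456 Jps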
Special note: you can also check the log files under hadoop/logs/ to verify the startup.
Install MySQL (as root)
1. Switch to the MySQL extraction directory:
cd /home/hduser/tools/mysql
2. ./configure --prefix=/usr/local/mysql --sysconfdir=/etc --localstatedir=/data/mysql
Note: change localstatedir to the location where you want to place the database files.
3. make
4. make install
5. make clean
6. groupadd mysql
7. useradd -g mysql mysql (the first mysql is the group, the second is the user)
8. cd /usr/local/mysql
9. cp /usr/local/mysql/share/mysql/my-medium.cnf /etc/my.cnf
10. #bin/mysql_install_db --user=mysql   # initialize the base databases; this must be run as the mysql user, and only after this step does the var directory appear under /usr/local/mysql
11. # bin/mysqld_safe --user=mysql &
bin/mysqladmin -u root password Oracle
12. Starting and stopping MySQL (can be skipped)
Start MySQL:
#bin/mysqld_safe &   or   /usr/local/mysql/share/mysql/mysql.server start
Stop MySQL, method 1:
#/usr/local/mysql/share/mysql/mysql.server stop
Stop MySQL, method 2:
#ps -aux | grep mysql   (view the process)
#kill <ID>   (this kills the MySQL process; the ID is the one shown by the previous command)
13. Registering MySQL as a service
cp /usr/local/mysql/share/mysql/mysql.server /etc/init.d/mysqld
Add MySQL to the system services as the root user:
#/sbin/chkconfig --add mysqld   (add MySQL as a service)
#/sbin/chkconfig --del mysqld   (delete the MySQL service)
/sbin/service mysqld restart   # restart the service to check that it takes effect
/sbin/chkconfig --list mysqld   # check that MySQL is enabled for runlevels 3, 4, and 5
14. Create the MySQL account for Hive and grant it sufficient privileges
As root, under /usr/local/mysql/bin execute: ./mysql -u root -p
Create the hive database: create database hive;
Create the hive user, which can only connect to the database from localhost and has full privileges on the hive database: grant all on hive.* to 'hive'@'localhost' identified by 'Oracle';
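A minimal sketch of this step as typed in the mysql client (names and password taken from the lines above; the flush privileges statement is an extra precaution not in the original steps):
mysql> create database hive;
mysql> grant all on hive.* to 'hive'@'localhost' identified by 'Oracle';
mysql> flush privileges;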
Installing Hive (as hduser)
1. In the hive directory, modify HADOOP_HOME in conf/hive-env.sh.template to point to the actual Hadoop installation directory /home/hduser/hadoop, as sketched below.
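A minimal sketch of the line in conf/hive-env.sh.template (path taken from the step above):
HADOOP_HOME=/home/hduser/hadoop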
2. Create the tmp and warehouse folders under the hive directory:
mkdir /home/hduser/hive/tmp
mkdir /home/hduser/hive/warehouse
3. Create the tmp and warehouse directories in HDFS and set group write permission:
hadoop fs -mkdir /home/hduser/hive/tmp
hadoop fs -mkdir /home/hduser/hive/warehouse
hadoop fs -chmod g+w /home/hduser/hive/tmp
hadoop fs -chmod g+w /home/hduser/hive/warehouse
4. Copy the configuration files from hive_conf to /home/hduser/hive/conf.
5. Modify the hive-site.xml file:
Set the hive.metastore.warehouse.dir property to: /home/hduser/hive/warehouse
Set the hive.exec.scratchdir property to: /home/hduser/hive/tmp (see the sketch below)
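A minimal sketch of these two hive-site.xml entries (values copied from step 5; the metastore JDBC connection properties that point at the MySQL database set up earlier live in the same file but are not shown here):
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/home/hduser/hive/warehouse</value>
</property>
<property>
<name>hive.exec.scratchdir</name>
<value>/home/hduser/hive/tmp</value>
</property>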
6. Copy the MySQL JDBC driver package mysql-connector-java-5.0.7-bin.jar to the lib directory of hive.
7. Start hive and execute show tables;
8. If step 7 completes without exceptions, the installation is largely successful; create a few tables and load some data to test, for example:
hive>
CREATE TABLE test_src (
account1 string,
url string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|';
touch a.txt and put the following two lines into it:
123412342|http://www.sohu.com
454534534|http://qww.cocm.ccc
hive>
load data local inpath '/data/myfile/a.txt' into table test_src;
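To confirm the load worked, a simple query can be run (a small extra check, not in the original steps); it should return the two rows from a.txt:
hive> select * from test_src;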
This is currently the most basic installation, with no parameters tuned; for production the relevant parameters need to be configured.
Sqoop installation (as the hduser user)
1. Unzip sqoop1.4.tar.gz.
2. Rename the directory to sqoop.
3. In the sqoop file bin/configure-sqoop, comment out everything related to HBase and ZooKeeper.
4. Copy ojdbc6.jar and hadoop-core-1.0.0.jar to sqoop/lib.
5. Add the environment variables:
export SQOOP_HOME=xxxx
export PATH=$SQOOP_HOME/bin:$PATH
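Once the environment variables are loaded, the installation can be checked quickly (a suggested verification, not part of the original steps; connecting to MySQL also requires the MySQL connector jar in sqoop/lib, and the connection string, user, and password must match your own setup):
sqoop version
sqoop list-databases --connect jdbc:mysql://localhost:3306/ --username hive --password Oracle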