Hadoop + Hive + MySQL Installation Documentation


Software versions

Software                    Version
Red Hat Enterprise Server   5.5 (64-bit)
Hadoop                      1.0.0
Hive                        0.8.1
MySQL                       5
JDK                         1.6

Overall architecture

There are 7 machines in total: 4 of them are data nodes, and the name node, JobTracker, and secondary name node each run on a separate machine. The machines are divided as follows:

Machine IP        Host Name       Use                    Note
123.456.789.30    Master.hadoop   Name node              Master node
123.456.789.31    Slave1.hadoop   Data node 1
123.456.789.32    Slave2.hadoop   Data node 2
123.456.789.33    Slave3.hadoop   Data node 3
123.456.789.34    Slave4.hadoop   Data node 4
123.456.789.35    Job.hadoop      JobTracker
123.456.789.36    Sec.hadoop      Secondary name node

Required installation packages

hadoop-1.0.0-bin.tar.gz

mysql-5.1.52.tar.gz

jdk-6u31-linux-x64.bin

hive-0.8.1.tar.gz

Preparation (as the root user)

Hosts file configuration

Configure the /etc/hosts file on all machines (this must be done on every machine).

vi /etc/hosts

Add the following lines:

123.456.789.30 Master.hadoop

123.456.789.31 Slave1.hadoop

123.456.789.32 Slave2.hadoop

123.456.789.33 Slave3.hadoop

123.456.789.34 Slave4.hadoop

123.456.789.35 Job.hadoop

123.456.789.36 Sec.hadoop

Modify the hostname of each host

When Linux is installed, the default hostname is localhost. Modify the /etc/sysconfig/network configuration file:

vi /etc/sysconfig/network

For example: on the 123.456.789.30 machine, change the hostname to Master.hadoop;

on the 123.456.789.31 machine, change it to Slave1.hadoop, and so on.

Special note: this must be done on every machine.
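For example, the edited file on the master node would look roughly like this (the HOSTNAME line is the one to change; any other lines stay as installed):

NETWORKING=yes
HOSTNAME=Master.hadoop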

Create the hduser user and hadoop group

groupadd hadoop

useradd -g hadoop hduser

passwd hduser

Uploading files

Create a tools folder under /home/hduser and use the hduser user to FTP all the installation packages into this folder.
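For example (a minimal sketch of this step, run as hduser):

mkdir -p /home/hduser/tools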

Passwordless SSH authentication

To avoid password prompts during Hadoop operations, passwordless SSH authentication is required from the master node and the JobTracker node to each machine.

On the master node, execute the following as the hduser user:

ssh-keygen -t rsa -P ""

cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

ssh localhost

ssh Master.hadoop

Copy the key to the other hosts:

ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@Slave1.hadoop
ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@Slave2.hadoop
ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@Slave3.hadoop
ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@Slave4.hadoop
ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@Job.hadoop
ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@Sec.hadoop

Special note: when executing ssh-copy-id, you need to type yes and then enter the target host's password.
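A quick check (Slave1.hadoop is just one example; any of the hosts above works), which should now log in without a password prompt:

ssh hduser@Slave1.hadoop
exit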

Install Hadoop (as the hduser user)

Install the JDK on the host and unpack the other software

a. Switch to /home/hduser/tools and unpack the JDK:

./jdk-6u31-linux-x64.bin

Special note: if a permission denied error is reported, grant execute permission first:

chmod 777 *

b. Unpack Hadoop and rename it:

tar -xvf hadoop-1.0.0-bin.tar.gz

mv hadoop-1.0.0 hadoop

c. Unpack MySQL:

tar -xvf mysql-5.1.52.tar.gz

d. Unpack Hive and rename it:

tar -xvf hive-0.8.1.tar.gz

mv hive-0.8.1 hive

Modify the configuration files

vi /etc/profile and add the environment variables:

export JAVA_HOME=/home/hduser/jdk1.6.0_31
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export HADOOP_HOME=/home/hduser/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin
export HIVE_HOME=/home/hduser/hive
export PATH=$HIVE_HOME/bin:$PATH

Special note: add these on all machines.

Execute source /etc/profile to make the environment variables take effect immediately.
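A quick sanity check that the variables took effect (the reported versions should match the table at the top of this document):

java -version
echo $HADOOP_HOME
echo $HIVE_HOME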

Modify hadoop-env.sh

vi $HADOOP_HOME/etc/hadoop/hadoop-env.sh

Make the following modifications:

# The java implementation to use. Required.
export JAVA_HOME=/home/hduser/jdk1.6.0_31

export HADOOP_PID_DIR=/home/hduser/pids

Create the required folder

Create a tmp folder under /home/hduser/hadoop:

mkdir /home/hduser/hadoop/tmp

Upload the configuration files

Upload all the files in the hadoop_conf folder to the /home/hduser/hadoop/etc/hadoop folder, overwriting the originals.

Modify hdfs-site.xml

Add:

<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>

Modify core-site.xml

Set the hadoop.tmp.dir property to /home/hduser/hadoop/tmp.

Set the dfs.http.address property to sec.hadoop:50070 (points to the secondary name node).

Set the dfs.hosts.exclude property to /home/hduser/hadoop/etc/hadoop/excludes (used for decommissioning nodes).
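A sketch of the resulting property entries, using the same <property> format as hdfs-site.xml above:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hduser/hadoop/tmp</value>
</property>
<property>
  <name>dfs.http.address</name>
  <value>sec.hadoop:50070</value>
</property>
<property>
  <name>dfs.hosts.exclude</name>
  <value>/home/hduser/hadoop/etc/hadoop/excludes</value>
</property>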

Modify mapred-site.xml

Set the mapred.job.tracker property to job.hadoop:54311 (points to the JobTracker).
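A sketch of the resulting entry, in the same format:

<property>
  <name>mapred.job.tracker</name>
  <value>job.hadoop:54311</value>
</property>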

Modify the masters file

Set the content to: Sec.hadoop

Modify the slaves file

Set the content to:

Slave1.hadoop

Slave2.hadoop

Slave3.hadoop

Slave4.hadoop

Transfer the JDK and Hadoop to the other hosts

scp -r jdk1.6.0_31 hduser@Slave1.hadoop:/home/hduser
scp -r jdk1.6.0_31 hduser@Slave2.hadoop:/home/hduser
scp -r jdk1.6.0_31 hduser@Slave3.hadoop:/home/hduser
scp -r jdk1.6.0_31 hduser@Slave4.hadoop:/home/hduser
scp -r jdk1.6.0_31 hduser@Job.hadoop:/home/hduser
scp -r jdk1.6.0_31 hduser@Sec.hadoop:/home/hduser

scp -r hadoop hduser@Slave1.hadoop:/home/hduser
scp -r hadoop hduser@Slave2.hadoop:/home/hduser
scp -r hadoop hduser@Slave3.hadoop:/home/hduser
scp -r hadoop hduser@Slave4.hadoop:/home/hduser
scp -r hadoop hduser@Job.hadoop:/home/hduser
scp -r hadoop hduser@Sec.hadoop:/home/hduser

Start Hadoop

Because we are currently testing, start-all is not used directly; the startup order is as follows (steps a and b are executed on the master node):
a. Log in to the master node as hduser, enter the hadoop/sbin directory, and start the name node:
./hadoop-daemon.sh start namenode
b. Start the data nodes (this starts the datanode on each data node):
./start-dfs.sh
c. Log in to Job.hadoop and start the JobTracker (the TaskTracker on each node is also started):
./start-mapred.sh
Shutdown order (performed under the master node's sbin directory):
Stop the datanodes and namenode:
./stop-dfs.sh
Log in to the job host and stop the TaskTrackers and JobTracker:
./stop-mapred.sh

Insufficient permission errors may be reported at startup; if so, grant execute permission to all the files under hadoop/sbin on each host.
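For example (a hedged sketch, run as hduser on any host that reports the error):

chmod u+x /home/hduser/hadoop/sbin/*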

Verifying the startup effect

To check whether the processes running on each host are working properly, use jps. With the current plan, the results should be as follows:

Node name           Running processes
Master node         NameNode
Data nodes          DataNode, TaskTracker
Job node            JobTracker
Second name node    SecondaryNameNode

Special note: you can also check the log files under hadoop/logs/ to review the startup.
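For example, running jps on the master node should show something like this (the process IDs are illustrative):

jps
4123 NameNode
4388 Jps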

Install MySQL (as root)

1. Switch to the MySQL extraction directory:

cd /home/hduser/tools/mysql

2. ./configure --prefix=/usr/local/mysql --sysconfdir=/etc --localstatedir=/data/mysql

Note: change localstatedir to the location where you want to place the database files.

3. make

4. make install

5. make clean

6. groupadd mysql

7. useradd -g mysql mysql (the first mysql is the group, the second is the user)

8. cd /usr/local/mysql

9. cp /usr/local/mysql/share/mysql/my-medium.cnf /etc/my.cnf

10. bin/mysql_install_db --user=mysql    # initialize the base databases; the mysql user must be specified, and only after this step does the var directory appear under /usr/local/mysql

11. Start the server and set the root password:

bin/mysqld_safe --user=mysql &

bin/mysqladmin -u root password oracle

12. Starting and stopping MySQL (can be skipped)

Start MySQL:

bin/mysqld_safe &    or    /usr/local/mysql/share/mysql/mysql.server start

Stop MySQL, method 1:

/usr/local/mysql/share/mysql/mysql.server stop

Stop MySQL, method 2:

ps aux | grep mysql    # view the process

kill <pid>    # kill the MySQL process; the PID comes from the ps output above

13. Register MySQL as a service

cp /usr/local/mysql/share/mysql/mysql.server /etc/init.d/mysqld

Add MySQL to the system services as the root user:

/sbin/chkconfig --add mysqld    # add MySQL as a service

/sbin/chkconfig --del mysqld    # remove the MySQL service

/sbin/service mysqld restart    # restart the service to check that it takes effect

/sbin/chkconfig --list mysqld    # check that MySQL is enabled for run levels 3, 4, and 5

14. Create the appropriate MySQL account for Hive and grant it sufficient permissions

As root, go to /usr/local/mysql/bin and execute: ./mysql -u root -p

Create the hive database: create database hive;

Create the hive user, which can only connect from localhost, and grant it access to the hive database: grant all on hive.* to hive@localhost identified by 'oracle';
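A quick check of the new account (a hedged sketch; enter the password 'oracle' set above when prompted):

./mysql -u hive -p hive

mysql> show tables;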

Install Hive (as hduser)

1. Modify HADOOP_HOME in conf/hive-env.sh.template under the hive directory to the actual Hadoop installation directory: /home/hduser/hadoop
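A minimal sketch of this step (assuming, as is usual for Hive, that the template is first copied to hive-env.sh):

cd /home/hduser/hive/conf
cp hive-env.sh.template hive-env.sh
vi hive-env.sh    # set: export HADOOP_HOME=/home/hduser/hadoop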

2. Create the tmp and warehouse folders under the hive directory:

mkdir /home/hduser/hive/tmp

mkdir /home/hduser/hive/warehouse

3. Create the corresponding tmp and warehouse directories in HDFS and set permissions:

hadoop fs -mkdir /home/hduser/hive/tmp

hadoop fs -mkdir /home/hduser/hive/warehouse

hadoop fs -chmod g+w /home/hduser/hive/tmp

hadoop fs -chmod g+w /home/hduser/hive/warehouse

4. Copy the configuration files from hive_conf to /home/hduser/hive/conf.

5. Modify the hive-site.xml file

Set the hive.metastore.warehouse.dir property to /home/hduser/hive/warehouse.

Set the hive.exec.scratchdir property to /home/hduser/hive/tmp.
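A sketch of the resulting entries, in the same <property> format as the Hadoop configuration above:

<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/home/hduser/hive/warehouse</value>
</property>
<property>
  <name>hive.exec.scratchdir</name>
  <value>/home/hduser/hive/tmp</value>
</property>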

6. Copy the MySQL JDBC driver package mysql-connector-java-5.0.7-bin.jar to the lib directory of hive.

7. Start hive and execute show tables;

8. If step 7 completes without exceptions, the installation is mostly successful. Build a few tables and load some data to test:

hive>

CREATE TABLE test_src
(
  account1 string,
  url string
)
row format delimited fields terminated by '|';

Create a file a.txt with the following contents:

123412342|http://www.sohu.com
454534534|http://qww.cocm.ccc

hive>

load data local inpath '/data/myfile/a.txt' into table test_src;
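A quick query to confirm that the data was loaded (a hedged example):

hive> select * from test_src;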

This is the most basic installation, with no parameters tuned; production use requires configuring the relevant parameters.

Sqoop installation (as hduser)

1. Unpack sqoop1.4.tar.gz

2. Rename it to sqoop

3. Modify the sqoop file bin/configure-sqoop and comment out everything related to HBase and ZooKeeper.

4. Copy ojdbc6.jar and hadoop-core-1.0.0.jar to sqoop/lib.

5. Add the environment variables:

export SQOOP_HOME=xxxx

export PATH=$SQOOP_HOME/bin:$PATH
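A hedged usage sketch once the setup is done (the connection string, credentials, and table name are hypothetical; this imports an Oracle table into HDFS using the ojdbc6.jar driver copied above):

sqoop import --connect jdbc:oracle:thin:@//dbhost:1521/orcl \
  --username scott --password tiger \
  --table EMP --target-dir /user/hduser/emp -m 1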
