Recently, Hadoop and Hive were successfully configured on five Linux servers. A Hadoop cluster requires one machine as the master node; the remaining machines are slave nodes (the master node can also double as a slave node). Hive only needs to be configured and used on the master node.
1 Hadoop Configuration
Hadoop configuration is relatively simple, because Hadoop does not need to be installed in the traditional sense; unpacking the release is enough. The download and configuration steps are described in detail below, taking Hadoop 0.19.2 as an example.
(1) Download the hadoop-0.19.2.tar.gz file from the Hadoop official site and decompress it, placing the generated hadoop-0.19.2 directory under the /search/hadoop directory (if you build it in another directory, remember to adjust the configuration items below accordingly).
Create a soft link with the command $ ln -s hadoop-0.19.2 hadoop (the advantage of doing so is that if you later switch to another version of Hadoop, you only need to update the link instead of redoing the configuration); the full sequence is sketched below.
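For reference, these steps amount to a short shell sequence. This is only a sketch: the mirror URL is an assumption (any Apache Hadoop 0.19.2 download will do), and the target directory follows the /search/hadoop convention used in this article.
# create the working directory and unpack Hadoop 0.19.2 into it (URL is illustrative)
$ mkdir -p /search/hadoop
$ cd /search/hadoop
$ wget http://archive.apache.org/dist/hadoop/core/hadoop-0.19.2/hadoop-0.19.2.tar.gz
$ tar -xzf hadoop-0.19.2.tar.gz
# version-independent soft link, so later upgrades only require repointing the link
$ ln -s hadoop-0.19.2 hadoop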
(2) Both Hadoop and Hive rely on machine names. Run the hostname command to set the machine name of the local machine. For example, to set the machine name of 10.10.60.139 to hadoop1, enter $ hostname hadoop1.
Modify the /etc/hosts file and add the machine names of all machines in the Hadoop cluster. The master node and all slave nodes must be added; otherwise, problems may occur. For example, add
10.10.60.139 hadoop1
10.10.60.140 hadoop2
10.10.63.32 hadoop3
10.10.65.48 hadoop4
10.10.67.36 hadoop5
(3) Because the master machine needs to SSH to all slave nodes, the following configuration is required on all machines. In this article, hadoop1 is the master node.
Open the /etc/ssh/sshd_config file and make sure SSH is not restricted to the SSH2 protocol; otherwise, change every Protocol 2 line to Protocol 1. If the file was modified, run the service sshd restart command to restart the SSH service, as sketched below.
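A minimal sketch of checking and applying that change (grep is simply one convenient way to inspect the setting; your sshd_config may not contain an explicit Protocol line at all):
# show the current Protocol setting; after editing it should read Protocol 1
$ grep -i '^protocol' /etc/ssh/sshd_config
# restart sshd so the change takes effect
$ service sshd restart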
Type the following commands:
$ cd ~/.ssh/
$ ssh-keygen -t rsa1 -C "hadoop_1" -f /root/.ssh/identity
$ cat identity.pub >> authorized_keys
This appends the local public key to the authorized_keys file. At this point, the ssh localhost command should log on to the local machine without asking for a password (if ssh to the machine's own IP succeeds but ssh localhost fails, you can open the /etc/hosts.allow file and add 127.0.0.1).
To ensure that the master node can SSH to the slave nodes without a password, every slave machine needs to rsync the master machine's /root/.ssh/identity.pub file and cat it into its local /root/.ssh/authorized_keys file, as sketched below. After that, the master node can log on to any slave machine with ssh plus the slave's IP without entering a password.
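A minimal sketch of the slave-side steps (run on each slave; the master host name hadoop1 and the /root/.ssh paths follow the conventions used above, and the temporary file name is arbitrary):
# pull the master's public key onto the slave
$ rsync hadoop1:/root/.ssh/identity.pub /root/.ssh/identity.pub.master
# append it to the slave's authorized keys
$ cat /root/.ssh/identity.pub.master >> /root/.ssh/authorized_keys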
(4) Modify the environment variables
Open the /etc/profile file and add
export HADOOP_HOME=/search/hadoop
export PATH=$HADOOP_HOME/bin:$PATH
Save the file and run the $ source /etc/profile command so that the environment variables take effect.
(5) Modify the following two settings in the $HADOOP_HOME/conf/hadoop-env.sh configuration file:
export JAVA_HOME=<local JDK or JRE path>  (if there is no JDK, you can yum install java-1.6.0-openjdk-devel; note that it must be Java 1.6 or later, otherwise Hadoop cannot run properly)
# Hadoop heap size in MB; 2000 gives Hadoop about 2 GB of memory and can be changed based on the machine configuration
export HADOOP_HEAPSIZE=2000
(6) Modify the $HADOOP_HOME/conf/hadoop-site.xml file as follows. Note that the fs.default.name and mapred.job.tracker configuration items must use the machine name of the master node rather than its IP address; otherwise, an error occurs when running Hive.
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hadoop1:9000</value>
    <description>The name of the default file system. Either the literal string "local" or a host:port for DFS.</description>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>hadoop1:9001</value>
    <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/search/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/search/hadoop/filesystem/name</value>
    <description>Determines where on the local filesystem the DFS name node should store the name table. If this is a comma-delimited list of directories then the name table is replicated in all of them.</description>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/search/hadoop/filesystem/data</value>
    <description>Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories.</description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified at create time.</description>
  </property>
</configuration>
(7) Modify $HADOOP_HOME/conf/masters on all machines to specify the IP address of the master node;
modify $HADOOP_HOME/conf/slaves on all machines to specify the IP addresses of the slave nodes in the cluster, one per line.
Slaves file example:
10.10.60.139
10.10.60.140
10.10.63.32
10.10.65.48
10.10.67.36
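For reference, the corresponding masters file contains a single line; assuming the master is 10.10.60.139 (hadoop1), as in the hosts example above, it would read:
10.10.60.139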
(8) Run the $ hadoop namenode -format command on all machines to format the namenode.
(9) Run $HADOOP_HOME/bin/start-all.sh on the master node to start Hadoop. Note that you do not need to enter this command on the slave machines.
Enter the jps command in the shell to view the started Hadoop processes. For example:
11304 DataNode
15763 Jps
11190 NameNode
11516 JobTracker
11636 TaskTracker
11437 SecondaryNameNode
Note that the master node must show the NameNode, SecondaryNameNode, and JobTracker processes, and the slave machines must show the DataNode and TaskTracker processes, before the startup can be considered successful.
To stop Hadoop, run $HADOOP_HOME/bin/stop-all.sh.
Hadoop web interfaces
http://<master machine IP address>:50070/dfshealth.jsp
http://<master machine IP address>:50030/jobtracker.jsp
Hadoop common commands
hadoop dfs -ls lists the contents of the /user/root directory; by default, if no path is specified, the current user's home path is listed;
hadoop dfs -rmr xxx deletes a directory; most of the commands are easy to pick up;
hadoop dfsadmin -report shows the status of all datanodes in the cluster;
hadoop job followed by additional arguments operates on currently running jobs, for example -list and -kill;
hadoop balancer is the command, mentioned earlier, for balancing disk load across datanodes; a short example session is sketched below.
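The following sketch shows how these commands might look in practice; the directory path and job ID are placeholders rather than output from this cluster:
# list the current user's home directory in HDFS
$ hadoop dfs -ls
# recursively remove a directory (placeholder path)
$ hadoop dfs -rmr /tmp/old_data
# report capacity and status for every datanode
$ hadoop dfsadmin -report
# list running jobs, then kill one by its ID (placeholder ID)
$ hadoop job -list
$ hadoop job -kill job_201001011200_0001
# rebalance blocks across datanodes
$ hadoop balancer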
2 Hive Configuration
Hive is a set of software that provides data warehouse SQL functionality on top of the Hadoop distributed computing platform, which simplifies ad-hoc queries over the massive data stored in Hadoop. Hive provides QL, a query language based on SQL that is easy to use. A copy of Hive already ships inside the Hadoop directory, but its version is relatively old and is not recommended. Configure Hive on the master node as follows:
(1) Download hive-0.5.0-bin.tar.gz from the official website and decompress it to produce the hive-0.5.0-bin folder, then create a soft link named hive to it in the $HADOOP_HOME/ directory, as sketched below.
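A minimal sketch of those steps, assuming the archive was downloaded into $HADOOP_HOME (the download location is an assumption; only the resulting hive link matters for the rest of this article):
$ cd $HADOOP_HOME
$ tar -xzf hive-0.5.0-bin.tar.gz
$ ln -s hive-0.5.0-bin hive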
(2) Add environment variables in /etc/profile:
export HIVE_HOME=$HADOOP_HOME/hive
export PATH=$HIVE_HOME/bin:$PATH
(3) Run the following commands:
$HADOOP_HOME/bin/hadoop fs -mkdir /tmp
$HADOOP_HOME/bin/hadoop fs -mkdir /user/hive/warehouse
$HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp
$HADOOP_HOME/bin/hadoop fs -chmod g+w /user/hive/warehouse
(4) Now Hive can already be used through the CLI. However, this setup only supports single-user testing. In practical applications, the Hive metadata (schema) is usually stored in MySQL so that multiple users can be supported.
To do this, modify the following items in the $HIVE_HOME/conf/hive-default.xml configuration file:
Configuration item                       Value
javax.jdo.option.ConnectionURL           jdbc:mysql://<mysql host>/<metastore database>?createDatabaseIfNotExist=true
javax.jdo.option.ConnectionDriverName    com.mysql.jdbc.Driver
javax.jdo.option.ConnectionUserName      <MySQL user name>
javax.jdo.option.ConnectionPassword      <MySQL password>
(Reference article http://www.mazsoft.com/blog/post/2010/02/01/Setting-up-HadoopHive-to-use-MySQL-as-metastore.aspx)
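As a concrete sketch, these settings take the form of property blocks in the XML configuration file; MYSQL_HOST, METASTORE_DB, MYSQL_USER, and MYSQL_PASSWORD below are placeholders to be replaced with your own MySQL details:
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <!-- MYSQL_HOST and METASTORE_DB are placeholders -->
  <value>jdbc:mysql://MYSQL_HOST/METASTORE_DB?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>MYSQL_USER</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>MYSQL_PASSWORD</value>
</property>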
(5) Download the mysql-connector-java-5.1.11-bin.jar file from the Internet and put it in the $HIVE_HOME/lib directory. Hive is now fully configured.
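Once everything is in place, a quick smoke test from the Hive CLI might look like the following; the table name and column are purely illustrative:
$ hive
hive> CREATE TABLE test_tbl (id INT);
hive> SHOW TABLES;
hive> DROP TABLE test_tbl;
hive> quit;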