Basic installation and configuration of Sqoop on pseudo-distributed Hadoop

Source: Internet
Author: User
Tags mysql import sqoop hadoop fs

1. Environment and tool versions

CentOS 6.4 (Final)

jdk-7u60-linux-i586.gz

hadoop-1.1.2.tar.gz

sqoop-1.4.3.bin__hadoop-1.0.0.tar.gz

mysql-5.6.11.tar.gz

2. Install CentOS

Follow one of the online guides on using UltraISO to create a bootable USB flash drive, then boot from it and install the system onto a freshly formatted disk. There is plenty of material on the Internet. It is best not to change the host name during installation, and best not to add users through the graphical interface: I had to reinstall the system once because of such a problem. Everything that follows can be done from the terminal.

3. Install JDK

The freshly installed CentOS already ships with a Java runtime, but it is only a JRE. Because we need to compile and debug, uninstall it and install a full JDK. First list the installed packages:

rpm -qa | grep jdk

Then remove each package that is listed. Note that the machine must be connected to the Internet, so configure the IP address first:

yum -y remove <JDK package name>

Next install the JDK. The downloaded archive is already compiled, so it only needs to be extracted; no build step is required. I put the extracted directory under /usr/java, which gives the directory structure used in the environment variables below. Then open /etc/profile to configure the environment variables:

vim /etc/profile

export JAVA_HOME=/usr/java/jdk1.7.0_60

export JRE_HOME=/usr/java/jdk1.7.0_60

export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib

export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin

Press Esc, then save and exit. Reload the profile and check the version:

source /etc/profile

java -version

If the Java version is displayed, the JDK configuration is successful.
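The same variables can also be kept in a separate file instead of being typed into /etc/profile by hand. The following is a minimal sketch, assuming the JDK was extracted to /usr/java/jdk1.7.0_60. JAVA_PROFILE is a hypothetical name introduced here; on a real system it could point at a file under /etc/profile.d/ (which needs root), while the default below writes to a temporary location so the snippet can be tried safely.

```shell
# Hypothetical location for the fragment; override JAVA_PROFILE to change it.
JAVA_PROFILE="${JAVA_PROFILE:-${TMPDIR:-/tmp}/java_env.sh}"

# Write the exports literally (the quoted 'EOF' prevents expansion here).
cat > "$JAVA_PROFILE" <<'EOF'
export JAVA_HOME=/usr/java/jdk1.7.0_60
export JRE_HOME=/usr/java/jdk1.7.0_60
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
EOF

# Load the variables into the current shell.
. "$JAVA_PROFILE"

# Report the version if the JDK is really there, otherwise say what is missing.
if [ -x "$JAVA_HOME/bin/java" ]; then
    "$JAVA_HOME/bin/java" -version
else
    echo "JDK not found under $JAVA_HOME; check the extraction path" >&2
fi
```

Keeping the exports in one small file makes it easy to see what was changed and to undo it later.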

4. Install Hadoop

Before that, make sure SSH is installed on the machine:

rpm -qa | grep ssh

If the output lists SSH packages, SSH is present. Otherwise install it with yum:

yum -y install openssh-server openssh-clients

Next create a hadoop user (the name is arbitrary):

groupadd hadoop

useradd -g hadoop hadoop

The second command creates the user hadoop in the hadoop group. You can also give hadoop sudo rights in /etc/sudoers. Mind the permissions on that file: it is read-only, so if you make it writable in order to edit it, change the permissions back afterwards. Under the line

root ALL=(ALL) ALL

add

hadoop ALL=(ALL) ALL

I forgot to mention earlier that it is best to install CentOS in English, which avoids a lot of trouble (you will see why once you try). Finally switch to the hadoop user:

su - hadoop
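Before editing sudoers it can help to check whether the entry is already present. The sketch below is only an illustration: has_sudo_entry and create_hadoop_user are hypothetical helper names, and the pattern only recognizes the simple "user ALL=(ALL) ALL" form described above. Editing through visudo, rather than changing the file permissions by hand, has the advantage that the syntax is validated before the file is saved.

```shell
# Succeed if the sudoers-style file $1 already grants user $2 "ALL=(ALL) ALL".
has_sudo_entry() {
    grep -Eq "^[[:space:]]*$2[[:space:]]+ALL[[:space:]]*=[[:space:]]*\(ALL\)[[:space:]]+ALL" "$1"
}

# Create the group and the user; this must be run as root.
create_hadoop_user() {
    groupadd hadoop
    useradd -g hadoop hadoop
}

# Usage (as root):
#   create_hadoop_user
#   has_sudo_entry /etc/sudoers hadoop || echo "run visudo and add: hadoop ALL=(ALL) ALL"
```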

Then set up passwordless SSH login. Because this is a pseudo-distributed setup, everything is configured on the local machine:

sudo service sshd restart

ssh-keygen -t rsa -P ''

Press Enter again to accept the default file location.

The generated key files are saved to /home/hadoop/.ssh/ by default. Go there and authorize the public key:

cd /home/hadoop/.ssh

cat id_rsa.pub > authorized_keys

You can now log in to the local machine without a password.
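The key setup can be wrapped in a small script so that it is safe to run more than once. This is a sketch under assumptions: authorize_key and setup_passwordless_ssh are hypothetical helper names, and SSH_DIR defaults to ~/.ssh of the current user. The chmod calls are included because sshd, with its default StrictModes setting, ignores an authorized_keys file whose permissions are too open.

```shell
SSH_DIR="${SSH_DIR:-$HOME/.ssh}"

# Append public key file $1 to authorized_keys file $2 unless it is already there.
authorize_key() {
    pub="$1"
    auth="$2"
    touch "$auth"
    grep -qxF -- "$(cat "$pub")" "$auth" || cat "$pub" >> "$auth"
    chmod 600 "$auth"
}

# Generate an RSA key with an empty passphrase (if none exists) and authorize it.
setup_passwordless_ssh() {
    mkdir -p "$SSH_DIR" && chmod 700 "$SSH_DIR"
    [ -f "$SSH_DIR/id_rsa" ] || ssh-keygen -t rsa -P '' -f "$SSH_DIR/id_rsa"
    authorize_key "$SSH_DIR/id_rsa.pub" "$SSH_DIR/authorized_keys"
}

# Usage (as the hadoop user):
#   setup_passwordless_ssh && ssh localhost true
```

If the last command returns without asking for a password, the login works.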

Next install Hadoop. Extract the archive and put it in /usr/local/hadoop, then set the environment variables and edit hadoop-env.sh, core-site.xml, hdfs-site.xml, and mapred-site.xml; the configuration details are omitted here. After configuration, give the hadoop user ownership of the Hadoop directory and reload the profile:

sudo chown -R hadoop:hadoop /usr/local/hadoop/

source /etc/profile

Then format the NameNode:

hadoop namenode -format

Before the first format, do not create the tmp, name, data, and similar directories in advance. The system creates them automatically, and pre-created directories cause node startup exceptions. Then start the cluster and list the running processes:

start-all.sh

jps

There should be five daemons, or six entries counting Jps itself:

NameNode

DataNode

JobTracker

TaskTracker

SecondaryNameNode

Jps
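The first start and the daemon check can be sketched as a few shell functions. The function names are hypothetical, and the script assumes that Hadoop lives in /usr/local/hadoop with its tmp, name, and data directories configured directly beneath it; the actual locations depend on the configuration files, whose contents are omitted above.

```shell
# Assumption: storage directories sit directly under HADOOP_HOME.
HADOOP_HOME="${HADOOP_HOME:-/usr/local/hadoop}"
EXPECTED_DAEMONS="NameNode DataNode SecondaryNameNode JobTracker TaskTracker"

# Print every storage directory that already exists before the first format.
precreated_dirs() {
    for d in tmp name data; do
        if [ -e "$HADOOP_HOME/$d" ]; then
            echo "$HADOOP_HOME/$d"
        fi
    done
}

# Read jps output on stdin and print each expected daemon that is not listed.
missing_daemons() {
    running=$(awk '{print $2}')
    for d in $EXPECTED_DAEMONS; do
        printf '%s\n' "$running" | grep -qx "$d" || echo "$d"
    done
}

# First-time start only: refuse to format if directories were created in advance.
first_start() {
    found=$(precreated_dirs)
    if [ -n "$found" ]; then
        echo "exists before first format: $found" >&2
        return 1
    fi
    hadoop namenode -format && start-all.sh
}

# Usage:
#   first_start
#   jps | missing_daemons    # prints nothing when all five daemons are up
```

The daemons need a few seconds to come up, so an immediate check may report some of them as missing.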

I hit countless errors along the way. One thing I can confirm: if the DataNode is missing, the likely cause is that the NameNode was formatted more than once. The fix: go to hadoop/data/current/, open the VERSION file with vim, and check whether its namespaceID matches the one in the name directory. If not, change it to match. Then restart the cluster without formatting, and avoid formatting repeatedly. Once the cluster starts successfully, you can run the bundled examples.
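The namespaceID comparison can also be done without opening the files by hand. A minimal sketch, assuming the VERSION files contain a line of the form namespaceID=<number>; the function names are hypothetical, and the paths in the usage comment are assumptions that depend on where the name and data directories were configured.

```shell
# Print the namespaceID stored in the VERSION file $1.
namespace_id() {
    sed -n 's/^namespaceID=//p' "$1"
}

# Succeed if the two VERSION files carry the same namespaceID.
same_namespace() {
    [ "$(namespace_id "$1")" = "$(namespace_id "$2")" ]
}

# Usage (paths are assumptions):
#   NAME_VERSION=/usr/local/hadoop/name/current/VERSION
#   DATA_VERSION=/usr/local/hadoop/data/current/VERSION
#   same_namespace "$NAME_VERSION" "$DATA_VERSION" || echo "namespaceID mismatch"
```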

5. Install MySQL

I recommend a good blog post here, which I followed for my own installation: centosmysqlinstall. Before building MySQL you need to install some compilation tools with yum.

6. Install Sqoop

Extract the package and put it in /usr/local/sqoop. Then configure the environment variables and run sqoop version to check that the setup works. Some further configuration is necessary: Sqoop depends on hadoop-core.jar from Hadoop and on the MySQL connector JAR, so put both in sqoop/lib. You also need to modify the Sqoop configuration file, as shown in the attachment.

Hadoop must be running first. Then import from MySQL into HDFS:

sqoop import --connect jdbc:mysql://localhost:3306/databasename --table tablename --username <username> --password <password> -m 1

By default the data is imported into HDFS; later you can configure HBase, Hive, and so on. Then view the imported data:

hadoop fs -cat /user/hadoop/test/part-m-00000

As for problems during the import: if the cause is a version mismatch, you will see a method-not-found exception. For other exceptions, modifying the HDFS configuration file may solve the problem. That is how I solved mine, and after that it was easy to use.
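The import command can be wrapped in a small function so that the JDBC URL is built in one place. This is a sketch, not the author's script: mysql_jdbc_url and import_table are hypothetical names, and MYSQL_HOST and MYSQL_PORT are assumed variables that default to localhost and 3306. It uses Sqoop's -P option, which prompts for the password on the console, so the password does not end up in the shell history.

```shell
# Print the JDBC URL for database $1 on the configured MySQL host and port.
mysql_jdbc_url() {
    printf 'jdbc:mysql://%s:%s/%s\n' "${MYSQL_HOST:-localhost}" "${MYSQL_PORT:-3306}" "$1"
}

# Import table $2 of database $1 as MySQL user $3, using a single map task.
import_table() {
    db="$1"
    table="$2"
    user="$3"
    sqoop import --connect "$(mysql_jdbc_url "$db")" \
        --table "$table" --username "$user" -P -m 1
}

# Usage (Hadoop must already be running):
#   import_table databasename tablename <username>
#   hadoop fs -cat /user/hadoop/tablename/part-m-00000
```

When no target directory is given, the result lands under the HDFS home directory of the importing user, in a directory named after the table.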

Add two properties to hdfs-site.xml. One relates to permissions, the other to safe mode.

<name>dfs.permissions</name> <value>false</value>

<name>dfs.safemode.threshold.pct</name> <value>0</value>
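In hdfs-site.xml each name/value pair sits inside its own property element within the configuration element. The helper below is a hypothetical convenience that prints the two entries in that form so they can be pasted into the file. Note that switching off permission checking is only reasonable on a single-user test machine like this one.

```shell
# Print one Hadoop <property> element for name $1 and value $2.
hdfs_property() {
    printf '<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>\n' "$1" "$2"
}

# The two settings described above.
hdfs_property dfs.permissions false
hdfs_property dfs.safemode.threshold.pct 0
```

Restart the cluster after changing hdfs-site.xml so that the new values take effect.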

This article was written after the fact, so it may contain mistakes. If anything is unclear, you can ask me: 374492359

This article is from the "Java notes" blog, please be sure to keep this source http://maidoujava.blog.51cto.com/7607166/1533071
