Importing SQL Server Data into HBase


Previous posts introduced the Hadoop, ZooKeeper, and HBase distributed frameworks and walked through building an 11-node cluster. So far, use of the HBase database has been limited to testing basic create/read/update/delete commands. To become more familiar with the distributed stack, this article describes how to import existing data from the relational database SQL Server into HBase.

Migrating data from a relational database into HBase requires the Sqoop tool. Sqoop is a standalone Apache project designed to transfer data between Hadoop (Hive) and traditional databases (MySQL, PostgreSQL, and so on). Sqoop builds on the data warehouse tool Hive: the data transfer is turned into MapReduce jobs that move the data. To complete this migration, we therefore need the following pieces:

① Hive: apache-hive-2.1.1-bin.tar.gz

② Sqoop: sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz

③ JDBC driver for SQL Server: sqljdbc_3.0.1301.101_enu.tar.gz

④ Connector between SQL Server and Sqoop: sqoop-sqlserver-1.0.tar.gz

====================== All of the following operations are performed on the master host as the root user ======================

1. Installing Hive

① Create a hive directory

cd /home

mkdir hive

② Unpack the installation package (after moving it to /home/hive)

tar -zxvf apache-hive-2.1.1-bin.tar.gz

③ Set environment variables

vi /etc/profile

Append the following:

export HIVE_HOME=/home/hive/apache-hive-2.1.1-bin

export PATH=$HIVE_HOME/bin:$PATH

export PATH

Also append:

export HCAT_HOME=$HIVE_HOME/hcatalog

④ Make the configuration take effect

source /etc/profile
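As a quick sanity check (my addition, not part of the original write-up), the new PATH entry can be verified by printing the Hive version:

hive --version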

2. Installing Sqoop

① Create a sqoop directory

cd /home

mkdir sqoop

② Unpack the installation package (after moving it to /home/sqoop)

tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz

③ Set environment variables

vi /etc/profile

Append the following:

export SQOOP_HOME=/home/sqoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha

export SQOOP_CONF_DIR=$SQOOP_HOME/conf

export PATH=$SQOOP_HOME/bin:$PATH

export PATH

④ Make the configuration take effect

source /etc/profile
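Likewise, a quick check (my addition) that Sqoop resolves from the new PATH; at this point it may still print warnings about optional components such as Accumulo, which step 5 deals with:

sqoop version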

3. Configuring the JDBC driver

① Unpack the archive (location is optional)

tar -zxvf sqljdbc_3.0.1301.101_enu.tar.gz

② Copy the JDBC jar into Sqoop's lib directory

cp sqljdbc_3.0/enu/sqljdbc4.jar /home/sqoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/lib
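To confirm the driver jar landed in the right place (my own check, assuming SQOOP_HOME is exported as above):

ls $SQOOP_HOME/lib | grep sqljdbc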

4. Configuring the SQL Server Sqoop connector

① Unpack the archive (location is optional; /home is used here)

tar -zxvf sqoop-sqlserver-1.0.tar.gz

② Set environment variables

vi /etc/profile

Append the following:

export MSSQL_CONNECTOR_HOME=/home/sqoop-sqlserver-1.0/

Make the configuration take effect:

source /etc/profile

③ Install the connector into Sqoop

cd sqoop-sqlserver-1.0

./install.sh

5. Configuring Sqoop

This step is needed because Sqoop's default configuration checks for components we do not need. Its startup script, $SQOOP_HOME/bin/configure-sqoop, defines many parameters and environment variables that are expected to be set in advance. Some of them we have already configured, but others are not needed here (and their effect is not yet clear to me), so to keep the runtime checks from failing, the straightforward approach is to disable that part of the check.

Comment out the Accumulo-related configuration: in the configure-sqoop file, comment out all the lines that reference ACCUMULO_HOME, then save and exit.
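From memory of Sqoop 1.4.6's configure-sqoop (the exact wording may differ slightly in your copy), the block to disable looks roughly like this once commented out:

# if [ ! -d "${ACCUMULO_HOME}" ]; then
#   echo "Warning: $ACCUMULO_HOME does not exist! Accumulo imports will fail."
#   echo 'Please set $ACCUMULO_HOME to the root of your Accumulo installation.'
# fi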

6. The current environment variables

It is worth noting that the HBase-related environment variables had not been added before; they are now included as well. The complete /etc/profile at this point is shown below.

# /etc/profile

# System wide environment and startup programs, for login setup
# Functions and aliases go in /etc/bashrc

# It's NOT a good idea to change this file unless you know what you
# are doing. It's much better to create a custom.sh shell script in
# /etc/profile.d/ to make custom changes to your environment, as this
# will prevent the need for merging in future updates.

pathmunge () {
    case ":${PATH}:" in
        *:"$1":*)
            ;;
        *)
            if [ "$2" = "after" ] ; then
                PATH=$PATH:$1
            else
                PATH=$1:$PATH
            fi
    esac
}

if [ -x /usr/bin/id ]; then
    if [ -z "$EUID" ]; then
        # ksh workaround
        EUID=`id -u`
        UID=`id -ru`
    fi
    USER="`id -un`"
    LOGNAME=$USER
    MAIL="/var/spool/mail/$USER"
fi

# Path manipulation
if [ "$EUID" = "0" ]; then
    pathmunge /sbin
    pathmunge /usr/sbin
    pathmunge /usr/local/sbin
else
    pathmunge /usr/local/sbin after
    pathmunge /usr/sbin after
    pathmunge /sbin after
fi

HOSTNAME=`/bin/hostname 2>/dev/null`
HISTSIZE=1000
if [ "$HISTCONTROL" = "ignorespace" ] ; then
    export HISTCONTROL=ignoreboth
else
    export HISTCONTROL=ignoredups
fi

export PATH USER LOGNAME MAIL HOSTNAME HISTSIZE HISTCONTROL

# By default, we want umask to get set. This sets it for login shell
# Current threshold for system reserved uid/gids is 200
# You could check uidgid reservation validity in
# /usr/share/doc/setup-*/uidgid file
if [ $UID -gt 199 ] && [ "`id -gn`" = "`id -un`" ]; then
    umask 002
else
    umask 022
fi

for i in /etc/profile.d/*.sh ; do
    if [ -r "$i" ]; then
        if [ "${-#*i}" != "$-" ]; then
            . "$i"
        else
            . "$i" >/dev/null 2>&1
        fi
    fi
done

unset i
unset -f pathmunge

export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.131.x86_64
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin

# Hadoop
export HADOOP_HOME=/home/hadoop/hadoop-2.7.3
export PATH=$PATH:$HADOOP_HOME/sbin
export PATH=$PATH:$HADOOP_HOME/bin

# Zookeeper
export ZOOKEEPER_HOME=/home/zookeeper/zookeeper-3.4.6/
export PATH=$ZOOKEEPER_HOME/bin:$PATH
export PATH

# HBase
export HBASE_HOME=/home/hbase/hbase-1.2.4
export PATH=$HBASE_HOME/bin:$PATH
export PATH

# Hive
export HIVE_HOME=/home/hive/apache-hive-2.1.1-bin
export PATH=$HIVE_HOME/bin:$PATH
export PATH
export HCAT_HOME=$HIVE_HOME/hcatalog

# Sqoop
export SQOOP_HOME=/home/sqoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha
export SQOOP_CONF_DIR=$SQOOP_HOME/conf
export PATH=$SQOOP_HOME/bin:$PATH
export PATH
export MSSQL_CONNECTOR_HOME=/home/sqoop-sqlserver-1.0/

7. Data Migration Experiment

Before executing the migration command, you need to create a table in HBase.

# hbase shell

After entering the HBase shell, execute:

> create 'test', 'cf'
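To confirm the table exists before leaving the shell, a quick check (my addition, not in the original steps):

> list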

Then exit the shell and run the import in the terminal:

# sqoop import --connect 'jdbc:sqlserver://<ip address>;username=<username>;password=<password>;database=<database>' --table <sql server table name> --hbase-table <hbase table name> --column-family <column family>

Points to note:

① If the SQL Server table is an ordinary table with a single primary key, you can specify --hbase-row-key directly and run the command above; by default the import runs as multiple MapReduce tasks.

② If the SQL Server table has a composite primary key, a plain import runs into a problem: the query cannot be split on the primary key, so it cannot be parallelized across MapReduce tasks. In that case you must specify '-m 1' so that only one MapReduce task is used.

③ If there is no primary key but the data volume is large and the work must be split across multiple MapReduce tasks, you need to pick a split column so the job can be divided on that column. In that case add '--split-by <id>' to the import command.

In my actual run, the SQL Server table had a composite primary key. For the first import, a table of 100 data records, I followed the blog post referenced below: I configured $SQOOP_HOME/conf/sqoop-site.xml, specified '--hbase-row-key <id1>,<id2>', and specified '-m 1', and the data imported successfully. (Time taken: 22 sec)
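For reference, a sketch of what that composite-key import looks like when assembled from the pieces above (server address, credentials, and column names are placeholders; the HBase table and column family match the 'test'/'cf' created earlier, and the sqoop-site.xml change from the referenced blog is assumed to be in place):

sqoop import \
  --connect 'jdbc:sqlserver://<ip address>;username=<username>;password=<password>;database=<database>' \
  --table <sql server table name> \
  --hbase-table test \
  --column-family cf \
  --hbase-row-key <id1>,<id2> \
  -m 1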

  

The second attempt imported nearly 1.6 million data records from the database into HBase. I added the parameter '--split-by <id>' and specified '-m 12', and the data again imported successfully. (Time taken: 17 min 25 sec)
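The second run's command had roughly this shape (placeholders again; the composite row key is assumed to be kept, and <id> stands for whichever column the rows can be split on reasonably evenly):

sqoop import \
  --connect 'jdbc:sqlserver://<ip address>;username=<username>;password=<password>;database=<database>' \
  --table <sql server table name> \
  --hbase-table <hbase table name> \
  --column-family <column family> \
  --hbase-row-key <id1>,<id2> \
  --split-by <id> \
  -m 12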

  

8. Remaining issues

My understanding of how Sqoop splits tasks is still not deep enough. Although the experiment completed successfully, I did not run comparative experiments to determine which factors really drive the performance.

9. Reference Articles

Sqoop User Guide (v1.4.6)

Using Sqoop to import data from a database into HDFS (CSDN blog)

Importing SQL Server data into HBase with Sqoop (CSDN blog)

Using Sqoop on CentOS to import data from SQL Server into HDFS or Hive (CSDN blog)

Sqoop parallel data import (CSDN blog)

Simple tips for controlling Sqoop imports into HBase tables with composite primary keys (CSDN blog)

  
