Scheduled Import of RDBMS Data into HDFS

Source: Internet
Author: User
Tags: mysql, import, sqoop

[TOC]

Preface

Importing data from MySQL into HDFS on a schedule with Sqoop is actually not difficult; it mainly requires familiarity with the Sqoop command and basic Linux shell scripting.

Scenario

In our scenario, the user records newly added to the database each day must be collected into HDFS. The table has a time field that records when each user record was inserted, and the import below relies on this field.
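The core of the daily collection is a one-day time window: each run selects only the rows inserted yesterday. A minimal sketch of that window logic (the variable names match the script later in the article; the `echo` is just for illustration):

```shell
#!/usr/bin/env bash
# Build the [yesterday, today) window that the daily import relies on.
# Uses GNU date's -d option for relative dates.
today=$(date +%Y-%m-%d)
yesterday=$(date -d "1 day ago" +%Y-%m-%d)

# Only rows whose time falls inside the one-day window are selected
echo "WHERE time >= '${yesterday}' AND time < '${today}'"
```

Using a half-open window (`>= yesterday AND < today`) rather than equality makes the filter robust if the column is ever changed to a datetime type.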
Data preparation

Prepare the following data in the MySQL database:

```sql
-- Create the database and table
CREATE DATABASE IF NOT EXISTS db_log_statics;
USE db_log_statics;

CREATE TABLE `t_user_info` (
  `id` bigint NOT NULL,
  `name` varchar(255) COLLATE utf8_bin DEFAULT NULL,
  `address` varchar(255) COLLATE utf8_bin DEFAULT NULL,
  `time` date DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

INSERT INTO `t_user_info` (`id`, `name`, `address`, `time`) VALUES ('1', 'Zhang San', 'Beijing Chaoyang', '2018-04-05');
INSERT INTO `t_user_info` (`id`, `name`, `address`, `time`) VALUES ('2', 'John Doe', 'Henan Luoyang', '2018-04-05');
INSERT INTO `t_user_info` (`id`, `name`, `address`, `time`) VALUES ('3', 'Harry', 'Guangdong Shaoyang', '2018-04-05');
INSERT INTO `t_user_info` (`id`, `name`, `address`, `time`) VALUES ('4', 'Zhao Liu', 'Shandong Dezhou', '2018-04-07');
INSERT INTO `t_user_info` (`id`, `name`, `address`, `time`) VALUES ('5', 'Zhou Qi', 'Shandong Qingdao', '2018-04-07');
INSERT INTO `t_user_info` (`id`, `name`, `address`, `time`) VALUES ('6', 'Wei VIII', 'Fujian Xiamen', '2018-04-07');
INSERT INTO `t_user_info` (`id`, `name`, `address`, `time`) VALUES ('7', 'King Two Leper', 'Shanxi Wutaishan', '2018-04-06');
INSERT INTO `t_user_info` (`id`, `name`, `address`, `time`) VALUES ('8', 'King Sledgehammer', 'Shaanxi Yulin', '2018-04-06');
INSERT INTO `t_user_info` (`id`, `name`, `address`, `time`) VALUES ('9', 'Yan Xiao Liu', 'Yunnan Dali', '2018-04-06');
INSERT INTO `t_user_info` (`id`, `name`, `address`, `time`) VALUES ('10', 'Rebels', 'Hubei Xiantao', '2018-04-06');
```
Script: exportUser2HDFS.sh
```shell
#!/bin/env bash
# Load variables from an external file
source /home/uplooking/shells/db-mysql.conf
# Load the system environment variables
source ~/.bash_profile

# Date variables
today=$(date +%Y-%m-%d)
yesterday=$(date -d "1 day ago" +%Y-%m-%d)

/home/uplooking/app/sqoop/bin/sqoop import \
  --connect jdbc:mysql://${stat_ipaddr}:${stat_port}/${stat_dbname} \
  --username ${stat_uname} \
  --password ${stat_upwd} \
  --target-dir hdfs://ns1/input/t_user/${yesterday} \
  --query "SELECT id, name, address, time FROM t_user_info WHERE time >= '${yesterday}' AND time < '${today}' AND \$CONDITIONS" \
  -m 1 \
  --fields-terminated-by ","
```

Note that when `--query` is used with more than one mapper, Sqoop also requires `--split-by` with a column name; with `-m 1` (a single mapper) no split column is needed.
db-mysql.conf
```shell
# Stats database IP address
stat_ipaddr=192.168.43.116
# Stats database port
stat_port=3306
# Stats database name
stat_dbname=db_log_statics
# Stats database username
stat_uname=root
# Stats database password
stat_upwd=root
```
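Because the conf file is nothing more than plain shell variable assignments, `source` loads its values straight into the importing script. A minimal sketch of that mechanism (the `/tmp` path is illustrative only, not from the article):

```shell
#!/usr/bin/env bash
# Write a miniature version of db-mysql.conf for demonstration
cat > /tmp/db-mysql.conf <<'EOF'
stat_ipaddr=192.168.43.116
stat_port=3306
stat_dbname=db_log_statics
EOF

# `source` executes the file in the current shell, defining its variables here
source /tmp/db-mysql.conf

# The variables are now available to build the JDBC URL
jdbc_url="jdbc:mysql://${stat_ipaddr}:${stat_port}/${stat_dbname}"
echo "$jdbc_url"
```

Keeping connection details in a separate sourced file means the credentials can be rotated without touching the import script itself.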
Writing the scheduled task
```shell
crontab -e
# Sync the data at 02:10 every morning
10 2 * * * /bin/bash /home/uplooking/shells/exportUser2HDFS.sh >/dev/null 2>&1
```

It is important to note that if you edit the shell script remotely in Notepad++, it may fail to execute on Linux. Checking the file in Vim with `:set ff` will show `fileformat=dos`, whereas files created on Linux normally show `fileformat=unix`. The solution is to run `:set ff=unix` in Vim and save the file.
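The DOS line-ending problem and its fix can also be reproduced and repaired entirely from the command line, as a sketch (the `/tmp` paths are illustrative only; `tr -d '\r'` has the same effect on the file as `:set ff=unix`):

```shell
#!/usr/bin/env bash
# Simulate a script saved by a DOS-mode editor: every line ends in \r\n
printf '#!/bin/bash\r\necho ok\r\n' > /tmp/crlf.sh

# Strip the carriage returns to restore Unix line endings
tr -d '\r' < /tmp/crlf.sh > /tmp/unix.sh

result=$(bash /tmp/unix.sh)
echo "$result"   # prints: ok
```

The `dos2unix` utility, where installed, does the same conversion in place.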

