MongoDB Synchronizing data to Hive (ii)

Source: Internet
Author: User
Tags: mongodb, hdfs, dfs


1. Overview

The previous article introduced the first approach, which maps the data by connecting directly to MongoDB for querying; that approach, however, puts load on the online database. This article introduces the second, BSON-based approach: export the required collections to local files with mongoexport (the exported file is BSON by default), upload the exported BSON files to the HDFS file system, and finally create the corresponding tables in Hive so the data can be queried with Hive SQL.
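Sketched end to end, the flow looks roughly like this (the database saturn, collection mycol, and HDFS directory /myjob/job1 are the sample names used later in this article; adjust them to your environment):

# Step 1: export the collection from MongoDB to a local file
mongoexport -u huoqiu -p huoqiuapp -h 127.0.0.1:27017 -d saturn -c mycol -o /data/mongodata/mycol
# Step 2: upload the exported file to HDFS
hdfs dfs -put /data/mongodata/mycol /myjob/job1
# Step 3: create a Hive table over /myjob/job1 (see section 4), then query it with Hive SQL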

2. Export File

Use the mongoexport command to export the required collections or fields. Commonly used commands are as follows:

1) #mongoexport -u huoqiu -p huoqiuapp -h 127.0.0.1:27017 -d saturn -c mycol -o /root/data/mycol-`date +%F_%H-%M-%S`.json

! -u: specifies the user; the backup user must have read access to the database.
! -p: specifies the user's password.
! -h: specifies the database server address and port, in the form ip:port.
! -d: specifies the database name.
! -c: specifies the name of the collection to be backed up.
! -o: specifies the output file.
! --type: specifies the output type; the default is JSON.

2) Back up specific fields of a collection

For example, export the id field of the mycol collection as a CSV file:

#mongoexport -u huoqiu -p huoqiuapp -h 127.0.0.1:27017 -d saturn -c mycol --type csv -f "id" -o /root/data/mycol-`date +%F_%H-%M-%S`.csv

! -d: database name
! -c: collection name
! -o: output file name
! --type: output format, JSON by default
! -f: output fields; if --type is csv, you need to add -f "field name"
! -q: filter condition, for example: -q '{"function": "test100"}'

#mongoexport -h 127.0.0.1:27017 -u huoqiu -p huoqiuapp -d saturn -c mycol --type csv -f id,function -q '{"function": "test100"}' -o /root/data/oplog.rs-`date +%F_%H-%M-%S`.csv

3) If MongoDB is deployed on a different server from Hadoop and Hive, the MongoDB binaries also need to be installed on the Hadoop server. The mongod service does not have to be running there; the installation is only needed so that data can be copied with the mongoexport command.
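A minimal sketch of what this looks like, assuming the MongoDB binaries have been unpacked to /data/mongodb on the Hadoop node and MongoDB itself runs on 192.168.1.11 (the host, credentials, and paths used by the script later in this article):

# Run mongoexport on the Hadoop node against the remote MongoDB instance; no local mongod is running here.
/data/mongodb/bin/mongoexport -u test -p testpwd -h 192.168.1.11:27017 -d saturn -c mycol -o /data/mongodata/mycol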

3. Import the Files into HDFS

1) First, create a directory in HDFS to store the file for the corresponding table.

2) Note that each table needs its own directory.

3) The commands are as follows (the Hadoop bin directory has already been added to the PATH environment variable):

#hdfs dfs -mkdir /myjob

#hdfs dfs -mkdir /myjob/job1

!! Note that HDFS directories must be created one level at a time; a nested path cannot be created in a single command unless the -p option is used.
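For example, the following single command creates both levels at once:

hdfs dfs -mkdir -p /myjob/job1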

# Upload the file to HDFS

#hdfs dfs -put /data/job1 /myjob/job1

@ /data/job1 is the local path, i.e. the path of the exported MongoDB file.

@ /myjob/job1 is the destination path in HDFS.

4) View the files that have been uploaded to HDFS

#hdfs dfs -ls /myjob/job1

5) Modify Permissions

#hdfs dfs -chmod 777 /myjob/job1

6) Get files from HDFS back to the local filesystem

#hdfs dfs -get /myjob/job1 /data/job1

7) Delete Files

#hdfs dfs -rm /myjob/job1

Delete a directory:

#hdfs dfs -rm -r /myjob

To delete a non-empty directory such as /myjob, the -r option is required, as shown above; you can also add -f to suppress the error message if the path does not exist.
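For example, a forced recursive delete that will not complain if the path no longer exists:

hdfs dfs -rm -r -f /myjob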

4. Create a Table in Hive

#hive

hive> CREATE TABLE IF NOT EXISTS ${table_name}
(
id string,
userid string,
...
)
COMMENT 'description'
ROW FORMAT SERDE 'com.mongodb.hadoop.hive.BSONSerDe'
WITH SERDEPROPERTIES ('mongo.columns.mapping' = '{the mapping of Hive fields to MongoDB fields}')
STORED AS INPUTFORMAT 'com.mongodb.hadoop.mapred.BSONFileInputFormat'
OUTPUTFORMAT 'com.mongodb.hadoop.hive.output.HiveBSONFileOutputFormat'
LOCATION '<HDFS directory>';
The LOCATION clause points to the HDFS directory where the BSON files are stored, which in this article is /myjob/job1.
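As a concrete, hypothetical example that can be pasted into a shell, assuming the mongo-hadoop core and hive jars have already been added to Hive (e.g. via ADD JAR), and using the sample table mycol with two columns mapped to the illustrative MongoDB fields _id and userId (replace the columns, the mapping, and /myjob/job1 with your own schema and HDFS directory):

hive <<'EOF'
CREATE TABLE IF NOT EXISTS mycol
(
  id string,
  userid string
)
COMMENT 'mycol exported from MongoDB'
ROW FORMAT SERDE 'com.mongodb.hadoop.hive.BSONSerDe'
WITH SERDEPROPERTIES ('mongo.columns.mapping' = '{"id":"_id","userid":"userId"}')
STORED AS INPUTFORMAT 'com.mongodb.hadoop.mapred.BSONFileInputFormat'
OUTPUTFORMAT 'com.mongodb.hadoop.hive.output.HiveBSONFileOutputFormat'
LOCATION '/myjob/job1';
EOF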
5. For convenience, the export from MongoDB to the local disk and the upload of the files into HDFS have been combined into a script.
#cat hdfs.sh
#!/bin/bash
# This script exports the collections in MongoDB to BSON files and uploads the files to HDFS.
# Define the collections to export
list="
merchants
martproducts
products
coupons
couponlogs
reviews
orderoplogs
orders
"
# Check whether an old export file already exists; if it does, delete it
for i in $list
do
    if [ -e /data/mongodata/$i ]; then
        rm -rf /data/mongodata/$i
        sleep 5s
    fi
done
# Export the data from MongoDB to the local disk
for a in $list
do
    nohup /data/mongodb/bin/mongoexport -u test -p testpwd -h 192.168.1.11:27017 -d saturn -c $a -o /data/mongodata/$a >> /data/nohup.out 2>&1 &
    #sleep 5m
done
# Delete the old files in HDFS
for b in $list
do
    nohup /data/hadoop-2.7.3/bin/hdfs dfs -rm /$b/$b >> /data/nohuprm.out 2>&1 &
done
# Upload the local files to HDFS
for c in $list
do
    /data/hadoop-2.7.3/bin/hdfs dfs -put /data/mongodata/$c /$c
    sleep 1m
done
6. Add the script to a scheduled task. There are two ways: crontab or Jenkins.
1) Using crontab
#crontab -e
0 * * * * /data/hdfs.sh 2>&1 &
2) Using Jenkins
1. Create a project; name it whatever you like.

2. Configure the build schedule (running cycle).

3. Execute.
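A minimal sketch of the Jenkins setup, assuming a freestyle job whose only build step is an Execute shell step; the schedule mirrors the hourly crontab entry above:

# Build Triggers -> Build periodically -> Schedule:
#   0 * * * *
# Build -> Execute shell -> Command:
/data/hdfs.sh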
