MongoDB: Synchronizing Data to Hive (II)
1. Overview
The previous article showed how to query MongoDB data by mapping it through a direct connection to MongoDB. That approach puts load on the online database, so this article introduces a second, BSON-file-based way: export the required collections to local files with mongoexport, put the exported files into the HDFS file system, and finally create the corresponding tables in Hive so the data can be queried with Hive SQL.
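At a glance the whole pipeline comes down to three steps. The following is a minimal sketch using the example host, credentials, database, collection and paths that appear later in this article (the Hive table itself is created in section 4):
#mongoexport -u huoqiu -p huoqiuapp -h 127.0.0.1:27017 -d saturn -c mycol -o /root/data/mycol   # 1. export the collection to a local file
#hdfs dfs -put /root/data/mycol /myjob/job1                                                     # 2. upload the file into HDFS
#hive -e "SELECT * FROM job1 LIMIT 10"                                                          # 3. query the table mapped onto that directory with Hive SQL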
2. Exporting the Data
Use the mongoexport command to export the required collections or fields. The commonly used forms are as follows:
1) Export a whole collection:
#mongoexport -u huoqiu -p huoqiuapp -h 127.0.0.1:27017 -d saturn -c mycol -o /root/data/mycol-$(date +%F_%H-%M-%S).json
-u: the user performing the export; this user must have read access to the database.
-p: the user's password.
-h: the database server address and port, in the form ip:port.
-d: the database name.
-c: the collection to export.
-o: the output file.
--type: the output format; JSON is the default.
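In practice it is worth checking that the export actually succeeded before moving on. A small sketch using the same example values as above:
out=/root/data/mycol-$(date +%F_%H-%M-%S).json
if mongoexport -u huoqiu -p huoqiuapp -h 127.0.0.1:27017 -d saturn -c mycol -o "$out"; then
  echo "exported $(wc -l < "$out") documents to $out"   # mongoexport writes one JSON document per line
else
  echo "mongoexport failed" >&2
fi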
2) Export only some fields of a collection.
For example, export just the id field of the mycol collection as a CSV file:
#mongoexport -u huoqiu -p huoqiuapp -h 127.0.0.1:27017 -d saturn -c mycol --type csv -f "id" -o /root/data/mycol-$(date +%F_%H-%M-%S).csv
-d: the database name
-c: the collection name
-o: the output file name
--type: the output format, JSON by default
-f: the fields to output; required when --type is csv, e.g. -f "field1,field2"
-q: a query filter, for example: -q '{"function": "test100"}'
#mongoexport -h 127.0.0.1:27017 -u huoqiu -p huoqiuapp -d saturn -c mycol --type csv -f id,function -q '{"function": "test100"}' -o /root/data/oplog.rs-$(date +%F_%H-%M-%S).csv
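The -q filter is a JSON document, so shell quoting matters: a common pattern is to wrap the whole filter in single quotes and use double quotes inside it. A minimal sketch (the output path here is just an example):
#mongoexport -h 127.0.0.1:27017 -u huoqiu -p huoqiuapp -d saturn -c mycol \
    --type csv -f id,function \
    -q '{"function": "test100"}' \
    -o /root/data/mycol_test100.csv
#head -1 /root/data/mycol_test100.csv    # the first line of the CSV output is the header row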
3) If MongoDB is deployed on different machines from Hadoop and Hive, the MongoDB tools also need to be installed on the Hadoop server. The mongod service does not have to run there; the installation is only needed so that the mongoexport command is available for copying the data.
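A quick way to confirm the tool is usable on the Hadoop node and can reach the remote MongoDB instance is something like the sketch below (the address and credentials are the ones used by the script in section 5; the output path is just a throwaway example):
#command -v mongoexport || echo "mongoexport is not installed on this node"
#mongoexport -u test -p testpwd -h 192.168.1.11:27017 -d saturn -c merchants -o /tmp/merchants_test.json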
3. Importing the Files into HDFS
1) First, create a directory in HDFS to hold the corresponding table files.
2) Note that each table needs its own directory.
3) The commands are as follows (the Hadoop bin directory has been added to the environment variables):
#hdfs dfs -mkdir /myjob
#hdfs dfs -mkdir /myjob/job1
!! Note that, without the -p option, HDFS directories have to be created one level at a time; hdfs dfs -mkdir -p /myjob/job1 creates the whole path in one go.
Upload the file to HDFS:
#hdfs dfs -put /data/job1 /myjob/job1
/data/job1 is the local path, i.e. the path of the exported MongoDB file.
/myjob/job1 is the HDFS path.
4) View the files that have been uploaded to HDFS:
#hdfs dfs -ls /myjob/job1
5) Modify the permissions:
#hdfs dfs -chmod 777 /myjob/job1
6) Fetch files out of HDFS:
#hdfs dfs -get /myjob/job1 /data/job1
7) Delete a file:
#hdfs dfs -rm /myjob/job1
Delete a directory:
#hdfs dfs -rm -r /myjob
The -r option removes the directory together with its contents; add -f to suppress the error message when the path does not exist.
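The steps above can be combined into a short helper; the following is a minimal sketch that assumes the same example paths used above (/data/job1 locally, /myjob/job1 in HDFS):
#!/bin/bash
# Sketch: create the HDFS directory, upload one exported file and verify it.
set -e
local_file=/data/job1
hdfs_dir=/myjob/job1

hdfs dfs -mkdir -p "$hdfs_dir"               # -p creates parent directories as needed
hdfs dfs -put -f "$local_file" "$hdfs_dir"   # -f overwrites an existing copy
hdfs dfs -chmod 777 "$hdfs_dir"
hdfs dfs -ls "$hdfs_dir"                     # confirm the upload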
4. Creating the Table in Hive
#hive
hive> CREATE TABLE IF NOT EXISTS ${table_name}
(
id string,
userid string,
.
.
.
)
COMMENT 'description'
ROW FORMAT SERDE 'com.mongodb.hadoop.hive.BSONSerDe'
WITH SERDEPROPERTIES ('mongo.columns.mapping' = '{the mapping of Hive fields to MongoDB fields}')
STORED AS INPUTFORMAT 'com.mongodb.hadoop.mapred.BSONFileInputFormat'
OUTPUTFORMAT 'com.mongodb.hadoop.hive.output.HiveBSONFileOutputFormat'
LOCATION 'the HDFS directory';
LOCATION points at the HDFS directory that holds the BSON files, /myjob/job1 in the example above.
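As a concrete illustration, the DDL for the job1 data uploaded above might look like the following. The column list and the field names in the mapping (a MongoDB _id and a userId field) are only assumptions for the example; adapt them to the real collection:
CREATE TABLE IF NOT EXISTS job1
(
id string,
userid string
)
COMMENT 'job1 collection exported from MongoDB'
ROW FORMAT SERDE 'com.mongodb.hadoop.hive.BSONSerDe'
WITH SERDEPROPERTIES ('mongo.columns.mapping' = '{"id":"_id","userid":"userId"}')
STORED AS INPUTFORMAT 'com.mongodb.hadoop.mapred.BSONFileInputFormat'
OUTPUTFORMAT 'com.mongodb.hadoop.hive.output.HiveBSONFileOutputFormat'
LOCATION '/myjob/job1';
Once the table exists it can be queried with ordinary Hive SQL, for example: SELECT id, userid FROM job1 LIMIT 10;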
5. For convenience, the export from MongoDB to local files and the upload of those files into HDFS have been wrapped in a script.
#cat hdfs.sh
#!/bin/bash
# This script exports the collections listed below from MongoDB as files and uploads them to HDFS.
# The collections to export
list="
merchants
martproducts
products
coupons
couponlogs
reviews
orderoplogs
orders
"
# If a previous export file exists, delete it
for i in $list
do
  if [ -e /data/mongodata/$i ]; then
    rm -rf /data/mongodata/$i
    sleep 5s
  fi
done
# Export the data from MongoDB to the local disk
for a in $list
do
  nohup /data/mongodb/bin/mongoexport -u test -p testpwd -h 192.168.1.11:27017 -d saturn -c $a -o /data/mongodata/$a >> /data/nohup.out 2>&1 &
  #sleep 5m
done
# Delete the old files from HDFS
for b in $list
do
  nohup /data/hadoop-2.7.3/bin/hdfs dfs -rm /$b/$b >> /data/nohuprm.out 2>&1 &
done
# Upload the local files into HDFS
for c in $list
do
  /data/hadoop-2.7.3/bin/hdfs dfs -put /data/mongodata/$c /$c
  sleep 1m
done
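Note that the mongoexport and hdfs dfs -rm commands are started in the background with nohup ... &, so the later loops can begin before all exports have finished; the commented-out sleep 5m and the sleep 1m only paper over this. A more robust variant of the export loop (a sketch, not part of the original script) blocks with wait before touching HDFS:
for a in $list
do
  /data/mongodb/bin/mongoexport -u test -p testpwd -h 192.168.1.11:27017 -d saturn -c $a -o /data/mongodata/$a >> /data/nohup.out 2>&1 &
done
wait   # block here until every export has finished, then delete and re-upload the HDFS files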
6. Add the script to a scheduled job. There are two ways to do this: crontab or Jenkins.
1) Using crontab
#crontab -e
0 * * * * /data/hdfs.sh
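If you want to keep the script's output, a common variant (the log file path here is just an example) is to redirect it in the crontab entry:
0 * * * * /data/hdfs.sh >> /data/hdfs_cron.log 2>&1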
2) Using Jenkins
1. Create a project; the name is up to you.
2. Configure the build schedule (build periodically).
3. Add a build step that executes the script.
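As an illustration of steps 2 and 3 (the exact schedule is up to you), the "Build periodically" field accepts Jenkins cron syntax, and the build step can simply be an "Execute shell" step that calls the script:
# "Build periodically" schedule; H lets Jenkins spread the start minute
H * * * *
# "Execute shell" build step
/data/hdfs.sh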