Importing files from HDFS into MongoDB via Spark SQL

Source: Internet
Author: User

Function: Import files on HDFS into MongoDB via Spark SQL

The required jar packages are: mongo-spark-connector_2.11-2.1.2.jar and mongo-java-driver-3.8.0.jar
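If these two jars are not bundled into the application jar, one way to make them visible to the job is to pass them to spark-submit with --jars. This is a sketch; the /path/to locations are placeholders, not paths from the original article:

```shell
# Sketch: supply the MongoDB connector and driver jars at submit time.
# The /path/to locations are placeholders; point them at the real jar files.
./bin/spark-submit \
  --master spark://11.12.13.14:7077 \
  --jars /path/to/mongo-spark-connector_2.11-2.1.2.jar,/path/to/mongo-java-driver-3.8.0.jar \
  --class Exec \
  /path/to/ImportDataToMongo.jar \
  hdfs://master /user/hdfs/log 2017-05-04.txt 10.15.22.22:27017 mydb data_default_test
```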

The Scala code is as follows:

import org.apache.spark.sql.Row
import org.apache.spark.sql.Dataset
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext
import org.apache.hadoop.conf.Configuration
import org.apache.spark.sql.SparkSession
import com.mongodb.spark._
import org.bson.Document
import com.mongodb.spark.config._

object Exec {
  def main(args: Array[String]): Unit = {

    if (args.length < 6) {
      System.err.println("Usage: Exec <hdfsServer> <logPath> <fileName> <mongoHost> <mongoDB> <mongoCollection>")
      System.exit(1)
    }
    val hdfsServer      = args(0) // "hdfs://master"
    val logPath         = args(1) // "/user/hdfs/log/"
    val fileName        = args(2) // "2017-05-04.txt"
    val mongoHost       = args(3) // "10.15.22.22:23000"
    val mongoDB         = args(4) // Mongo database name
    val mongoCollection = args(5) // Mongo collection name

    try {
      val spark = SparkSession
        .builder()
        .master("local")
        .appName("SparkImportDataToMongo")
        .config("spark.debug.maxToStringFields", 100) // the value was lost in the source; 100 is an assumption
        .getOrCreate()
      import spark.implicits._
      // Read the JSON log file from HDFS into a DataFrame
      val df = spark.read.json(hdfsServer + logPath + "/" + fileName)
      df.printSchema()
      // Append the DataFrame to the target MongoDB collection
      df.write.mode("append")
        .format("com.mongodb.spark.sql.DefaultSource")
        .option("spark.mongodb.output.uri",
          "mongodb://" + mongoHost + "/" + mongoDB + "." + mongoCollection)
        .save()
    } catch {
      case ex: Exception =>
        println(ex.toString)
    }
  }
}
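The only MongoDB-specific wiring above is the output URI, which is just the host, database, and collection concatenated. A tiny helper makes the format explicit (buildMongoUri is an illustrative name, not part of the connector API):

```scala
// Illustrative helper mirroring how the job above assembles its output URI.
// buildMongoUri is a hypothetical name, not a MongoDB Spark connector API.
object MongoUri {
  def buildMongoUri(host: String, db: String, collection: String): String =
    "mongodb://" + host + "/" + db + "." + collection
}
```

For the example arguments this yields mongodb://10.15.22.22:27017/mydb.data_default_test.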

Execute the following command in the Spark installation directory (the program expects six arguments, so the HDFS server, log path, and file name are passed separately):

./bin/spark-submit --master spark://11.12.13.14:7077 --class Exec /bigdata/spark-2.1.1-bin-hadoop2.6/examples/ImportDataToMongo.jar hdfs://master /user/hdfs/log 2017-05-04.txt 10.15.22.22:27017 mydb data_default_test
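After the job finishes, one quick way to confirm the import (assuming the mongo shell is available, and using the database and collection names from the command above) is to count the documents in the target collection:

```shell
# Sketch: count documents in the target collection from the mongo shell.
mongo 10.15.22.22:27017/mydb --eval "db.data_default_test.count()"
```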


Run:

[root@master spark-2.1.1-bin-hadoop2.6]# ./bin/spark-submit --master spark://11.12.13.14:7077 --class Exec /bigdata/spark-2.1.1-bin-hadoop2.6/examples/ImportDataToMongo.jar hdfs://master /user/hdfs/log 2017-05-04.txt 10.15.22.22:27017 mydb data_default_test
18/07/20 23:41:13 INFO spark.SparkContext: Running Spark version 2.1.1
18/07/20 23:41:14 INFO spark.SecurityManager: Changing view acls to: root
18/07/20 23:41:14 INFO spark.SecurityManager: Changing modify acls to: root
18/07/20 23:41:14 INFO spark.SecurityManager: Changing view acls groups to:
18/07/20 23:41:14 INFO spark.SecurityManager: Changing modify acls groups to:
18/07/20 23:41:14 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
18/07/20 23:41:14 INFO util.Utils: Successfully started service 'sparkDriver' on port 24073.
18/07/20 23:41:14 INFO spark.SparkEnv: Registering MapOutputTracker
18/07/20 23:41:14 INFO spark.SparkEnv: Registering BlockManagerMaster
18/07/20 23:41:14 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
18/07/20 23:41:14 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
18/07/20 23:41:14 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-9c42a710-559b-4c97-b92a-58208a77afeb
18/07/20 23:41:14 INFO memory.MemoryStore: MemoryStore started with capacity 366.3 MB
18/07/20 23:41:14 INFO spark.SparkEnv: Registering OutputCommitCoordinator
18/07/20 23:41:14 INFO util.log: Logging initialized @1777ms
18/07/20 23:41:14 INFO server.Server: jetty-9.2.z-SNAPSHOT
18/07/20 23:41:14 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@...{/jobs,null,AVAILABLE,@Spark}
... (further ContextHandler startup lines for /jobs, /stages, /storage, /environment, /executors, and /static omitted)

