Connecting Spark to MySQL and MongoDB


During Spark computations it is often necessary to connect to different types of databases to fetch or store data. This article shows how to connect Spark to MySQL and to MongoDB.

1. Connecting to MySQL. Spark 1.3 introduced the DataFrame abstraction, so the approach below returns a DataFrame; it can be converted to a JavaRDD with JavaRDD<Row> rows = jdbcDF.toJavaRDD().

import java.io.Serializable;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SQLContext;

public class Main implements Serializable {

    private static final org.apache.log4j.Logger LOGGER =
            org.apache.log4j.Logger.getLogger(Main.class);

    private static final String MYSQL_DRIVER = "com.mysql.jdbc.Driver";
    private static final String MYSQL_USERNAME = "expertuser";
    private static final String MYSQL_PWD = "expertuser123";
    private static final String MYSQL_CONNECTION_URL =
            "jdbc:mysql://localhost:3306/employees?user=" + MYSQL_USERNAME + "&password=" + MYSQL_PWD;

    private static final JavaSparkContext sc =
            new JavaSparkContext(new SparkConf().setAppName("SparkJdbcDs").setMaster("local[*]"));
    private static final SQLContext sqlContext = new SQLContext(sc);

    public static void main(String[] args) {
        // Data source options
        Map<String, String> options = new HashMap<>();
        options.put("driver", MYSQL_DRIVER);
        // The data source opens the connection itself and closes it automatically.
        options.put("url", MYSQL_CONNECTION_URL);
        // A subquery can be used as the "table"; here it is exposed to Spark
        // under the alias employees_name.
        options.put("dbtable",
                    "(select emp_no, concat_ws(' ', first_name, last_name) as full_name from employees) as employees_name");
        // Column used to split the table into partitions; must be numeric.
        options.put("partitionColumn", "emp_no");
        // lowerBound and upperBound bracket the partition column, and numPartitions
        // sets how many parallel queries are issued. For example, lowerBound=1,
        // upperBound=20, numPartitions=2 yields the ranges (1, 10) and (11, 20).
        options.put("lowerBound", "10001");
        options.put("upperBound", "499999");
        options.put("numPartitions", "10");

        // Load the MySQL query result as a DataFrame
        DataFrame jdbcDF = sqlContext.load("jdbc", options);
        JavaRDD<Row> rows = jdbcDF.toJavaRDD();

        List<Row> employeeFullNameRows = jdbcDF.collectAsList();
        for (Row employeeFullNameRow : employeeFullNameRows) {
            LOGGER.info(employeeFullNameRow);
        }
    }
}

2. Connecting to MongoDB

See https://github.com/mongodb/mongo-hadoop/wiki/Spark-Usage
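Following the pattern described on that wiki page, a minimal sketch of reading from and writing to MongoDB via the mongo-hadoop connector might look like the following. The URIs, database, and collection names (db.collection, output.collection) are placeholder assumptions; the mongo-hadoop jar and a running MongoDB instance are required.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.bson.BSONObject;
import com.mongodb.hadoop.MongoInputFormat;
import com.mongodb.hadoop.MongoOutputFormat;

public class SparkMongoExample {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setAppName("SparkMongo").setMaster("local[*]"));

        // Input: each MongoDB document arrives as a (ObjectId, BSONObject) pair.
        Configuration inputConfig = new Configuration();
        inputConfig.set("mongo.input.uri", "mongodb://localhost:27017/db.collection");
        JavaPairRDD<Object, BSONObject> documents = sc.newAPIHadoopRDD(
                inputConfig, MongoInputFormat.class, Object.class, BSONObject.class);

        // Output: write the pairs back to another collection. The path argument
        // is ignored by MongoOutputFormat but must be supplied.
        Configuration outputConfig = new Configuration();
        outputConfig.set("mongo.output.uri", "mongodb://localhost:27017/output.collection");
        documents.saveAsNewAPIHadoopFile("file:///unused",
                Object.class, BSONObject.class, MongoOutputFormat.class, outputConfig);

        sc.stop();
    }
}
```

Any transformation (map, filter, etc.) can be applied to the pair RDD between the read and the write, just as with any other Spark RDD.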

 
