Spark writes the result of the calculation to MySQL

Source: Internet
Author: User
Tags: bulk insert

Today I'll talk about how to write the results of Spark calculations into MySQL or another relational database. The approach is actually quite simple; the code is as follows:

package scala

import java.sql.{Connection, DriverManager, PreparedStatement}
import org.apache.spark.{SparkConf, SparkContext}

object RDDtoMysql {

  case class Blog(name: String, count: Int)

  def myFun(iterator: Iterator[(String, Int)]): Unit = {
    var conn: Connection = null
    var ps: PreparedStatement = null
    val sql = "INSERT INTO blog (name, count) VALUES (?, ?)"
    try {
      conn = DriverManager.getConnection("jdbc:mysql://localhost:3306/spark", "root", "123456")
      iterator.foreach(data => {
        ps = conn.prepareStatement(sql)
        ps.setString(1, data._1)
        ps.setInt(2, data._2)
        ps.executeUpdate()
      })
    } catch {
      case e: Exception => println("MySQL Exception")
    } finally {
      if (ps != null) {
        ps.close()
      }
      if (conn != null) {
        conn.close()
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("RDDtoMysql").setMaster("local")
    val sc = new SparkContext(conf)
    val data = sc.parallelize(List(("www", 10), ("iteblog", 20), ("com", 30)))
    data.foreachPartition(myFun)
  }
}

It simply traverses each of the RDD's partitions via foreachPartition and calls an ordinary Scala method to write to the database. Before running the program you need to make sure the blog table exists in the database; it can be created with the following statement:
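To see why foreachPartition is used here rather than foreach, consider that the function passed to foreach runs once per record, so a naive version would open one database connection per record, while foreachPartition runs its function once per partition. The following database-free sketch simulates this with a counter instead of a real JDBC connection; the sample records and the two-partition split are assumptions for illustration only.

```scala
// Sketch: foreachPartition opens one "connection" per partition, not per record.
object PartitionWriteDemo {
  var connectionsOpened = 0

  // Called once per partition: one "connection" serves many records.
  def writePartition(records: Iterator[(String, Int)]): Unit = {
    connectionsOpened += 1       // stands in for DriverManager.getConnection(...)
    records.foreach(_ => ())     // stands in for the per-record executeUpdate()
  }

  def main(args: Array[String]): Unit = {
    val data = List(("www", 10), ("iteblog", 20), ("com", 30))
    // Simulate an RDD with 2 partitions: writePartition runs once per partition.
    data.grouped(2).foreach(part => writePartition(part.iterator))
    println(connectionsOpened)   // 2 partitions -> 2 connections, not 3
  }
}
```

With a real RDD, `data.foreachPartition(writePartition)` gives the same once-per-partition behavior on each executor.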

CREATE TABLE `blog` (
  `name` varchar(255) NOT NULL,
  `count` int(10) unsigned DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8

Then run the above code directly. When you're done, you can query the results in the database:

SELECT * FROM blog;
+---------+-------+
| name    | count |
+---------+-------+
| www     |    10 |
| iteblog |    20 |
| com     |    30 |
+---------+-------+

It is important to note that:
1. It is best to use the foreachPartition function to traverse the RDD, so that a database connection is created once on each worker rather than once per record.
2. If your database has limited concurrency, you can reduce the write concurrency by controlling the number of partitions of the data.
3. When inserting into MySQL, it is best to use batch inserts.
4. Make sure the write process handles failures: inserts go over the network, so writing a record to the database may fail.
5. In general, writing your RDD data to a relational database such as MySQL is not recommended.
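Note 3 can be sketched as follows. This is a hedged example rather than the article's original code: it keeps the same placeholder connection URL and credentials as above, and the batch size of 100 is an arbitrary choice. One PreparedStatement is reused for the whole partition, and rows are queued with addBatch() and flushed with executeBatch(), so each batch costs one round trip instead of one per row.

```scala
import java.sql.{Connection, DriverManager, PreparedStatement}

object BatchInsert {
  val sql = "INSERT INTO blog (name, count) VALUES (?, ?)"

  // Writes one RDD partition to MySQL in batches of `batchSize` rows.
  def writeBatched(iterator: Iterator[(String, Int)], batchSize: Int = 100): Unit = {
    var conn: Connection = null
    var ps: PreparedStatement = null
    try {
      conn = DriverManager.getConnection("jdbc:mysql://localhost:3306/spark", "root", "123456")
      ps = conn.prepareStatement(sql)       // one statement, reused for every row
      iterator.grouped(batchSize).foreach { batch =>
        batch.foreach { case (name, count) =>
          ps.setString(1, name)
          ps.setInt(2, count)
          ps.addBatch()                     // queue the row locally
        }
        ps.executeBatch()                   // one round trip per batch
      }
    } finally {
      if (ps != null) ps.close()
      if (conn != null) conn.close()
    }
  }

  // Number of executeBatch() round trips needed for n rows (illustration only).
  def batchCount(n: Int, batchSize: Int): Int =
    (0 until n).iterator.grouped(batchSize).size
}
```

In the program above you would then call `data.foreachPartition(it => BatchInsert.writeBatched(it))`. With MySQL Connector/J, adding `rewriteBatchedStatements=true` to the JDBC URL additionally lets the driver rewrite a batch into a multi-row INSERT.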
