Today I'll talk about how to write the results of Spark computations into MySQL or another relational database. The approach is actually quite simple; the code is as follows:
```scala
import java.sql.{Connection, DriverManager, PreparedStatement}
import org.apache.spark.{SparkConf, SparkContext}

object RDDtoMysql {

  case class Blog(name: String, count: Int)

  def myFun(iterator: Iterator[(String, Int)]): Unit = {
    var conn: Connection = null
    var ps: PreparedStatement = null
    val sql = "insert into blog(name, count) values (?, ?)"
    try {
      // One connection per partition: myFun is called once per partition
      conn = DriverManager.getConnection(
        "jdbc:mysql://localhost:3306/spark", "root", "123456")
      iterator.foreach { data =>
        ps = conn.prepareStatement(sql)
        ps.setString(1, data._1)
        ps.setInt(2, data._2)
        ps.executeUpdate()
      }
    } catch {
      case e: Exception => println("Mysql Exception")
    } finally {
      // Always release JDBC resources, even if an insert failed
      if (ps != null) ps.close()
      if (conn != null) conn.close()
    }
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("RDDtoMysql").setMaster("local")
    val sc = new SparkContext(conf)
    val data = sc.parallelize(List(("www", 10), ("iteblog", 20), ("com", 30)))
    data.foreachPartition(myFun)
  }
}
```
The code simply traverses each of the RDD's partitions via foreachPartition and calls an ordinary Scala method to write to the database. Before running the program, make sure the blog table exists in the database; it can be created with the following statement:
```sql
CREATE TABLE `blog` (
  `name` varchar(255) NOT NULL,
  `count` int(10) unsigned DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8
```
Then run the above code directly. When you're done, you can query the results in the database:
```sql
SELECT * FROM blog;
```

```
name     count
www      10
iteblog  20
com      30
```
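Why foreachPartition rather than foreach? Here is a Spark-free sketch (all names are illustrative, not from the program above) that models an RDD as a list of partitions and counts how many database connections each traversal strategy would open:

```scala
// Illustrative simulation: an "RDD" is just a List of partitions.
// rdd.foreach with a connection in the closure body would open one
// connection per record; rdd.foreachPartition opens one per partition.
object ConnectionCountDemo {
  def perRecord(partitions: List[List[(String, Int)]]): Int =
    partitions.flatten.map(_ => 1).sum      // foreach: connect for every record

  def perPartition(partitions: List[List[(String, Int)]]): Int =
    partitions.map(_ => 1).sum              // foreachPartition: connect once per partition
}
```

With the three sample records above split across two partitions, the per-record strategy opens three connections while the per-partition strategy opens only two; with millions of records the gap becomes dramatic.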
It is important to note that:
1. It is best to use the foreachPartition function to traverse the RDD, creating one database connection per partition on each worker, rather than one connection per record.
2. If your database has limited concurrency, you can reduce the number of concurrent writers by repartitioning the data into fewer partitions.
3. When inserting into MySQL, it is best to use batch inserts rather than one statement per row.
4. Make sure the write process handles failures: inserts travel over the network, so individual writes can fail and may need to be retried.
5. In general, writing large amounts of RDD data directly to a relational database such as MySQL is not recommended.
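Tips 1, 3, and 4 can be combined in a revised partition writer. The sketch below (names and the batch size are my own choices, not from the original post) prepares the statement once per partition, groups records with Iterator.grouped, and sends them via JDBC's addBatch/executeBatch so each batch costs one round trip:

```scala
import java.sql.{Connection, DriverManager, PreparedStatement}

// Hedged sketch of a batched partition writer; URL and credentials are the
// same placeholders used earlier in the post.
object BatchedMysqlWriter {
  val BatchSize = 100

  def writePartition(iterator: Iterator[(String, Int)]): Unit = {
    var conn: Connection = null
    var ps: PreparedStatement = null
    try {
      conn = DriverManager.getConnection(
        "jdbc:mysql://localhost:3306/spark", "root", "123456")
      // Prepare once per partition instead of once per record
      ps = conn.prepareStatement("insert into blog(name, count) values (?, ?)")
      iterator.grouped(BatchSize).foreach { batch =>
        batch.foreach { case (name, count) =>
          ps.setString(1, name)
          ps.setInt(2, count)
          ps.addBatch()
        }
        ps.executeBatch()   // one round trip per batch instead of per row
      }
    } finally {
      if (ps != null) ps.close()
      if (conn != null) conn.close()
    }
  }
}
```

It is used the same way as before: `data.foreachPartition(BatchedMysqlWriter.writePartition)`. If a batch fails, the JDBC driver raises an exception for that partition's task, and Spark's normal task retry can re-run the partition, so inserts should ideally be idempotent or deduplicated.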