Spark connects to an Oracle database via JdbcRDD (Scala)

I. The code
package com.sgcc.hj

import java.sql.DriverManager

import org.apache.spark.rdd.JdbcRDD
import org.apache.spark.{SparkConf, SparkContext}

/**
 * Created by the user on 2016/6/17.
 */
object JdbcTest {
  def main(args: Array[String]) {
    val conf = new SparkConf()
    val sc = new SparkContext(conf)

    val rdd = new JdbcRDD(
      sc,
      () => {
        // Load the Oracle JDBC driver and open a connection.
        Class.forName("oracle.jdbc.driver.OracleDriver").newInstance()
        DriverManager.getConnection("jdbc:oracle:thin:@172.16.222.112:1521:pms", "scyw", "scyw")
      },
      // The two ? placeholders are bound to the lower and upper bound below.
      "SELECT * FROM MW_APP.CMST_AIRPRESSURE WHERE 1 = ? AND ROWNUM < ?",
      1,   // lowerBound
      10,  // upperBound
      1,   // numPartitions
      r => (r.getString(1), r.getString(2), r.getString(5)))

    rdd.collect().foreach(println)
    sc.stop()
  }
}
II. Running
Command:

spark-submit --master yarn --jars /opt/test/data/oracle.jdbc_10.2.0.jar --name oracleread --class com.sgcc.hj.JdbcTest --executor-memory 1g /opt/test/data/sparktest.jar

(Note that the Oracle JDBC driver jar must be added with --jars, because it is not bundled with Spark.)
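If you would rather not pass --jars on every submit, one alternative is to put the driver on the classpath through spark-defaults.conf. This is only a sketch, assuming Spark's standard extraClassPath keys and the same jar path as above; with this approach the jar is not shipped for you, so it must already exist at that path on every node.

# spark-defaults.conf (sketch): make the Oracle driver visible to the
# driver and the executors without --jars; the jar must be present at
# this local path on each node of the cluster.
spark.driver.extraClassPath    /opt/test/data/oracle.jdbc_10.2.0.jar
spark.executor.extraClassPath  /opt/test/data/oracle.jdbc_10.2.0.jar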
III. Questions and answers
1. Official documentation:
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.JdbcRDD
2. The constructor parameters of JdbcRDD:

The first three (the SparkContext, the connection factory, and the SQL string) need no explanation; they are clear at a glance. The next three are numbers. The first two are the lower and upper bound that get bound to the SQL's ? placeholders; they must be of type Long, and they are mandatory — the Spark source code requires them. If the query has no natural Long-typed condition, you can add a dummy 1 = ? predicate and pass 1 as the bound, as in the code above. The third number is the number of partitions, which splits the query: for example, with bounds 1 and 20 and 2 partitions, the SQL is executed twice, the first time with parameters (1, 10) and the second with (11, 20), as the sketch below illustrates. The last parameter is a function that maps each row of the result set; here it builds a triple from fields 1, 2, and 5 of each record, but it can of course produce any other shape.
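A minimal sketch of that partitioning behaviour, assuming a hypothetical table MW_APP.SOME_TABLE with a numeric ID column (the table, its columns, and the app name are assumptions for illustration; only the JdbcRDD call pattern comes from the code in section I):

import java.sql.DriverManager
import org.apache.spark.rdd.JdbcRDD
import org.apache.spark.{SparkConf, SparkContext}

object JdbcPartitionDemo {
  def main(args: Array[String]) {
    val sc = new SparkContext(new SparkConf().setAppName("JdbcPartitionDemo"))

    val rdd = new JdbcRDD(
      sc,
      () => {
        Class.forName("oracle.jdbc.driver.OracleDriver").newInstance()
        // Connection details mirror section I; adjust for your database.
        DriverManager.getConnection("jdbc:oracle:thin:@172.16.222.112:1521:pms", "scyw", "scyw")
      },
      // Each partition binds its own sub-range to the two ? placeholders.
      "SELECT id, name FROM MW_APP.SOME_TABLE WHERE id >= ? AND id <= ?",
      1,   // lowerBound
      20,  // upperBound
      2,   // numPartitions: partition 0 queries with (1, 10), partition 1 with (11, 20)
      r => (r.getLong(1), r.getString(2)))  // map each row to a (Long, String) pair

    rdd.collect().foreach(println)
    sc.stop()
  }
}

Note that the 1 = ? trick from section I only works with a single partition: with two partitions, the second one would bind 11 to the first placeholder, 1 = 11 would be false, and that partition would return no rows.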