First, we use the new API method to connect to MySQL, load the data, and create the DataFrames:
import org.apache.spark.sql.DataFrame
import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.sql.{SaveMode, DataFrame}
import scala.collection.mutable.ArrayBuffer
import org.apache.spark.sql.hive.HiveContext
import java.sql.DriverManager
import java.sql.Connection

val sqlContext = new HiveContext(sc)
val mysqlUrl = "jdbc:mysql://10.180.211.100:3306/appcocdb?user=appcoc&password=asia123"
val ci_mda_sys_table          = sqlContext.jdbc(mysqlUrl, "ci_mda_sys_table").cache()
val ci_mda_sys_table_column   = sqlContext.jdbc(mysqlUrl, "ci_mda_sys_table_column").cache()
val ci_label_ext_info         = sqlContext.jdbc(mysqlUrl, "ci_label_ext_info").cache()
val ci_label_info             = sqlContext.jdbc(mysqlUrl, "ci_label_info").cache()
val ci_approve_status         = sqlContext.jdbc(mysqlUrl, "ci_approve_status").cache()
val dim_coc_label_count_rules = sqlContext.jdbc(mysqlUrl, "dim_coc_label_count_rules").cache()
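As an aside, in Spark 1.4 the jdbc() method on SQLContext is deprecated in favor of the DataFrameReader API. A minimal sketch of an equivalent load, reusing the mysqlUrl defined above and assuming the MySQL JDBC driver is already on the classpath, would be:

// Sketch only: reuses the mysqlUrl value from above; the Properties object
// can carry user/password instead of putting them in the URL.
import java.util.Properties
val props = new Properties()
val ci_mda_sys_table = sqlContext.read.jdbc(mysqlUrl, "ci_mda_sys_table", props).cache()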
Then we join the tables on their ID columns:
val labels = ci_mda_sys_table.join(ci_mda_sys_table_column, ci_mda_sys_table("table_id") === ci_mda_sys_table_column("table_id"), "inner").cache()
labels.join(ci_label_ext_info, ci_mda_sys_table_column("column_id") === ci_label_ext_info("column_id"), "inner").cache()
labels.join(ci_label_info, ci_label_ext_info("label_id") === ci_label_info("label_id"), "inner").cache()
labels.join(ci_approve_status, ci_label_info("label_id") === ci_approve_status("resource_id"), "inner").cache()
labels.filter(ci_approve_status("curr_approve_status_id") === 107 and (ci_label_info("data_status_id") === 1 || ci_label_info("data_status_id") === 2) and (ci_label_ext_info("count_rules_code") isNotNull) and ci_mda_sys_table("update_cycle") === 1).cache()
This immediately threw an error: on the third join the ID column could not be resolved, which seemed very strange:
Having no luck, I then tried the join method described in the official Spark 1.4 API docs:
val labels = ci_mda_sys_table.join(ci_mda_sys_table_column, "table_id")
labels.join(ci_label_ext_info, "column_id")
labels.join(ci_label_info, "label_id")
labels.join(ci_approve_status).where($"label_id" === $"resource_id")
Same result: the ID column still could not be found...
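One thing worth noting about the snippet above: the $"col" syntax in the last join only resolves when the SQLContext implicits are in scope (the spark-shell imports them automatically, but a compiled job may not), so you might also need:

import sqlContext.implicits._  // brings the $"columnName" column interpolator into scope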
Finally, out of ideas, I went back to the old way of registering JDBC temporary tables and loading the data through them, and that worked. I still don't understand why...
val ci_mda_sys_table_ddl = s"""
  CREATE TEMPORARY TABLE ci_mda_sys_table
  USING org.apache.spark.sql.jdbc
  OPTIONS (
    url '${mysqlUrl}',
    dbtable 'ci_mda_sys_table'
  )""".stripMargin

sqlContext.sql(ci_mda_sys_table_ddl)
val ci_mda_sys_table = sqlContext.sql("SELECT * FROM ci_mda_sys_table").cache()
//val ci_mda_sys_table = sqlContext.jdbc(mysqlUrl, "ci_mda_sys_table").cache()

val ci_mda_sys_table_column_ddl = s"""
  CREATE TEMPORARY TABLE ci_mda_sys_table_column
  USING org.apache.spark.sql.jdbc
  OPTIONS (
    url '${mysqlUrl}',
    dbtable 'ci_mda_sys_table_column'
  )""".stripMargin

sqlContext.sql(ci_mda_sys_table_column_ddl)
val ci_mda_sys_table_column = sqlContext.sql("SELECT * FROM ci_mda_sys_table_column").cache()
//val ci_mda_sys_table_column = sqlContext.jdbc(mysqlUrl, "ci_mda_sys_table_column").cache()
.........
So the problem was ultimately worked around, but why the direct jdbc() load does not work here still needs to be studied.
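For completeness, the same JDBC temporary-table registration can also be expressed through the DataFrameReader format/options API in Spark 1.4. A rough equivalent of the DDL above (same URL and table name; I have not verified whether it behaves any differently from the plain jdbc() call) looks like this:

// Rough equivalent of the CREATE TEMPORARY TABLE ... USING org.apache.spark.sql.jdbc DDL above.
val ci_mda_sys_table = sqlContext.read
  .format("jdbc")
  .option("url", mysqlUrl)
  .option("dbtable", "ci_mda_sys_table")
  .load()
  .cache()
ci_mda_sys_table.registerTempTable("ci_mda_sys_table")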
While working through this, another problem came up as well. If you see an error like the following:
15/11/19 10:57:12 INFO BlockManagerInfo: Removed broadcast_3_piece0 on cbg6aocdp9:49897 in memory (size: 8.4 KB, free: 1060.3 MB)
15/11/19 10:57:12 INFO BlockManagerInfo: Removed broadcast_3_piece0 on cbg6aocdp5:45978 in memory (size: 8.4 KB, free: 1060.3 MB)
15/11/19 10:57:12 INFO BlockManagerInfo: Removed broadcast_2_piece0 on 10.176.238.11:38968 in memory (size: 8.2 KB, free: 4.7 GB)
15/11/19 10:57:12 INFO BlockManagerInfo: Removed broadcast_2_piece0 on cbg6aocdp4:55199 in memory (size: 8.2 KB, free: 1060.3 MB)
15/11/19 10:57:12 INFO ContextCleaner: Cleaned shuffle 0
15/11/19 10:57:12 INFO BlockManagerInfo: Removed broadcast_1_piece0 on 10.176.238.11:38968 in memory (size: 6.5 KB, free: 4.7 GB)
15/11/19 10:57:12 INFO BlockManagerInfo: Removed broadcast_1_piece0 on cbg6aocdp8:55706 in memory (size: 6.5 KB, free: 1060.3 MB)
target_table_code: ======================== IT03
Exception in thread "main" java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
    at org.apache.spark.rdd.HadoopRDD.getInputFormat(HadoopRDD.scala:190)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:203)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
    ... (the same RDD.partitions / MapPartitionsRDD.getPartitions frames repeat several more times) ...
    at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:121)
    at org.apache.spark.sql.execution.Limit.executeCollect(basicOperators.scala:125)
    at org.apache.spark.sql.DataFrame.collect(DataFrame.scala:1269)
    at org.apache.spark.sql.DataFrame.head(DataFrame.scala:1203)
    at org.apache.spark.sql.DataFrame.take(DataFrame.scala:1262)
    at org.apache.spark.sql.DataFrame.showString(DataFrame.scala:176)
    at org.apache.spark.sql.DataFrame.show(DataFrame.scala:331)
    at main.asiainfo.coc.impl.IndexMakerObj$$anonfun$makeIndexsAndLabels$1.apply(IndexMakerObj.scala:218)
    at main.asiainfo.coc.impl.IndexMakerObj$$anonfun$makeIndexsAndLabels$1.apply(IndexMakerObj.scala:137)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
    at main.asiainfo.coc.impl.IndexMakerObj$.makeIndexsAndLabels(IndexMakerObj.scala:137)
    at main.asiainfo.coc.CocDss$.main(CocDss.scala:23)
    at main.asiainfo.coc.CocDss.main(CocDss.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:665)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:170)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:193)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
    ... 71 more
Caused by: java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzoCodec not found.
    at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:135)
    at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:175)
    at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:45)
    ... 76 more
Caused by: java.lang.ClassNotFoundException: Class com.hadoop.compression.lzo.LzoCodec not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2018)
    at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:128)
    ... more
From the last "Caused by" it is clear what happened: the Hadoop data is compressed with LZO, so for Spark to read it the hadoop-lzo jar must be added to the job's classpath.
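One way to make the codec visible (the jar path and application jar name below are only illustrative; point them at wherever hadoop-lzo is installed on your cluster) is to pass the jar explicitly when submitting the job:

# Illustrative spark-submit invocation; adjust the hadoop-lzo jar path to your installation.
spark-submit \
  --class main.asiainfo.coc.CocDss \
  --jars /path/to/hadoop-lzo.jar \
  --driver-class-path /path/to/hadoop-lzo.jar \
  --conf spark.executor.extraClassPath=/path/to/hadoop-lzo.jar \
  your-app.jar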
Spark 1.4: issues loading MySQL data into DataFrames and with the join connection methods