Spark error: Caused by: java.lang.ClassCastException: scala.collection.mutable.WrappedArray$ofRef cannot be cast to [Lscala.collection.immutable.Map;

Cause

I wrote a UDF that checks whether two columns are equal. The two columns have identical schemas, but the structure is fairly involved:

|-- list: array (nullable = true)
|    |-- element: map (containsNull = true)
|    |    |-- key: string
|    |    |-- value: array (valueContainsNull = true)
|    |    |    |-- element: struct (containsNull = true)
|    |    |    |    |-- Date: integer (nullable = true)
|    |    |    |    |-- Name: string (nullable = true)
|-- list2: array (nullable = true)
|    |-- element: map (containsNull = true)
|    |    |-- key: string
|    |    |-- value: array (valueContainsNull = true)
|    |    |    |-- element: struct (containsNull = true)
|    |    |    |    |-- Date: integer (nullable = true)
|    |    |    |    |-- Name: string (nullable = true)
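For context, a DataFrame of this shape can be put together in spark-shell roughly as follows (a minimal sketch; the sample values are made up, and nullability flags may differ slightly from the output above):

import spark.implicits._

// Same AppList case class that the UDF below uses.
case class AppList(Date: Int, Name: String)

// Two columns sharing the array<map<string, array<struct<Date, Name>>>> schema.
val df = Seq(
  (Seq(Map("apps" -> Seq(AppList(20180101, "AppOne")))),
   Seq(Map("apps" -> Seq(AppList(20180101, "AppOne")))))
).toDF("list", "list2")

df.printSchema()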

In other words, Maps are nested inside an Array, and another Array is nested inside each Map, so the values can only be compared level by level. The UDF I wrote is as follows:

import org.apache.spark.sql.functions.udf

// Matches the struct fields in the schema above.
case class AppList(Date: Int, Name: String)

def isMapEqual(map1: Map[String, Array[AppList]], map2: Map[String, Array[AppList]]): Boolean = {
  try {
    if (map1.size != map2.size) {
      return false
    } else {
      for (x <- map1.keys) {
        if (map1(x) != map2(x)) {
          return false
        }
      }
      return true
    }
  } catch {
    case e: Exception => false
  }
}

def isListEqual(list1: Array[Map[String, Array[AppList]]], list2: Array[Map[String, Array[AppList]]]): Boolean = {
  try {
    if (list1.length != list2.length) {
      return false
    } else if (list1.length == 0 || list2.length == 0) {
      return false
    } else {
      // Note: only the first map in each list is compared.
      return isMapEqual(list1(0), list2(0))
    }
  } catch {
    case e: Exception => false
  }
}

val isColumnEqual = udf((list1: Array[Map[String, Array[AppList]]], list2: Array[Map[String, Array[AppList]]]) => {
  isListEqual(list1, list2)
})

I then pasted the following into spark-shell and ran it:

val dat = df.withColumn("equal", isColumnEqual($"list", $"list2"))
dat.show()

At that point, the following error appeared:

Caused by: org.apache.spark.SparkException: Failed to execute user defined function($anonfun$1: (array<map<string,array<struct<Date:int,Name:string>>>>, array<map<string,array<struct<Date:int,Name:string>>>>) => boolean)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
  at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
  at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:231)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:225)
  at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
  at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
  at org.apache.spark.scheduler.Task.run(Task.scala:99)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassCastException: scala.collection.mutable.WrappedArray$ofRef cannot be cast to [Lscala.collection.immutable.Map;
  at $anonfun$1.apply(<console>:42)
  ... 16 more
Solution

The so-called solution was, of course, to Google it…

I found an answer (the second link in the References) saying that changing Array to Seq would fix it. Embarrassingly simple, but I tried it, and sure enough it worked; the fixed version is sketched below.
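For reference, here is what that change looks like (a sketch, condensed slightly; only the container types in the signatures need to change). One subtlety: at runtime Spark actually passes the struct elements as Row objects rather than AppList instances, but because the element type is erased and the comparison only uses ==, the declared type is never checked:

import org.apache.spark.sql.functions.udf

// Same structure as above; Array becomes Seq in every signature.
def isMapEqual(map1: Map[String, Seq[AppList]], map2: Map[String, Seq[AppList]]): Boolean = {
  try {
    // Seq's == is element-wise, unlike Array's reference-based ==.
    map1.size == map2.size && map1.keys.forall(x => map1(x) == map2(x))
  } catch {
    case e: Exception => false
  }
}

def isListEqual(list1: Seq[Map[String, Seq[AppList]]], list2: Seq[Map[String, Seq[AppList]]]): Boolean = {
  try {
    if (list1.length != list2.length || list1.isEmpty) false
    else isMapEqual(list1(0), list2(0))  // still only compares the first map
  } catch {
    case e: Exception => false
  }
}

val isColumnEqual = udf((list1: Seq[Map[String, Seq[AppList]]], list2: Seq[Map[String, Seq[AppList]]]) => {
  isListEqual(list1, list2)
})

After redefining isColumnEqual this way, the withColumn call above runs without the exception.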

Reason

The answer there says:

So it looks like the ArrayType on Dataframe "idDF" is really a WrappedArray and not an Array - So the function call to "filterMapKeysWithSet" failed as it expected an Array but got a WrappedArray/Seq instead (which doesn't implicitly convert to Array in Scala 2.8 and above).

In other words, the Array the UDF receives is not Scala's native Array but a wrapped one, a WrappedArray. (If anything here is wrong, please do point it out; I have barely written any Scala, so I am a little nervous.)
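To make the mismatch concrete, here is a small Spark-free sketch (values made up for illustration; WrappedArray.make is available in Scala 2.11/2.12, the versions Spark used at the time):

import scala.collection.mutable.WrappedArray

// Spark's deserializer produces a WrappedArray for an ArrayType column,
// then casts it to whatever parameter type the UDF's signature declares.
val fromSpark: AnyRef = WrappedArray.make[Int](Array(1, 2, 3))

// A WrappedArray is a Seq, so this cast succeeds.
val asSeq = fromSpark.asInstanceOf[Seq[Int]]

// It is not a JVM array, though, so declaring the UDF parameter as
// Array[...] triggers the equivalent of the cast below, which throws
// java.lang.ClassCastException at runtime:
// val asArray = fromSpark.asInstanceOf[Array[Int]]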

References
  • https://stackoverflow.com/questions/40199507/scala-collection-mutable-wrappedarrayofref-cannot-be-cast-to-integer
  • https://stackoverflow.com/questions/40764957/spark-java-lang-classcastexception-scala-collection-mutable-wrappedarrayofref
