Spark read of MongoDB failed, reporting executor timeout and GC overhead limit exceeded exceptions

Code:
import com.mongodb.spark.config.ReadConfig
import com.mongodb.spark.sql._

// Build a ReadConfig for the usergroupmapping collection from the Spark conf.
val config = sqlContext.sparkContext.getConf
  .set("spark.mongodb.keep_alive_ms", "15000")
  .set("spark.mongodb.input.uri", "mongodb://10.100.12.14:27017")
  .set("spark.mongodb.input.database", "BI")
  .set("spark.mongodb.input.collection", "usergroupmapping")
val readConfig = ReadConfig(config)

// Load the collection as a DataFrame and register it for Spark SQL.
val objUserGroupMapping = sqlContext.read.format("com.mongodb.spark.sql").mongo(readConfig)
objUserGroupMapping.printSchema()
val tbUserGroupMapping = objUserGroupMapping.toDF()
tbUserGroupMapping.registerTempTable("usergroupmapping")

// Take 100 records with ut later than 2018-05-02.
sqlContext.sql("select _id, c, g, n, rn, t, ut from usergroupmapping where ut > '2018-05-02' limit 100")
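As a side note, the same query can also be written with the DataFrame API instead of a temp table. The sketch below assumes the sqlContext and readConfig defined above and reuses the column names from the query; a simple predicate like ut > '2018-05-02' is the kind of filter the connector can push down to MongoDB, so only matching documents are shipped to Spark.

// Sketch: projection, filter and limit via the DataFrame API,
// reusing the readConfig built above. The filter is a candidate
// for pushdown into the MongoDB query itself.
val df = sqlContext.read.format("com.mongodb.spark.sql").mongo(readConfig)
val result = df
  .filter(df("ut") > "2018-05-02")
  .select("_id", "c", "g", "n", "rn", "t", "ut")
  .limit(100)
result.show()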
Running the above code to take 100 records from the usergroupmapping collection, the job failed with an executor timeout and a GC overhead limit exceeded exception. At first I thought the amount of data fetched from MongoDB was too large and the Spark executors simply ran out of memory. But after looking into it, the Spark MongoDB connector fetches data conditionally, that is, the filter is applied in MongoDB first and only the matching data is loaded into Spark memory, so a shortage of memory should not have been the cause. Later I found a claim online that too many tasks make them contend with each other for GC time and memory resources (I do not fully understand this part yet). Following that suggestion, I cut the cores allocated to the job from 16 down to 6 and reran the program, and it indeed no longer failed. The exact root cause is still unclear, so I am recording it here for now.
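For the record, a minimal sketch of where such a core limit can be declared is below. The note above only says the cores were cut from 16 to 6; assuming the limit is applied per executor, the standard property is spark.executor.cores (equivalently --executor-cores on spark-submit), and it must be set before the SparkContext is created, since changing getConf on a running context has no effect. The application name below is a placeholder and the master URL is left to spark-submit.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Sketch: cap cores per executor before the context exists.
// The app name is a placeholder; 6 is the value that let the job
// finish here, 16 was the original setting that failed.
val conf = new SparkConf()
  .setAppName("usergroupmapping-read")
  .set("spark.executor.cores", "6")
  .set("spark.mongodb.keep_alive_ms", "15000")
  .set("spark.mongodb.input.uri", "mongodb://10.100.12.14:27017")
  .set("spark.mongodb.input.database", "BI")
  .set("spark.mongodb.input.collection", "usergroupmapping")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)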